.if false
==================================================================================
Copyright (c) 2019 Nokia
Copyright (c) 2018-2019 AT&T Intellectual Property.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==================================================================================
.fi
.if false
Mnemonic: failures.im
Abstract: This is the major section on how an application might handle failures
Date: 2 August 2019
Author: E. Scott Daniels
.fi
&h1(Handling Failures)
The vast majority of failure states reported by RMR are fatal. If one
is encountered during setup or initialization, it is unlikely that any
message oriented processing should continue; if one is encountered on
a message operation, further processing of that message should be
abandoned. With regard to message sending, however, it is very likely
that the underlying transport mechanism will report a &ital(soft,) or
transient, failure, and the operation might succeed if it is retried
at a later point in time. The paragraphs below discuss the ways in
which an application might deal with these soft failures.
&h2(Failure Notification)
When a soft failure is reported, the state in the message buffer
returned by the RMR function will be &cw(RMR_ERR_RETRY.) These types
of failures can occur for various reasons, but one of two is
typically the underlying cause:
&half_space
&indent
&beg_list( &lic1 )
&li The session to the targeted recipient (endpoint) is not connected.
&half_space
&li The transport mechanism buffer pool is full and cannot accept another buffer.
&half_space
&end_list
&uindent
&space
Unfortunately, it is not possible for RMR to determine which of these
two cases is occurring and, equally unfortunately, the time needed to
resolve each is different. The first, no connection, may require up to
a second before a message can be accepted, while a rejection because
of buffer shortage is likely to resolve in less than a millisecond.
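&space
As a minimal sketch of how an application might tell a soft failure
from a hard one, the function below inspects the state of a returned
message buffer. The &cw(demo_mbuf_t) type, the &cw(send_result_t)
enum, and the &cw(classify_send) function are hypothetical names
introduced only for illustration, and the two constants are given
stand-in values so that the sketch compiles on its own; a real
application would include the RMR header and test the state field of
the buffer that the send function returned.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in values so the sketch compiles without RMR installed; a real
   application would include <rmr/rmr.h> and use its definitions. */
#ifndef RMR_OK
#define RMR_OK        0
#endif
#ifndef RMR_ERR_RETRY
#define RMR_ERR_RETRY 10
#endif

/* Hypothetical stand-in for the RMR message buffer; only the state
   field is modelled here. */
typedef struct {
    int state;
} demo_mbuf_t;

/* Hypothetical classification used by the application. */
typedef enum {
    SEND_OK,            /* message was accepted */
    SEND_SOFT_FAILURE,  /* transient; a later retry might succeed */
    SEND_HARD_FAILURE   /* processing of this message should be abandoned */
} send_result_t;

/* Map the state in a returned message buffer onto one of three outcomes.
   The caller is assumed to have checked that the buffer is not NULL. */
send_result_t classify_send(const demo_mbuf_t *msg) {
    assert(msg != NULL);

    if (msg->state == RMR_OK) {
        return SEND_OK;
    }
    if (msg->state == RMR_ERR_RETRY) {
        return SEND_SOFT_FAILURE;
    }
    return SEND_HARD_FAILURE;
}
```

Note that the classification cannot distinguish a missing connection
from a full buffer pool; both arrive as the same retry state.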
&h2(Application Response)
The action which an application takes when a soft failure is reported
ultimately depends on the nature of the application with respect to
factors such as tolerance to extended message latency, dropped
messages, and overall message rate.
&h2(RMR Retry Modes)
In an effort to reduce the workload of an application developer, RMR
has a default retry policy such that RMR will attempt to retransmit a
message up to 1000 times when a soft failure is reported. These
retries generally take less than 1 millisecond in total (even if all
1000 are attempted) and in most cases eliminate nearly all soft
failures reported to the application. When this mode is used, the
application might simply treat all bad return values from a send
attempt as permanent failures.
&space
If an application is so sensitive to delay in RMR, or in the
underlying transport mechanism, that even these retries are
unacceptable, it is possible to set RMR to return a failure
immediately on any kind of error (permanent failures are always
reported without retry). In this mode, RMR will still set the state in
the message buffer to &cw(RMR_ERR_RETRY,) but will &bold(not) make any
attempts to resend the message. This zero-retry policy is enabled by
invoking the &func(rmr_set_stimeout) with a value of 0; this can be
done once, immediately after &func(rmr_init:) is invoked.
&space
Regardless of the retry mode which the application sets, it will
ultimately be up to the application to handle failures by queuing the
message internally for resend, retrying immediately, or dropping the
send attempt altogether. As stated before, only the application can
determine how best to handle send failures.
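&space
One way an application might combine these three choices is sketched
below: retry immediately a bounded number of times, queue the message
if the failure is still soft, and drop it on a hard failure. This is
an illustration under stated assumptions, not a recommended policy.
The names &cw(try_send_fn,) &cw(send_bounded,) &cw(flaky_send,) and
the &cw(app_action_t) enum are hypothetical, the constants carry
stand-in values so that the sketch compiles on its own, and the
callback stands in for whatever code performs one RMR send and returns
the state found in the returned message buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in values so the sketch compiles without RMR installed; a real
   application would include <rmr/rmr.h> and use its definitions. */
#ifndef RMR_OK
#define RMR_OK        0
#endif
#ifndef RMR_ERR_RETRY
#define RMR_ERR_RETRY 10
#endif

/* Hypothetical result telling the caller what to do with the message. */
typedef enum {
    APP_SENT,             /* accepted by the transport */
    APP_QUEUE_FOR_LATER,  /* still soft-failing; hold for a later resend */
    APP_DROP              /* hard failure; abandon the message */
} app_action_t;

/* Hypothetical callback: makes one send attempt and returns the state. */
typedef int (*try_send_fn)(void *arg);

/* Retry immediately up to max_attempts times, stopping early on success
   or on any failure that is not a retry state. */
app_action_t send_bounded(try_send_fn try_send, void *arg, int max_attempts) {
    assert(try_send != NULL);

    for (int i = 0; i < max_attempts; i++) {
        int state = try_send(arg);
        if (state == RMR_OK) {
            return APP_SENT;
        }
        if (state != RMR_ERR_RETRY) {
            return APP_DROP;
        }
    }
    return APP_QUEUE_FOR_LATER;
}

/* Simulated sender for illustration: reports a soft failure until the
   counter that arg points to reaches zero, then reports success. */
int flaky_send(void *arg) {
    int *remaining = arg;
    if (*remaining > 0) {
        (*remaining)--;
        return RMR_ERR_RETRY;
    }
    return RMR_OK;
}
```

A small attempt limit suits a buffer shortage, which tends to clear
quickly; a missing connection can take far longer, which is why the
sketch hands the message back for queuing rather than spinning.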
&h2(Other Failures)
RMR will return the state of processing for message based operations
(send/receive) as the status in the message buffer. For non-message
operations, state is returned to the caller as the integer return
value of every function which is not expected to return a pointer
(functions such as &func(rmr_init:) do return a pointer). The
following are the RMR state constants and a brief description of
their meaning.
&space
.st 8p
&indent
&beg_dlist( 1.5i &ditext )
&di(RMR_OK) state is good; operation finished successfully
&half_space
&di(RMR_ERR_BADARG) argument passed to function was unusable
&half_space
&di(RMR_ERR_NOENDPT) send/call could not find an endpoint based on msg type
&half_space
&di(RMR_ERR_EMPTY) msg received had no payload; attempt to send an empty message
&half_space
&di(RMR_ERR_NOHDR) message didn't contain a valid header
&half_space
&di(RMR_ERR_SENDFAILED) send failed; errno may contain the transport provider reason
&half_space
&di(RMR_ERR_CALLFAILED) unable to send the message for a call function; errno may contain the transport provider reason
&half_space
&di(RMR_ERR_NOWHOPEN) no wormholes are open
&half_space
&di(RMR_ERR_WHID) the wormhole id provided was invalid
&half_space
&di(RMR_ERR_OVERFLOW) operation would have exceeded a buffer/field size
&half_space
&di(RMR_ERR_RETRY) request (send/call/rts) failed, but caller should retry (EAGAIN for wrappers)
&half_space
&di(RMR_ERR_RCVFAILED) receive failed (hard error)
&half_space
&di(RMR_ERR_TIMEOUT) response message not received in a reasonable amount of time
&half_space
&di(RMR_ERR_UNSET) the message hasn't been populated with a transport buffer
&half_space
&di(RMR_ERR_TRUNC) length in the received buffer is longer than the size of the allocated payload;
the received message was likely truncated (the length set by the sender could be wrong, but RMR cannot know that)
&half_space
&di(RMR_ERR_INITFAILED) initialisation of something (probably message) failed
&half_space
&di(RMR_ERR_NOTSUPP) the request is not supported, or RMR was not initialised for the request
&end_dlist
&uindent
.st &textsize
&space
Depending on the underlying transport mechanism, and the nature of the
call that RMR attempted, the system &cw(errno) value might reflect
additional detail about the failure. Applications should &bold(not)
rely on errno as some transport mechanisms do not set it with any
consistency.
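&space
Because &cw(errno) is advisory at best, an application might record it
only as extra detail in a log line and base its decisions on the RMR
state alone. The sketch below shows one way to do that; the
&cw(describe_failure) function and its message format are hypothetical
and are not part of RMR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format a log line for a failed operation. The RMR
   state is always reported; errno is appended only when it is non-zero,
   and is marked as advisory because not every transport sets it. */
int describe_failure(char *out, size_t len, int state, int saved_errno) {
    assert(out != NULL && len > 0);

    if (saved_errno != 0) {
        return snprintf(out, len, "rmr state=%d (errno=%d, advisory: %s)",
                        state, saved_errno, strerror(saved_errno));
    }
    return snprintf(out, len, "rmr state=%d", state);
}
```

The caller would copy &cw(errno) into a local variable immediately
after the RMR call returns, since later library calls may overwrite
it.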