| .if false |
| ================================================================================== |
| Copyright (c) 2019 Nokia |
| Copyright (c) 2018-2019 AT&T Intellectual Property. |
| |
| Licensed under the Apache License, Version 2.0 (the "License"); |
| you may not use this file except in compliance with the License. |
| You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| ================================================================================== |
| .fi |
| |
| .if false |
| Mnemonic: failures.im |
| Abstract: This is the major section on how an application might handle failures |
| Date: 2 August 2019 |
| Author: E. Scott Daniels |
| .fi |
| |
| &h1(Handling Failures) |
| The vast majority of states reported by RMR are fatal; if encountered |
| during setup or initialization, then it is unlikely that any |
| message-oriented processing should continue, and when encountered on a |
| message operation, further processing of that message should be abandoned. |
| Specifically with regard to message sending, it is very likely that |
| the underlying transport mechanism will report a &ital(soft,) or |
| transient, failure; the operation might succeed if it is |
| retried at a later point in time. The paragraphs below discuss the |
| ways in which an application might deal with these soft failures. |
| |
| &h2(Failure Notification) |
| When a soft failure is reported, the state in the message buffer returned |
| by the RMR function will be &cw(RMR_ERR_RETRY.) These types of |
| failures can occur for various reasons; one of two reasons is |
| typically the underlying cause: |
| |
| &half_space |
| &indent |
| &beg_list( &lic1 ) |
| &li The session to the targeted recipient (endpoint) is not connected. |
| &half_space |
| |
| &li The transport mechanism buffer pool is full and cannot accept another buffer. |
| &half_space |
| &end_list |
| &uindent |
| &space |
| |
| Unfortunately, it is not possible for RMR to determine which of these |
| two cases is occurring, and, equally unfortunate, the time to resolve |
| each is different. The first, no connection, may require up to a |
| second before a message can be accepted, while a rejection because of |
| buffer shortage is likely to resolve in less than a millisecond. |
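| |
| Because the two causes resolve on such different time scales, an |
| application that retries on its own might pause briefly at first and |
| for longer afterwards. The following is only a sketch of one such |
| schedule; the function name and every constant in it are assumptions |
| made for illustration and are not part of RMR. |
| |
```c
/* Assumed values, chosen only to mirror the two time scales described above. */
#define QUICK_ATTEMPTS  10        /* retries aimed at a transient buffer shortage */
#define QUICK_PAUSE_US  100       /* 10 x 100us is about 1ms in total */
#define SLOW_PAUSE_US   50000     /* 50ms steps while waiting on a connection */
#define MAX_WAIT_US     1000000   /* stop after roughly one second of waiting */

/*
    Return the pause, in microseconds, to take before retry number 'attempt'
    (0 based), or -1 when taking it would exceed the total wait budget.
*/
static long soft_failure_pause_us( int attempt ) {
    long waited;

    if( attempt < 0 ) {
        return -1;
    }
    if( attempt < QUICK_ATTEMPTS ) {
        return QUICK_PAUSE_US;
    }

    /* time already spent: the whole quick phase plus the slow steps so far */
    waited = (long) QUICK_ATTEMPTS * QUICK_PAUSE_US
           + (long) (attempt - QUICK_ATTEMPTS) * SLOW_PAUSE_US;
    if( waited + SLOW_PAUSE_US > MAX_WAIT_US ) {
        return -1;
    }

    return SLOW_PAUSE_US;
}
```
| |
| The caller would sleep for the returned interval before the next send |
| attempt and stop retrying once -1 is returned. |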
| |
| &h2(Application Response) |
| The action which an application takes when a soft failure is reported |
| ultimately depends on the nature of the application with respect to |
| factors such as tolerance to extended message latency, dropped |
| messages, and overall message rate. |
| |
| &h2(RMR Retry Modes) |
| In an effort to reduce the workload of an application developer, RMR |
| has a default retry policy such that RMR will attempt to retransmit a |
| message up to 1000 times when a soft failure is reported. These |
| retries generally take less than 1 millisecond (if all 1000 are |
| attempted) and in most cases eliminate nearly all soft failures |
| reported to the application. This mode might allow the |
| application to simply treat all bad return values from a send attempt |
| as permanent failures. &space |
| |
| If an application is sensitive to any delay in RMR, or the |
| underlying transport mechanism, it is possible to set RMR to return a |
| failure immediately on any kind of error (permanent failures are |
| always reported without retry). In this mode, RMR will still set the |
| state in the message buffer to &cw(RMR_ERR_RETRY,) but will &bold(not) |
| make any attempts to resend the message. This zero-retry policy is |
| enabled by invoking &func(rmr_set_stimeout) with a value of 0; |
| this can be done once immediately after &func(rmr_init:) is invoked. |
| |
| &space |
| Regardless of the retry mode which the application sets, it will |
| ultimately be up to the application to handle failures by queuing the |
| message internally for resend, retrying immediately, or dropping the |
| send attempt altogether. As stated before, only the application can |
| determine how to best handle send failures. |
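| |
| The three options named above (retry immediately, queue for resend, or |
| drop) can be combined. The sketch below is one possible arrangement and |
| not a recommended policy. To compile on its own it uses stand-ins: |
| &cw(mbuf_t) takes the place of the RMR message buffer, &cw(send_fn_t) is |
| shaped like the RMR send function, the constant values are placeholders, |
| and the queue, &cw(send_or_queue,) and &cw(flaky_send) are hypothetical |
| names. A real program would include the RMR header instead. |
| |
```c
#include <stddef.h>

/* Stand-ins so the sketch compiles alone; not the real RMR definitions. */
enum { RMR_OK = 0, RMR_ERR_RETRY = 10 };        /* placeholder values */
typedef struct { int state; } mbuf_t;           /* stands in for the RMR message buffer */
typedef mbuf_t* (*send_fn_t)( void* ctx, mbuf_t* msg );   /* shaped like the RMR send call */

#define QUEUE_SLOTS 4                           /* hypothetical resend queue depth */
typedef struct {
    mbuf_t* slot[QUEUE_SLOTS];
    int     head;                               /* index of the oldest queued message */
    int     count;                              /* number of messages waiting */
} resend_q_t;

typedef enum { SENT, QUEUED, DROPPED, FAILED } outcome_t;

/* Add a message to the queue; returns 0 when the queue is full. */
static int q_push( resend_q_t* q, mbuf_t* msg ) {
    if( q->count >= QUEUE_SLOTS ) {
        return 0;
    }
    q->slot[(q->head + q->count) % QUEUE_SLOTS] = msg;
    q->count++;
    return 1;
}

/* Remove and return the oldest queued message, or NULL when empty. */
static mbuf_t* q_pop( resend_q_t* q ) {
    mbuf_t* msg;

    if( q->count == 0 ) {
        return NULL;
    }
    msg = q->slot[q->head];
    q->head = (q->head + 1) % QUEUE_SLOTS;
    q->count--;
    return msg;
}

/*
    Send once, then retry immediately up to 'retries' more times while the
    state is soft. A message that is still soft afterwards is queued, or
    dropped when the queue is full. Any other bad state is a hard failure.
*/
static outcome_t send_or_queue( send_fn_t send, void* ctx, mbuf_t* msg, int retries, resend_q_t* q ) {
    int i;

    for( i = 0; i <= retries; i++ ) {
        msg = send( ctx, msg );
        if( msg == NULL ) {
            return FAILED;
        }
        if( msg->state == RMR_OK ) {
            return SENT;
        }
        if( msg->state != RMR_ERR_RETRY ) {
            return FAILED;                      /* hard failure: abandon the message */
        }
    }

    return q_push( q, msg ) ? QUEUED : DROPPED;
}

/*
    Simulated transport used only to exercise the sketch: it reports a soft
    failure until the counter that ctx points at reaches zero.
*/
static mbuf_t* flaky_send( void* ctx, mbuf_t* msg ) {
    int* soft_left = (int *) ctx;

    if( *soft_left > 0 ) {
        (*soft_left)--;
        msg->state = RMR_ERR_RETRY;
    } else {
        msg->state = RMR_OK;
    }
    return msg;
}
```
| |
| Queued messages would be drained with &cw(q_pop) and passed back through |
| &cw(send_or_queue) at a time the application chooses, for example from |
| its main loop; that part is left out of the sketch. |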
| |
| |
| &h2(Other Failures) |
| RMR will return the state of processing for message-based operations |
| (send/receive) as the status in the message buffer. For non-message |
| operations, state is returned to the caller as the integer return |
| value for all functions which are not expected to return a pointer |
| (e.g. &func(rmr_init:).) The following are the RMR state constants |
| and a brief description of their meaning. |
| |
| &space |
| .st 8p |
| &indent |
| &beg_dlist( 1.5i &ditext ) |
| &di(RMR_OK) state is good; operation finished successfully |
| &half_space |
| |
| &di(RMR_ERR_BADARG) argument passed to function was unusable |
| &half_space |
| |
| &di(RMR_ERR_NOENDPT) send/call could not find an endpoint based on msg type |
| &half_space |
| |
| &di(RMR_ERR_EMPTY) msg received had no payload; attempt to send an empty message |
| &half_space |
| |
| &di(RMR_ERR_NOHDR) message didn't contain a valid header |
| &half_space |
| |
| &di(RMR_ERR_SENDFAILED) send failed; errno may contain the transport provider reason |
| &half_space |
| |
| &di(RMR_ERR_CALLFAILED) unable to send the message for a call function; errno may contain the transport provider reason |
| &half_space |
| |
| &di(RMR_ERR_NOWHOPEN) no wormholes are open |
| &half_space |
| |
| &di(RMR_ERR_WHID) the wormhole id provided was invalid |
| &half_space |
| |
| &di(RMR_ERR_OVERFLOW) operation would have busted through a buffer/field size |
| &half_space |
| |
| &di(RMR_ERR_RETRY) request (send/call/rts) failed, but caller should retry (EAGAIN for wrappers) |
| &half_space |
| |
| &di(RMR_ERR_RCVFAILED) receive failed (hard error) |
| &half_space |
| |
| &di(RMR_ERR_TIMEOUT) response message not received in a reasonable amount of time |
| &half_space |
| |
| &di(RMR_ERR_UNSET) the message hasn't been populated with a transport buffer |
| &half_space |
| |
| &di(RMR_ERR_TRUNC) length in the received buffer is longer than the size of the allocated payload, |
| received message likely truncated (length set by sender could be wrong, but we can't know that) |
| &half_space |
| |
| &di(RMR_ERR_INITFAILED) initialisation of something (probably message) failed |
| &half_space |
| |
| &di(RMR_ERR_NOTSUPP) the request is not supported, or RMR was not initialised for the request |
| &end_dlist |
| &uindent |
| .st &textsize |
| &space |
| |
| Depending on the underlying transport mechanism, and the nature of the |
| call that RMR attempted, the system &cw(errno) value might reflect |
| additional detail about the failure. Applications should &bold(not) |
| rely on &cw(errno) as some transport mechanisms do not set it with any |
| consistency. |
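| |
| When logging a failure it can help to report the state by name and to |
| treat &cw(errno) strictly as a hint. The sketch below assumes stand-in |
| constants declared in the same order as the list above so that it |
| compiles on its own; a real program would take them from the RMR header. |
| The helper names &cw(state_name) and &cw(describe_failure) are |
| hypothetical and are not RMR functions. |
| |
```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Stand-in constants in the order of the list above; not the RMR header. */
enum {
    RMR_OK, RMR_ERR_BADARG, RMR_ERR_NOENDPT, RMR_ERR_EMPTY, RMR_ERR_NOHDR,
    RMR_ERR_SENDFAILED, RMR_ERR_CALLFAILED, RMR_ERR_NOWHOPEN, RMR_ERR_WHID,
    RMR_ERR_OVERFLOW, RMR_ERR_RETRY, RMR_ERR_RCVFAILED, RMR_ERR_TIMEOUT,
    RMR_ERR_UNSET, RMR_ERR_TRUNC, RMR_ERR_INITFAILED, RMR_ERR_NOTSUPP
};

/* Map a state value to its name; values outside the table yield "unknown". */
static const char* state_name( int state ) {
    static const char* names[] = {
        "RMR_OK", "RMR_ERR_BADARG", "RMR_ERR_NOENDPT", "RMR_ERR_EMPTY",
        "RMR_ERR_NOHDR", "RMR_ERR_SENDFAILED", "RMR_ERR_CALLFAILED",
        "RMR_ERR_NOWHOPEN", "RMR_ERR_WHID", "RMR_ERR_OVERFLOW",
        "RMR_ERR_RETRY", "RMR_ERR_RCVFAILED", "RMR_ERR_TIMEOUT",
        "RMR_ERR_UNSET", "RMR_ERR_TRUNC", "RMR_ERR_INITFAILED",
        "RMR_ERR_NOTSUPP"
    };
    int n = (int) (sizeof( names ) / sizeof( names[0] ));

    if( state < 0 || state >= n ) {
        return "unknown";
    }
    return names[state];
}

/*
    Build a log line in buf. The errno value is appended only when it is
    non-zero, and only as a hint; the RMR state remains the primary result.
*/
static int describe_failure( char* buf, size_t len, int state, int err ) {
    if( err != 0 ) {
        return snprintf( buf, len, "%s (errno hint: %s)", state_name( state ), strerror( err ) );
    }
    return snprintf( buf, len, "%s", state_name( state ) );
}
```
| |
| An application would capture &cw(errno) immediately after the failing |
| call, before any other library call can overwrite it, and pass that |
| saved value as the last argument. |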