| .if false |
| ================================================================================== |
| Copyright (c) 2019 Nokia |
| Copyright (c) 2018-2019 AT&T Intellectual Property. |
| |
| Licensed under the Apache License, Version 2.0 (the "License"); |
| you may not use this file except in compliance with the License. |
| You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| ================================================================================== |
| .fi |
| |
| .if false |
| Mnemonic: failures.im |
| Abstract: This is the major section on how an application might handle failures |
| Date: 2 August 2019 |
| Author: E. Scott Daniels |
| .fi |
| |
| &h1(Handling Failures) |
| The vast majority of states reported by RMR are fatal. If one is encountered during setup or initialisation, |
| it is unlikely that any message-oriented processing should continue; if one is encountered during |
| a message operation, further processing of that message should be abandoned. |
| With regard to message sending specifically, it is very likely that the underlying transport mechanism |
| will report a &ital(soft,) or transient, failure; the operation might succeed if it is retried at a |
| later point in time. |
| The paragraphs below discuss the ways in which an application might deal with these soft failures. |
| |
| &h2(Failure Notification) |
| When a soft failure is reported, the state in the message buffer returned by the RMR function will be &cw(RMR_ERR_RETRY.) |
| These failures can occur for various reasons, but one of the following two is typically the underlying cause: |
| |
| &half_space |
| &indent |
| &beg_list( &lic1 ) |
| &li The session to the targeted recipient (endpoint) is not connected. |
| &half_space |
| |
| &li The transport mechanism buffer pool is full and cannot accept another buffer. |
| &half_space |
| &end_list |
| &uindent |
| &space |
| |
| Unfortunately, it is not possible for RMR to determine which of these two cases is occurring and, equally |
| unfortunately, the time needed to resolve each is different. |
| The first, no connection, may require up to a second before a message can be accepted, while a rejection |
| because of buffer shortage is likely to resolve in less than a millisecond. |
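The check described above can be sketched as a small helper that sorts a returned state into success, soft failure, or hard failure. This is only an illustration: `sketch_mbuf_t`, `send_class_t`, and `classify_state` are hypothetical names, and the numeric values given to the two constants are assumed placeholders. A real application would include the RMR header and inspect the state field of the buffer that RMR returns.

```c
#include <assert.h>
#include <stddef.h>

/*
    Stand-ins so the sketch compiles by itself. A real application would
    include <rmr/rmr.h>; the numeric values below are assumed placeholders.
*/
#define RMR_OK          0
#define RMR_ERR_RETRY   10

typedef struct {
    int state;      /* mimics only the state field of an RMR message buffer */
} sketch_mbuf_t;

typedef enum { SEND_GOOD, SEND_SOFT_FAIL, SEND_HARD_FAIL } send_class_t;

/* Sort the state of a returned buffer into success, soft failure, or hard failure. */
static send_class_t classify_state( const sketch_mbuf_t* msg ) {
    if( msg == NULL ) {
        return SEND_HARD_FAIL;          /* no buffer at all, so nothing to retry with */
    }

    switch( msg->state ) {
        case RMR_OK:
            return SEND_GOOD;

        case RMR_ERR_RETRY:             /* not connected, or transport buffer pool full */
            return SEND_SOFT_FAIL;

        default:                        /* every other state is treated as fatal */
            return SEND_HARD_FAIL;
    }
}
```

Because RMR cannot say which of the two causes produced the soft failure, the helper deliberately makes no attempt to distinguish them.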
| |
| &h2(Application Response) |
| The action which an application takes when a soft failure is reported ultimately depends on the nature |
| of the application with respect to factors such as tolerance to extended message latency, dropped messages, |
| and overall message rate. |
| |
| &h2(RMR Retry Modes) |
| In an effort to reduce the workload of an application developer, RMR has a default retry policy such that |
| RMR will attempt to retransmit a message up to 1000 times when a soft failure is reported. |
| These retries generally take less than 1 millisecond (if all 1000 are attempted) and in most cases eliminate |
| nearly all soft failures that would otherwise be reported to the application. |
| Using this mode might allow the application to simply treat all bad return values from a send attempt |
| as permanent failures. |
| &space |
| |
| If an application is sensitive to any delay in RMR, or in the underlying transport mechanism, it is possible to |
| set RMR to return a failure immediately on any kind of error (permanent failures are always reported without retry). |
| In this mode, RMR will still set the state in the message buffer to &cw(RMR_ERR_RETRY,) but will &bold(not) |
| make any attempts to resend the message. |
| This zero-retry policy is enabled by invoking the &func(rmr_set_stimeout) with a value of 0; this can be done once |
| immediately after &func(rmr_init:) is invoked. |
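A minimal sketch of selecting the zero-retry policy right after initialisation follows. So that it compiles without the library, it supplies stand-in stubs for `rmr_init` and `rmr_set_stimeout`; the stubs are not RMR code, and their signatures and return conventions are assumed to mirror the library's. The helper `init_zero_retry` and the variable `recorded_stimeout` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
    Stand-in stubs so the sketch compiles without the library; they are NOT
    RMR code. A real build would delete them, include <rmr/rmr.h>, and link
    against RMR. Signatures and return values here are assumptions.
*/
static int stub_ctx;                    /* fake context handed back by the stub */
static int recorded_stimeout = -1;      /* last value given to the stub */

static void* rmr_init( char* proto_port, int max_msg_size, int flags ) {
    (void) proto_port;
    (void) max_msg_size;
    (void) flags;
    return &stub_ctx;
}

static int rmr_set_stimeout( void* vctx, int time ) {
    if( vctx == NULL || time < 0 ) {
        return -1;
    }
    recorded_stimeout = time;
    return 0;
}

/* Initialise RMR, then immediately ask for failures to be returned without retries. */
static void* init_zero_retry( char* listen_port, int max_msg_size ) {
    void* ctx;

    ctx = rmr_init( listen_port, max_msg_size, 0 );     /* 0: no special flags (assumed) */
    if( ctx == NULL ) {
        return NULL;
    }

    if( rmr_set_stimeout( ctx, 0 ) < 0 ) {      /* 0 selects the zero-retry policy */
        return NULL;                            /* a real application would also release ctx */
    }

    return ctx;
}
```

The call is made once, before any message is sent, so every later send attempt runs under the same policy.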
| |
| &space |
| Regardless of the retry mode which the application sets, it will ultimately be up to the application to |
| handle failures by queuing the message internally for resend, retrying immediately, or dropping the |
| send attempt altogether. |
| As stated before, only the application can determine how to best handle send failures. |
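One way an application might combine these options is a bounded retry loop that pauses briefly between attempts and then hands the decision (queue or drop) back to the caller. The sketch below is an assumption-laden illustration, not RMR code: `sketch_mbuf_t`, `send_fn_t`, `send_with_retries`, and the `flaky_send` test double are hypothetical, and the constant values are placeholders. In a real application the send hook would wrap the RMR send function.

```c
#define _POSIX_C_SOURCE 199309L         /* for nanosleep() */
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Stand-ins; a real application would take these from <rmr/rmr.h>. Values are assumed. */
#define RMR_OK          0
#define RMR_ERR_RETRY   10

typedef struct { int state; } sketch_mbuf_t;    /* mimics only the state field */

/* Hypothetical send hook; returns the (possibly different) buffer, as an RMR send would. */
typedef sketch_mbuf_t* (*send_fn_t)( void* ctx, sketch_mbuf_t* msg );

typedef enum { SEND_DONE, SEND_GAVE_UP, SEND_FATAL } send_outcome_t;

/*
    Try to send up to max_attempts times, sleeping pause_ns nanoseconds
    (must be below one second) between soft failures. SEND_GAVE_UP tells the
    caller that only soft failures were seen, so it may queue or drop the message.
*/
static send_outcome_t send_with_retries( void* ctx, sketch_mbuf_t* msg, send_fn_t send,
        int max_attempts, long pause_ns ) {
    struct timespec pause;
    int i;

    assert( send != NULL && msg != NULL && max_attempts > 0 );
    pause.tv_sec = 0;
    pause.tv_nsec = pause_ns;

    for( i = 0; i < max_attempts; i++ ) {
        msg = send( ctx, msg );
        if( msg == NULL ) {
            return SEND_FATAL;
        }
        if( msg->state == RMR_OK ) {
            return SEND_DONE;
        }
        if( msg->state != RMR_ERR_RETRY ) {
            return SEND_FATAL;              /* hard failure: retrying will not help */
        }
        if( i + 1 < max_attempts ) {
            nanosleep( &pause, NULL );      /* give the transport a moment to recover */
        }
    }

    return SEND_GAVE_UP;
}

/* Test double: reports a soft failure until the counter that ctx points at reaches zero. */
static sketch_mbuf_t* flaky_send( void* ctx, sketch_mbuf_t* msg ) {
    int* soft_failures_left = (int *) ctx;

    if( *soft_failures_left > 0 ) {
        (*soft_failures_left)--;
        msg->state = RMR_ERR_RETRY;
    } else {
        msg->state = RMR_OK;
    }
    return msg;
}
```

The attempt limit and pause length are the knobs that reflect the application's tolerance for latency: a short pause suits buffer shortages, while a missing connection may need far longer than a latency-sensitive sender is willing to wait.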
| |
| |
| &h2(Other Failures) |
| RMR will return the state of processing for message-based operations (send/receive) as the status in |
| the message buffer. |
| For non-message operations, state is returned to the caller as the integer return value for all functions |
| which are not expected to return a pointer (e.g. &func(rmr_init:).) |
| The following are the RMR state constants and a brief description of their meaning. |
| |
| &space |
| .st 8p |
| &indent |
| &beg_dlist( 1.5i &ditext ) |
| &di(RMR_OK) state is good; operation finished successfully |
| &half_space |
| |
| &di(RMR_ERR_BADARG) argument passed to function was unusable |
| &half_space |
| |
| &di(RMR_ERR_NOENDPT) send/call could not find an endpoint based on msg type |
| &half_space |
| |
| &di(RMR_ERR_EMPTY) msg received had no payload; attempt to send an empty message |
| &half_space |
| |
| &di(RMR_ERR_NOHDR) message didn't contain a valid header |
| &half_space |
| |
| &di(RMR_ERR_SENDFAILED) send failed; errno may contain the transport provider reason |
| &half_space |
| |
| &di(RMR_ERR_CALLFAILED) unable to send the message for a call function; errno may contain the transport provider reason |
| &half_space |
| |
| &di(RMR_ERR_NOWHOPEN) no wormholes are open |
| &half_space |
| |
| &di(RMR_ERR_WHID) the wormhole id provided was invalid |
| &half_space |
| |
| &di(RMR_ERR_OVERFLOW) operation would have exceeded a buffer/field size |
| &half_space |
| |
| &di(RMR_ERR_RETRY) request (send/call/rts) failed, but caller should retry (EAGAIN for wrappers) |
| &half_space |
| |
| &di(RMR_ERR_RCVFAILED) receive failed (hard error) |
| &half_space |
| |
| &di(RMR_ERR_TIMEOUT) response message not received in a reasonable amount of time |
| &half_space |
| |
| &di(RMR_ERR_UNSET) the message hasn't been populated with a transport buffer |
| &half_space |
| |
| &di(RMR_ERR_TRUNC) length in the received buffer is longer than the size of the allocated payload, |
| received message likely truncated (length set by sender could be wrong, but we can't know that) |
| &half_space |
| |
| &di(RMR_ERR_INITFAILED) initialisation of something (probably message) failed |
| &half_space |
| |
| &di(RMR_ERR_NOTSUPP) the request is not supported, or RMR was not initialised for the request |
| &end_dlist |
| &uindent |
| .st &textsize |
| &space |
| |
| Depending on the underlying transport mechanism, and the nature of the call that RMR attempted, the |
| system &cw(errno) value might reflect additional detail about the failure. |
| Applications should &bold(not) rely on errno as some transport mechanisms do not set it with |
| any consistency. |
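For logging, an application might translate a state value into its constant name and attach `errno` only as an advisory hint. The sketch below assumes a stand-in enum that lists the constants in the order given above (the numeric values are an assumption; real code would include the RMR header instead). The names `state_name`, `format_failure`, and `NAME_CASE` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the constants in <rmr/rmr.h>, in the order listed above; values are assumed. */
enum {
    RMR_OK, RMR_ERR_BADARG, RMR_ERR_NOENDPT, RMR_ERR_EMPTY, RMR_ERR_NOHDR,
    RMR_ERR_SENDFAILED, RMR_ERR_CALLFAILED, RMR_ERR_NOWHOPEN, RMR_ERR_WHID,
    RMR_ERR_OVERFLOW, RMR_ERR_RETRY, RMR_ERR_RCVFAILED, RMR_ERR_TIMEOUT,
    RMR_ERR_UNSET, RMR_ERR_TRUNC, RMR_ERR_INITFAILED, RMR_ERR_NOTSUPP
};

#define NAME_CASE( s )  case s: return #s       /* the constant's own name as a string */

/* Return the constant name for a state, or "unknown" for anything unrecognised. */
static const char* state_name( int state ) {
    switch( state ) {
        NAME_CASE( RMR_OK );
        NAME_CASE( RMR_ERR_BADARG );
        NAME_CASE( RMR_ERR_NOENDPT );
        NAME_CASE( RMR_ERR_EMPTY );
        NAME_CASE( RMR_ERR_NOHDR );
        NAME_CASE( RMR_ERR_SENDFAILED );
        NAME_CASE( RMR_ERR_CALLFAILED );
        NAME_CASE( RMR_ERR_NOWHOPEN );
        NAME_CASE( RMR_ERR_WHID );
        NAME_CASE( RMR_ERR_OVERFLOW );
        NAME_CASE( RMR_ERR_RETRY );
        NAME_CASE( RMR_ERR_RCVFAILED );
        NAME_CASE( RMR_ERR_TIMEOUT );
        NAME_CASE( RMR_ERR_UNSET );
        NAME_CASE( RMR_ERR_TRUNC );
        NAME_CASE( RMR_ERR_INITFAILED );
        NAME_CASE( RMR_ERR_NOTSUPP );
        default: return "unknown";
    }
}

/* Build a log string; errno is included only as a hint and never drives decisions. */
static void format_failure( char* buf, size_t len, int state, int err ) {
    assert( buf != NULL && len > 0 );

    if( err != 0 ) {
        snprintf( buf, len, "%s (errno hint: %d)", state_name( state ), err );
    } else {
        snprintf( buf, len, "%s", state_name( state ) );
    }
}
```

Keeping `errno` out of the control flow matches the caution above: the state value decides what the application does, and `errno` merely enriches the log entry when the transport happens to set it.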