.if false
==================================================================================
   Copyright (c) 2019 Nokia
   Copyright (c) 2018-2019 AT&T Intellectual Property.

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
==================================================================================
.fi

.if false
   Mnemonic: failures.im
   Abstract: This is the major section on how an application might handle failures
   Date:     2 August 2019
   Author:   E. Scott Daniels
.fi

&h1(Handling Failures)
The vast majority of states reported by RMR are fatal. If one is
encountered during setup or initialization, it is unlikely that any
message-oriented processing should continue; if one is encountered on
a message operation, further processing of that message should be
abandoned. Message sending is the notable exception: it is very likely
that the underlying transport mechanism will report a &ital(soft,) or
transient, failure, and the operation might succeed if it is retried
at a later point in time. The paragraphs below discuss the ways in
which an application might deal with these soft failures.

&h2(Failure Notification)
When a soft failure is reported, the state in the message buffer
returned by the RMR function will be &cw(RMR_ERR_RETRY.) These types of
failures can occur for various reasons; one of two reasons is
typically the underlying cause:

&half_space
&indent
&beg_list( &lic1 )
    &li The session to the targeted recipient (endpoint) is not connected.
    &half_space

    &li The transport mechanism buffer pool is full and cannot accept another buffer.
    &half_space
&end_list
&uindent
&space

Unfortunately, it is not possible for RMR to determine which of these
two cases is occurring and, equally unfortunately, the time needed to
resolve each is different. The first, no connection, may require up to
a second before a message can be accepted, while a rejection because
of a buffer shortage is likely to resolve in less than a millisecond.

&h2(Application Response)
The action which an application takes when a soft failure is reported
ultimately depends on the nature of the application with respect to
factors such as tolerance of extended message latency, tolerance of
dropped messages, and overall message rate.

&h2(RMR Retry Modes)
In an effort to reduce the workload of an application developer, RMR
has a default retry policy such that RMR will attempt to retransmit a
message up to 1000 times when a soft failure is reported. These
retries generally take less than 1 millisecond in total, even if all
1000 are attempted, and in most cases eliminate nearly all soft
failures reported to the application. Using this mode might allow the
application to simply treat all bad return values from a send attempt
as permanent failures.
&space

If an application is so sensitive to any delay in RMR, or in the
underlying transport mechanism, that these retries cannot be
tolerated, it is possible to set RMR to return a failure immediately
on any kind of error (permanent failures are always reported without
retry). In this mode, RMR will still set the state in the message
buffer to &cw(RMR_ERR_RETRY,) but will &bold(not) make any attempts to
resend the message. This zero-retry policy is enabled by invoking
&func(rmr_set_stimeout) with a value of 0; this can be done once,
immediately after &func(rmr_init:) is invoked.
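
A minimal sketch of selecting the zero-retry policy follows. So that
the fragment compiles on its own, the two RMR functions are replaced
with local stand-ins; the stand-in bodies, the &cw(fake_ctx_t) type, and
the helper name &cw(init_no_retry) are assumptions made for illustration
and are not part of RMR.

```c
#include <stddef.h>

/* Stand-ins (assumptions) so that this sketch compiles without the RMR
   library. A real application would include <rmr/rmr.h>, link against
   RMR, and drop everything down to the "end of stand-ins" comment. */
#define RMR_OK          0
#define RMR_ERR_BADARG  1

typedef struct { int send_retries; } fake_ctx_t;    /* hypothetical context */
static fake_ctx_t fake_ctx;

static void *rmr_init(char *port, int max_msg_size, int flags) {
    (void) port; (void) max_msg_size; (void) flags;
    fake_ctx.send_retries = 1000;                   /* default retry policy */
    return &fake_ctx;
}

static int rmr_set_stimeout(void *vctx, int value) {
    if (vctx == NULL) {
        return RMR_ERR_BADARG;
    }
    ((fake_ctx_t *) vctx)->send_retries = value;
    return RMR_OK;
}
/* end of stand-ins */

/* Initialise RMR, then select the zero-retry policy before any message
   is sent. Returns the context, or NULL if either step fails. */
static void *init_no_retry(char *port, int max_msg_size) {
    void *ctx = rmr_init(port, max_msg_size, 0);    /* 0: no flags */
    if (ctx == NULL) {
        return NULL;
    }
    if (rmr_set_stimeout(ctx, 0) != RMR_OK) {
        return NULL;
    }
    return ctx;
}
```

With the policy set once at start-up, every later send that encounters
a soft failure comes back to the application immediately with the state
set to &cw(RMR_ERR_RETRY.)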

&space
Regardless of the retry mode which the application sets, it is
ultimately up to the application to handle failures by queuing the
message internally for resend, retrying immediately, or dropping the
send attempt altogether. As stated before, only the application can
determine how best to handle send failures.
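
One way an application might implement the "retry" choice is sketched
below: repeat the send a bounded number of times while the state
remains &cw(RMR_ERR_RETRY,) optionally waiting between attempts, and hand
the final state back so the caller can queue or drop the message. The
send itself is passed in as a function pointer to keep the logic
independent of the library. The names &cw(send_attempt_fn,) &cw(wait_fn,)
&cw(send_with_retry,) and &cw(flaky_attempt,) as well as the numeric
stand-in values for the two state constants, are assumptions for
illustration only.

```c
#include <stddef.h>

/* Stand-in values (assumption) so the sketch compiles without the RMR
   header; a real application would take them from <rmr/rmr.h>. */
#define RMR_OK         0
#define RMR_ERR_RETRY  10

/* Hypothetical callback: makes one send attempt (for example by wrapping
   the RMR send function) and returns the state from the message buffer. */
typedef int (*send_attempt_fn)(void *data);

/* Hypothetical callback: waits before retry number attempt (1, 2, ...). */
typedef void (*wait_fn)(int attempt);

/* Retry soft failures only, making at most max_attempts attempts in total.
   Returns the last state seen; if max_attempts is less than 1, no attempt
   is made and RMR_ERR_RETRY is returned. */
static int send_with_retry(send_attempt_fn attempt, void *data,
                           int max_attempts, wait_fn wait_before_retry) {
    int state = RMR_ERR_RETRY;
    int i;

    for (i = 0; i < max_attempts && state == RMR_ERR_RETRY; i++) {
        if (i > 0 && wait_before_retry != NULL) {
            wait_before_retry(i);
        }
        state = attempt(data);
    }
    return state;
}

/* Illustrative attempt function: reports a soft failure until the counter
   that data points at reaches zero, then reports success. */
static int flaky_attempt(void *data) {
    int *soft_failures_left = (int *) data;

    if (*soft_failures_left > 0) {
        (*soft_failures_left)--;
        return RMR_ERR_RETRY;
    }
    return RMR_OK;
}
```

Given the two typical causes of a soft failure, a wait function might
use a very short delay for the first few retries (buffer shortage) and
a longer one afterwards (connection not yet established); whether that
is appropriate depends on the latency the application can tolerate.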


&h2(Other Failures)
RMR will return the state of processing for message-based operations
(send/receive) as the status in the message buffer. For non-message
operations, state is returned to the caller as the integer return
value for all functions which are not expected to return a pointer
(functions such as &func(rmr_init:) do return a pointer.) The
following are the RMR state constants and a brief description of
their meaning.

&space
.st 8p
&indent
&beg_dlist( 1.5i &ditext )
    &di(RMR_OK) state is good; operation finished successfully
    &half_space

    &di(RMR_ERR_BADARG) argument passed to function was unusable
    &half_space

    &di(RMR_ERR_NOENDPT) send/call could not find an endpoint based on message type
    &half_space

    &di(RMR_ERR_EMPTY) message received had no payload, or an attempt was made to send an empty message
    &half_space

    &di(RMR_ERR_NOHDR) message didn't contain a valid header
    &half_space

    &di(RMR_ERR_SENDFAILED) send failed; errno may contain the transport provider reason
    &half_space

    &di(RMR_ERR_CALLFAILED) unable to send the message for a call function; errno may contain the transport provider reason
    &half_space

    &di(RMR_ERR_NOWHOPEN) no wormholes are open
    &half_space

    &di(RMR_ERR_WHID) the wormhole id provided was invalid
    &half_space

    &di(RMR_ERR_OVERFLOW) operation would have exceeded a buffer/field size
    &half_space

    &di(RMR_ERR_RETRY) request (send/call/rts) failed, but caller should retry (EAGAIN for wrappers)
    &half_space

    &di(RMR_ERR_RCVFAILED) receive failed (hard error)
    &half_space

    &di(RMR_ERR_TIMEOUT) response message not received in a reasonable amount of time
    &half_space

    &di(RMR_ERR_UNSET) the message hasn't been populated with a transport buffer
    &half_space

    &di(RMR_ERR_TRUNC) length in the received buffer is longer than the size of the allocated payload;
        the received message was likely truncated (the length set by the sender could be wrong, but RMR cannot know that)
    &half_space

    &di(RMR_ERR_INITFAILED) initialization of something (probably a message) failed
    &half_space

    &di(RMR_ERR_NOTSUPP) the request is not supported, or RMR was not initialized for the request
&end_dlist
&uindent
.st &textsize
&space
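
For logging, an application might translate a state value into the
name of its constant. The lookup below is a sketch only. The numeric
values are stand-ins assigned in the order of the list above so that
the fragment compiles on its own, and should be treated as an
assumption; a real application would take the constants from the RMR
header. The names &cw(STATE_ROW,) &cw(state_names,) and &cw(state_name) are
hypothetical.

```c
#include <stddef.h>

/* Stand-in definitions (assumed values, in the order listed above); a
   real application would include <rmr/rmr.h> and remove this section. */
#define RMR_OK              0
#define RMR_ERR_BADARG      1
#define RMR_ERR_NOENDPT     2
#define RMR_ERR_EMPTY       3
#define RMR_ERR_NOHDR       4
#define RMR_ERR_SENDFAILED  5
#define RMR_ERR_CALLFAILED  6
#define RMR_ERR_NOWHOPEN    7
#define RMR_ERR_WHID        8
#define RMR_ERR_OVERFLOW    9
#define RMR_ERR_RETRY       10
#define RMR_ERR_RCVFAILED   11
#define RMR_ERR_TIMEOUT     12
#define RMR_ERR_UNSET       13
#define RMR_ERR_TRUNC       14
#define RMR_ERR_INITFAILED  15
#define RMR_ERR_NOTSUPP     16

/* Build one table row from a constant: its value and its spelling. */
#define STATE_ROW(s) { s, #s }

static const struct { int state; const char *name; } state_names[] = {
    STATE_ROW(RMR_OK),             STATE_ROW(RMR_ERR_BADARG),
    STATE_ROW(RMR_ERR_NOENDPT),    STATE_ROW(RMR_ERR_EMPTY),
    STATE_ROW(RMR_ERR_NOHDR),      STATE_ROW(RMR_ERR_SENDFAILED),
    STATE_ROW(RMR_ERR_CALLFAILED), STATE_ROW(RMR_ERR_NOWHOPEN),
    STATE_ROW(RMR_ERR_WHID),       STATE_ROW(RMR_ERR_OVERFLOW),
    STATE_ROW(RMR_ERR_RETRY),      STATE_ROW(RMR_ERR_RCVFAILED),
    STATE_ROW(RMR_ERR_TIMEOUT),    STATE_ROW(RMR_ERR_UNSET),
    STATE_ROW(RMR_ERR_TRUNC),      STATE_ROW(RMR_ERR_INITFAILED),
    STATE_ROW(RMR_ERR_NOTSUPP)
};

/* Return the constant's name, or "unknown" for an unrecognised state. */
static const char *state_name(int state) {
    size_t i;

    for (i = 0; i < sizeof(state_names) / sizeof(state_names[0]); i++) {
        if (state_names[i].state == state) {
            return state_names[i].name;
        }
    }
    return "unknown";
}
```

Because the table is searched by value rather than indexed, it keeps
working if the real constants turn out to have different values from
the stand-ins used here.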

Depending on the underlying transport mechanism, and the nature of the
call that RMR attempted, the system &cw(errno) value might reflect
additional detail about the failure. Applications should &bold(not)
rely on &cw(errno) as some transport mechanisms do not set it with any
consistency.
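
In practice this suggests making decisions on the RMR state alone and
recording &cw(errno) only as a hint. The helper below is a sketch of that
approach using only the standard C library; the function name
&cw(describe_failure) and the message format are assumptions for
illustration.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Format a log line for a failed operation. The RMR state is always
   reported; errno, saved by the caller immediately after the failing
   call, is appended only as a hint when it is non-zero. Returns the
   value returned by snprintf. */
static int describe_failure(char *buf, size_t len, int state, int saved_errno) {
    if (saved_errno != 0) {
        return snprintf(buf, len, "rmr state=%d (errno hint: %s)",
                        state, strerror(saved_errno));
    }
    return snprintf(buf, len, "rmr state=%d", state);
}
```

The caller would copy &cw(errno) into a local variable straight after the
RMR call returns, since any later library call may overwrite it, and
would still choose between retrying, queuing, and dropping based on
the state.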