.if false
==================================================================================
 Copyright (c) 2019 Nokia
 Copyright (c) 2018-2019 AT&T Intellectual Property.

 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
 You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
==================================================================================
.fi

.if false
 Mnemonic: failures.im
 Abstract: This is the major section on how an application might handle failures
 Date: 2 August 2019
 Author: E. Scott Daniels
.fi

&h1(Handling Failures)
The vast majority of states reported by RMR are fatal. If one is
encountered during setup or initialization, then it is unlikely that
any message-oriented processing should continue. If one is encountered
during a message operation, any further processing of that message
should be abandoned. With regard to message sending specifically,
however, it is very likely that the underlying transport mechanism will
at times report a &ital(soft,) or transient, failure. Such an operation
might succeed if it is retried at a later point in time. The paragraphs
below discuss methods that an application might use to deal with these
soft failures.

&h2(Failure Notification)
When a soft failure is reported, the state field of the message buffer
returned by the RMR function will be set to &cw(RMR_ERR_RETRY.) These
failures can occur for various reasons. Typically, one of two
conditions is the underlying cause:

&half_space
&indent
&beg_list( &lic1 )
 &li The session to the targeted recipient (endpoint) is not connected.
 &half_space

 &li The transport mechanism buffer pool is full and cannot accept another buffer.
 &half_space
&end_list
&uindent
&space

Unfortunately, it is not possible for RMR to determine which of these
two cases is occurring. Equally unfortunately, the time needed to
resolve each is different. The first case, no connection, may require
up to a second before a message can be accepted. A rejection caused by
a buffer shortage, in contrast, is likely to resolve in less than a
millisecond.
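Because RMR cannot tell which condition caused a soft failure, an application that retries on its own might use a capped exponential backoff. The first waits are short enough to ride out a momentary buffer shortage, and later waits stretch toward the roughly one-second window that a new connection may need. This is a minimal sketch, not part of RMR. The function name &cw(backoff_delay_us) and the example values (a 100 microsecond base and a one-second cap) are assumptions made only for illustration.

```c
#include <assert.h>

/*
 * Hypothetical helper (not an RMR function): return the delay, in
 * microseconds, to wait before retry number 'attempt' (0-based).
 * The delay starts at base_us, doubles on each attempt, and never
 * exceeds cap_us.
 */
long backoff_delay_us(int attempt, long base_us, long cap_us) {
    assert(attempt >= 0 && base_us > 0 && cap_us >= base_us);

    long delay = base_us;
    for (int i = 0; i < attempt; i++) {
        if (delay > cap_us - delay) {   /* doubling would pass the cap */
            return cap_us;
        }
        delay *= 2;
    }
    return delay;
}
```

The assumed values produce the following behavior:

- The first four waits (100, 200, 400 and 800 microseconds) are each under a millisecond.
- The cumulative wait first exceeds one second on the fourteenth retry.
- From the fifteenth retry on, every wait is held at the one-second cap.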

&h2(Application Response)
The action that an application takes when a soft failure is reported
ultimately depends on the nature of the application. Relevant factors
include its tolerance to extended message latency, its tolerance to
dropped messages, and its overall message rate.

&h2(RMR Retry Modes)
To reduce the workload of the application developer, RMR has a default
retry policy. Under this policy, RMR will attempt to retransmit a
message up to 1000 times when a soft failure is reported. These
retries generally take less than 1 millisecond, even if all 1000 are
attempted. In most cases they eliminate nearly all of the soft failures
that would otherwise be reported to the application. In this mode, the
application may be able to simply treat every bad return value from a
send attempt as a permanent failure. &space
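The loop below imitates, in simplified form, a bounded immediate retry of the kind the default policy describes, so that its effect on the caller is easy to see. It is &bold(not) RMR's internal code. The state codes, the message struct, &cw(send_with_retries()) and the stub transport are stand-ins defined only for this sketch. The stub simulates a buffer pool that is briefly full.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in state codes so the sketch compiles without RMR installed;
 * a real application uses the RMR_* constants from the RMR header. */
enum sketch_state { SKETCH_OK, SKETCH_ERR_RETRY, SKETCH_ERR_SENDFAILED };

/* Stand-in for a message buffer; only the state field matters here. */
typedef struct {
    int state;
} sketch_msg_t;

/* One transport send attempt; it records the outcome in msg->state. */
typedef void (*send_fn)(sketch_msg_t *msg, void *transport);

/*
 * Retry immediately, up to max_tries attempts in total, for as long as
 * the state is a soft (retry) failure. Any other state, success or a
 * permanent error, ends the loop at once. Returns the attempts made.
 */
int send_with_retries(send_fn try_send, void *transport, sketch_msg_t *msg,
                      int max_tries) {
    assert(try_send != NULL && msg != NULL && max_tries > 0);

    int tries = 0;
    do {
        try_send(msg, transport);
        tries++;
    } while (msg->state == SKETCH_ERR_RETRY && tries < max_tries);

    return tries;
}

/* Hypothetical stub transport: reports a soft failure 'busy' times. */
typedef struct { int busy; } fake_transport_t;

void fake_send(sketch_msg_t *msg, void *transport) {
    fake_transport_t *t = transport;
    if (t->busy > 0) {
        t->busy--;
        msg->state = SKETCH_ERR_RETRY;
    } else {
        msg->state = SKETCH_OK;
    }
}
```

When the loop gives up, the caller sees only the final state. Under a policy like the default one, this is the point at which an application might treat a remaining retry state as a permanent failure.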

An application may be highly sensitive to any delay in RMR or in the
underlying transport mechanism. In that case, RMR can be set to return
a failure immediately on any kind of error; permanent failures are
always reported without retry. In this mode, RMR will still set the
state in the message buffer to &cw(RMR_ERR_RETRY,) but will &bold(not)
make any attempt to resend the message. This zero-retry policy is
enabled by calling &func(rmr_set_stimeout) with a value of 0. The call
can be made once, immediately after &func(rmr_init:) is invoked.
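With retries disabled, the retry decision belongs to the application. The sketch below assumes that setup code has already called &func(rmr_set_stimeout) with 0 after &func(rmr_init:). It shows one way an application could retry soft failures with a doubling pause, within a fixed pause budget. The following names are hypothetical stand-ins, not RMR definitions:

- the attempt callback;
- &cw(app_send());
- the stub &cw(flaky_attempt());
- the state codes.

A real attempt function would wrap &cw(rmr_send_msg()) and report the state from the message buffer it returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Stand-in state codes; a real application would use RMR's constants. */
enum { ST_OK = 0, ST_RETRY = 1 };

/* Hypothetical send attempt: returns ST_OK, ST_RETRY, or another
 * (permanent) error code. In a real application this would wrap
 * rmr_send_msg() on a context set up with rmr_set_stimeout(ctx, 0). */
typedef int (*attempt_fn)(void *arg);

/* Sleep for the given number of microseconds (POSIX nanosleep). */
static void sleep_us(long us) {
    struct timespec ts = { .tv_sec = us / 1000000,
                           .tv_nsec = (us % 1000000) * 1000 };
    nanosleep(&ts, NULL);
}

/*
 * Retry soft failures with a doubling pause (first_pause_us, then 2x,
 * 4x, ...) until the send succeeds, a permanent error is returned, or
 * the next pause would exceed budget_us in total. Returns final state.
 */
int app_send(attempt_fn attempt, void *arg, long first_pause_us, long budget_us) {
    assert(attempt != NULL && first_pause_us > 0 && budget_us >= 0);

    long pause = first_pause_us;
    long used = 0;
    int state = attempt(arg);

    while (state == ST_RETRY && used + pause <= budget_us) {
        sleep_us(pause);
        used += pause;
        pause *= 2;
        state = attempt(arg);
    }
    return state;
}

/* Hypothetical stub: fails softly *(int *)arg times, then succeeds. */
int flaky_attempt(void *arg) {
    int *remaining = arg;
    if (*remaining > 0) {
        (*remaining)--;
        return ST_RETRY;
    }
    return ST_OK;
}
```

The budget bounds only the time spent pausing, not the time spent in the send attempts themselves. An application with a strict latency limit would also need to account for the attempts.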

&space
Regardless of the retry mode that the application sets, it is
ultimately up to the application to handle failures. It can queue the
message internally for resend, retry immediately, or drop the send
attempt altogether. As stated before, only the application can
determine how best to handle send failures.
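If the application chooses to queue messages internally for resend, a small bounded FIFO is one way to do it. When the FIFO fills, the application falls back to dropping. This is a minimal sketch under assumptions: &cw(resend_queue_t), &cw(rq_push()), &cw(rq_pop()) and the capacity are hypothetical. The sketch also does not address who owns or frees the queued message buffers; that remains the application's decision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capacity, kept small for illustration. */
#define RESEND_CAP 4

/* Fixed-size FIFO of messages (opaque pointers) awaiting resend. */
typedef struct {
    void *items[RESEND_CAP];
    int head;   /* index of the oldest queued item */
    int count;  /* number of queued items */
} resend_queue_t;

void rq_init(resend_queue_t *q) {
    assert(q != NULL);
    q->head = 0;
    q->count = 0;
}

/* Queue a message for later resend; returns false if the queue is full,
 * leaving the caller to decide whether to drop the message. */
bool rq_push(resend_queue_t *q, void *msg) {
    if (q->count == RESEND_CAP) {
        return false;
    }
    q->items[(q->head + q->count) % RESEND_CAP] = msg;
    q->count++;
    return true;
}

/* Remove and return the oldest queued message, or NULL if empty. */
void *rq_pop(resend_queue_t *q) {
    if (q->count == 0) {
        return NULL;
    }
    void *msg = q->items[q->head];
    q->head = (q->head + 1) % RESEND_CAP;
    q->count--;
    return msg;
}
```

A send loop built on this queue might proceed as follows:

1. Drain the queue with &cw(rq_pop()) before sending new messages.
2. If a send again reports a soft failure, push the message back.
3. If the push fails because the queue is full, drop the message.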


&h2(Other Failures)
RMR returns the state of processing for message-based operations
(send/receive) as the status in the message buffer. For non-message
operations, the state is returned to the caller as the integer return
value of every function that is not expected to return a pointer.
Functions that return a pointer, such as &func(rmr_init:), are the
exception. The following are the RMR state constants and a brief
description of their meaning.

&space
.st 8p
&indent
&beg_dlist( 1.5i &ditext )
 &di(RMR_OK) state is good; operation finished successfully
 &half_space

 &di(RMR_ERR_BADARG) argument passed to function was unusable
 &half_space

 &di(RMR_ERR_NOENDPT) send/call could not find an endpoint based on the message type
 &half_space

 &di(RMR_ERR_EMPTY) message received had no payload, or an attempt was made to send an empty message
 &half_space

 &di(RMR_ERR_NOHDR) message did not contain a valid header
 &half_space

 &di(RMR_ERR_SENDFAILED) send failed; errno may contain the transport provider reason
 &half_space

 &di(RMR_ERR_CALLFAILED) unable to send the message for a call function; errno may contain the transport provider reason
 &half_space

 &di(RMR_ERR_NOWHOPEN) no wormholes are open
 &half_space

 &di(RMR_ERR_WHID) the wormhole id provided was invalid
 &half_space

 &di(RMR_ERR_OVERFLOW) operation would have overrun a buffer or field size
 &half_space

 &di(RMR_ERR_RETRY) request (send/call/rts) failed, but the caller should retry (EAGAIN for wrappers)
 &half_space

 &di(RMR_ERR_RCVFAILED) receive failed (hard error)
 &half_space

 &di(RMR_ERR_TIMEOUT) response message was not received in a reasonable amount of time
 &half_space

 &di(RMR_ERR_UNSET) the message has not been populated with a transport buffer
 &half_space

 &di(RMR_ERR_TRUNC) length in the received buffer is longer than the size of the allocated payload;
 the received message was likely truncated (the length set by the sender could be wrong, but RMR cannot know that)
 &half_space

 &di(RMR_ERR_INITFAILED) initialization of something (probably a message) failed
 &half_space

 &di(RMR_ERR_NOTSUPP) the request is not supported, or RMR was not initialized for the request
&end_dlist
&uindent
.st &textsize
&space
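A small logging and dispatch helper can make the list above concrete. In this sketch, only &cw(RMR_ERR_RETRY) is classified as worth retrying, in line with the earlier discussion of soft failures. Every other non-OK state is treated as permanent. The enum is a stand-in so that the sketch compiles without the RMR headers. Real code should switch on the constants from the RMR header (&cw(rmr/rmr.h)) and must not rely on the numeric values used here.

```c
#include <assert.h>

/* Stand-in for the RMR state constants listed above (sketch only). */
typedef enum {
    S_OK, S_ERR_BADARG, S_ERR_NOENDPT, S_ERR_EMPTY, S_ERR_NOHDR,
    S_ERR_SENDFAILED, S_ERR_CALLFAILED, S_ERR_NOWHOPEN, S_ERR_WHID,
    S_ERR_OVERFLOW, S_ERR_RETRY, S_ERR_RCVFAILED, S_ERR_TIMEOUT,
    S_ERR_UNSET, S_ERR_TRUNC, S_ERR_INITFAILED, S_ERR_NOTSUPP
} sketch_state_t;

/* What the application should do next with the message. */
typedef enum { ACTION_DONE, ACTION_RETRY, ACTION_FAIL } send_action_t;

/* Printable constant name for a state, for log messages. */
const char *state_name(sketch_state_t s) {
    static const char *const names[] = {
        "RMR_OK", "RMR_ERR_BADARG", "RMR_ERR_NOENDPT", "RMR_ERR_EMPTY",
        "RMR_ERR_NOHDR", "RMR_ERR_SENDFAILED", "RMR_ERR_CALLFAILED",
        "RMR_ERR_NOWHOPEN", "RMR_ERR_WHID", "RMR_ERR_OVERFLOW",
        "RMR_ERR_RETRY", "RMR_ERR_RCVFAILED", "RMR_ERR_TIMEOUT",
        "RMR_ERR_UNSET", "RMR_ERR_TRUNC", "RMR_ERR_INITFAILED",
        "RMR_ERR_NOTSUPP"
    };
    /* Compile-time check that the table stays in step with the enum. */
    static_assert(sizeof(names) / sizeof(names[0]) == S_ERR_NOTSUPP + 1,
                  "one name per state");

    int i = (int) s;
    if (i < 0 || i > S_ERR_NOTSUPP) {
        return "unknown";
    }
    return names[i];
}

/* Only the soft (retry) state is worth another attempt. */
send_action_t classify(sketch_state_t s) {
    switch (s) {
    case S_OK:        return ACTION_DONE;
    case S_ERR_RETRY: return ACTION_RETRY;
    default:          return ACTION_FAIL;
    }
}
```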

Depending on the underlying transport mechanism, and the nature of the
call that RMR attempted, the system &cw(errno) value might reflect
additional detail about the failure. Applications should &bold(not)
rely on errno, however, because some transport mechanisms do not set
it with any consistency.