blob: a3b43e9087ae90a9bc04940b105ce13a10ab042a [file] [log] [blame]
In general, seeing a "PASS" from the sender(s) and receiver(s) for each execution
is a good indication that all was successful. Reeceivers will fail if the
simple checksum calculated for the payload and trace data doesn't match. Senders
will fail if a returned message doesn't have its matching tag (meaning it was
returned to the wrong sender). Both will error on a timeout either no route
information, or receiver did not receive the expected number of messages.
Receivers send an 'ack' for message type 5, so for some tests the number of ack
messages sent will not be the same as the number of messages received. Senders
loop through message types 0-9 inclusive, unless otherwise directed on the
command line (e.g. the rts test sends nothing but message type 5 messages so that
all messages are ack'd).
Receivers will generate a final histogram of message types received. For example
<RCVR> mtype histogram: 0 0 0 0 0 100000 0 0 0 0 0
is generated for the rts test -- all messages are type 5 and thus all other message
type bins should be 0.
By default, senders send 10 messages at a rate of about 1/sec. Receivers give up
after 20 seconds, so even though the rate and number of messages sent can be
adjusted from the command line, if the combination is such that the total number
of messages sent requires more than 20 seconds to send the tests will fail.
Specific examples
The output is chopped to the last few lines.
Return to sender test with 20 senders sending 5K messages each:
ksh run_rts_test.ksh -s 20 -d 180 -n 5000
<SNDR> [PASS] sent=5000 rcvd=4999 rts-ok=4999 failures=0 retries=4
<RCVR> mtype histogram: 0 0 0 0 0 100000 0 0 0 0 0
<RCVR> [PASS] 100000 messages; good=100000 acked=99983 bad=0 bad-trace=0 bad-sub_id=0
<SNDR> [PASS] sent=5000 rcvd=5000 rts-ok=5000 failures=0 retries=4
<SNDR> [PASS] sent=5000 rcvd=4998 rts-ok=4998 failures=0 retries=2
<SNDR> [PASS] sent=5000 rcvd=4998 rts-ok=4998 failures=0 retries=2
<SNDR> [PASS] sent=5000 rcvd=5000 rts-ok=5000 failures=0 retries=4
<SNDR> [PASS] sent=5000 rcvd=4998 rts-ok=4998 failures=0 retries=2
<SNDR> [PASS] sent=5000 rcvd=5000 rts-ok=5000 failures=0 retries=4
<SNDR> [PASS] sent=5000 rcvd=5000 rts-ok=5000 failures=0 retries=2
<SNDR> [PASS] sent=5000 rcvd=5000 rts-ok=5000 failures=0 retries=4
<SNDR> [PASS] sent=5000 rcvd=4999 rts-ok=4999 failures=0 retries=2
<SNDR> [PASS] sent=5000 rcvd=4999 rts-ok=4999 failures=0 retries=4
<SNDR> [PASS] sent=5000 rcvd=4999 rts-ok=4999 failures=0 retries=5
<SNDR> [PASS] sent=5000 rcvd=4999 rts-ok=4999 failures=0 retries=1
<SNDR> [PASS] sent=5000 rcvd=5000 rts-ok=5000 failures=0 retries=4
<SNDR> [PASS] sent=5000 rcvd=4997 rts-ok=4997 failures=0 retries=2
<SNDR> [PASS] sent=5000 rcvd=4999 rts-ok=4999 failures=0 retries=2
<SNDR> [PASS] sent=5000 rcvd=5000 rts-ok=5000 failures=0 retries=2
<SNDR> [PASS] sent=5000 rcvd=5000 rts-ok=5000 failures=0 retries=3
<SNDR> [PASS] sent=5000 rcvd=5000 rts-ok=5000 failures=0 retries=1
<SNDR> [PASS] sent=5000 rcvd=4998 rts-ok=4998 failures=0 retries=2
[PASS] sender rc=0 receiver rc=0
Important notes
+ The receiver will only retry acks for a finite number of tries before
giving up, thus the total acs sent may still be less than messages
received. As a cross validation, the total acks sent by the receiver
should match the recvd count sum over all senders.
+ The recvd and rts-ok counts for each sender should match. If they don't
the receiver should mark the overall state as a failure as this indicates
that a return to sender message was returned to the wrong place.
Multiple Receiver test
Test run with 10 receivers and sender sending 10K messages. The histograms
and status messages were reorganised for easier reading here.
ksh run_multi_test.ksh -r 10 -d 180 -n 10000
<RCVR> mtype histogram: 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 0
<RCVR> mtype histogram: 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 0
<RCVR> mtype histogram: 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 0
<RCVR> mtype histogram: 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 0
<RCVR> mtype histogram: 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 0
<RCVR> mtype histogram: 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 0
<RCVR> mtype histogram: 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 0
<RCVR> mtype histogram: 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 0
<RCVR> mtype histogram: 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 0
<RCVR> mtype histogram: 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 0
<RCVR> [PASS] 10000 messages; good=10000 acked=1000 bad=0 bad-trace=0 bad-sub_id=0
<SNDR> [PASS] sent=10000 rcvd=10000 rts-ok=10000 failures=0 retries=0
<RCVR> [PASS] 10000 messages; good=10000 acked=1000 bad=0 bad-trace=0 bad-sub_id=0
<RCVR> [PASS] 10000 messages; good=10000 acked=1000 bad=0 bad-trace=0 bad-sub_id=0
<RCVR> [PASS] 10000 messages; good=10000 acked=1000 bad=0 bad-trace=0 bad-sub_id=0
<RCVR> [PASS] 10000 messages; good=10000 acked=1000 bad=0 bad-trace=0 bad-sub_id=0
<RCVR> [PASS] 10000 messages; good=10000 acked=1000 bad=0 bad-trace=0 bad-sub_id=0
<RCVR> [PASS] 10000 messages; good=10000 acked=1000 bad=0 bad-trace=0 bad-sub_id=0
<RCVR> [PASS] 10000 messages; good=10000 acked=1000 bad=0 bad-trace=0 bad-sub_id=0
<RCVR> [PASS] 10000 messages; good=10000 acked=1000 bad=0 bad-trace=0 bad-sub_id=0
<RCVR> [PASS] 10000 messages; good=10000 acked=1000 bad=0 bad-trace=0 bad-sub_id=0
[PASS] sender rc=0 receiver rc=0
Important notes:
+ histograms should show messages for all types, except type 10 which are never sent.
+ sender should receive only 1/10th of the number of messages sent back as acks;
modulo receiver giving up on an ack retry, so as before the sum of ack counts should
match the sender's received count.
+ sender should fail if the received count does not match the rts-ok count indicating
that a return to sender was sent to the wrong spot (very unlikely here as there is
only one sender).
Retries
The retries counter for a sender is the number of times that a retry send loop had to be
entered in order to successfully send a message. The sender will never give up on a send
attempt, but retrying will affect latency of that message. A count of less than 10/10000
messages is good, but it also depends on the rate that the sender is attempting. The
higher the rate, the more likely the need to retry, and thus the higher this counter will
be.