In general, seeing a "PASS" from the sender(s) and receiver(s) for each execution
is a good indication that all was successful. Receivers will fail if the
simple checksum calculated for the payload and trace data doesn't match. Senders
will fail if a returned message doesn't have its matching tag (meaning it was
returned to the wrong sender). Both will report an error on a timeout, caused
either by missing route information or by the receiver not receiving the
expected number of messages.

Receivers send an 'ack' only for message type 5, so for some tests the number of ack
messages sent will not be the same as the number of messages received. Senders
loop through message types 0-9 inclusive, unless otherwise directed on the
command line (e.g. the rts test sends nothing but message type 5 messages so that
all messages are ack'd).

Receivers will generate a final histogram of message types received. For example,

    <RCVR> mtype histogram: 0 0 0 0 0 100000 0 0 0 0 0

is generated for the rts test -- all messages are type 5 and thus all other message
type bins should be 0.

By default, senders send 10 messages at a rate of about 1/sec. Receivers give up
after 20 seconds, so even though the rate and number of messages sent can be
adjusted from the command line, the tests will fail if the combination is such
that sending all of the messages takes more than 20 seconds.
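The arithmetic behind that limit can be sketched as below. The function name and
its parameters are illustrative only and are not part of the test scripts; the
sketch assumes messages go out at a steady rate and that the 20 second receiver
timeout described above is the only limit.

```python
def fits_receiver_timeout(num_msgs, msgs_per_sec, timeout_sec=20.0):
    """Return True if sending num_msgs at msgs_per_sec ends before the receiver gives up.

    Hypothetical helper: the names and the steady-rate model are assumptions.
    """
    if msgs_per_sec <= 0:
        raise ValueError("msgs_per_sec must be positive")
    send_time_sec = num_msgs / msgs_per_sec
    return send_time_sec < timeout_sec


# The defaults described above: 10 messages at about 1/sec need about 10 seconds.
default_ok = fits_receiver_timeout(10, 1.0)

# 30 messages at the same rate need about 30 seconds, longer than the timeout.
too_slow_ok = fits_receiver_timeout(30, 1.0)

print(default_ok, too_slow_ok)
```

When raising the message count, the rate has to rise with it so that the total
send time stays under the receiver's limit.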

Specific examples
The output is chopped to the last few lines.

Return to sender test with 20 senders sending 5K messages each:
    ksh run_rts_test.ksh -s 20 -d 180 -n 5000

    <SNDR> [PASS] sent=5000 rcvd=4999 rts-ok=4999 failures=0 retries=4
    <RCVR> mtype histogram: 0 0 0 0 0 100000 0 0 0 0 0
    <RCVR> [PASS] 100000 messages; good=100000 acked=99983 bad=0 bad-trace=0 bad-sub_id=0
    <SNDR> [PASS] sent=5000 rcvd=5000 rts-ok=5000 failures=0 retries=4
    <SNDR> [PASS] sent=5000 rcvd=4998 rts-ok=4998 failures=0 retries=2
    <SNDR> [PASS] sent=5000 rcvd=4998 rts-ok=4998 failures=0 retries=2
    <SNDR> [PASS] sent=5000 rcvd=5000 rts-ok=5000 failures=0 retries=4
    <SNDR> [PASS] sent=5000 rcvd=4998 rts-ok=4998 failures=0 retries=2
    <SNDR> [PASS] sent=5000 rcvd=5000 rts-ok=5000 failures=0 retries=4
    <SNDR> [PASS] sent=5000 rcvd=5000 rts-ok=5000 failures=0 retries=2
    <SNDR> [PASS] sent=5000 rcvd=5000 rts-ok=5000 failures=0 retries=4
    <SNDR> [PASS] sent=5000 rcvd=4999 rts-ok=4999 failures=0 retries=2
    <SNDR> [PASS] sent=5000 rcvd=4999 rts-ok=4999 failures=0 retries=4
    <SNDR> [PASS] sent=5000 rcvd=4999 rts-ok=4999 failures=0 retries=5
    <SNDR> [PASS] sent=5000 rcvd=4999 rts-ok=4999 failures=0 retries=1
    <SNDR> [PASS] sent=5000 rcvd=5000 rts-ok=5000 failures=0 retries=4
    <SNDR> [PASS] sent=5000 rcvd=4997 rts-ok=4997 failures=0 retries=2
    <SNDR> [PASS] sent=5000 rcvd=4999 rts-ok=4999 failures=0 retries=2
    <SNDR> [PASS] sent=5000 rcvd=5000 rts-ok=5000 failures=0 retries=2
    <SNDR> [PASS] sent=5000 rcvd=5000 rts-ok=5000 failures=0 retries=3
    <SNDR> [PASS] sent=5000 rcvd=5000 rts-ok=5000 failures=0 retries=1
    <SNDR> [PASS] sent=5000 rcvd=4998 rts-ok=4998 failures=0 retries=2
    [PASS] sender rc=0 receiver rc=0

Important notes
  + The receiver will only retry acks for a finite number of tries before
    giving up, thus the total acks sent may still be less than messages
    received. As a cross validation, the total acks sent by the receiver
    should match the rcvd count summed over all senders.

  + The rcvd and rts-ok counts for each sender should match. If they don't,
    the sender should mark the overall state as a failure as this indicates
    that a return to sender message was returned to the wrong place.
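Both checks can be applied to captured output with a short script. The one
below is only a sketch: the function name is made up for illustration, the
regular expressions assume the summary line layout shown in the run above, and
the sample text is that run's sender and receiver summary lines.

```python
import re

# Summary lines copied from the rts run shown above.
SAMPLE = """
<SNDR> [PASS] sent=5000 rcvd=4999 rts-ok=4999 failures=0 retries=4
<RCVR> [PASS] 100000 messages; good=100000 acked=99983 bad=0 bad-trace=0 bad-sub_id=0
<SNDR> [PASS] sent=5000 rcvd=5000 rts-ok=5000 failures=0 retries=4
<SNDR> [PASS] sent=5000 rcvd=4998 rts-ok=4998 failures=0 retries=2
<SNDR> [PASS] sent=5000 rcvd=4998 rts-ok=4998 failures=0 retries=2
<SNDR> [PASS] sent=5000 rcvd=5000 rts-ok=5000 failures=0 retries=4
<SNDR> [PASS] sent=5000 rcvd=4998 rts-ok=4998 failures=0 retries=2
<SNDR> [PASS] sent=5000 rcvd=5000 rts-ok=5000 failures=0 retries=4
<SNDR> [PASS] sent=5000 rcvd=5000 rts-ok=5000 failures=0 retries=2
<SNDR> [PASS] sent=5000 rcvd=5000 rts-ok=5000 failures=0 retries=4
<SNDR> [PASS] sent=5000 rcvd=4999 rts-ok=4999 failures=0 retries=2
<SNDR> [PASS] sent=5000 rcvd=4999 rts-ok=4999 failures=0 retries=4
<SNDR> [PASS] sent=5000 rcvd=4999 rts-ok=4999 failures=0 retries=5
<SNDR> [PASS] sent=5000 rcvd=4999 rts-ok=4999 failures=0 retries=1
<SNDR> [PASS] sent=5000 rcvd=5000 rts-ok=5000 failures=0 retries=4
<SNDR> [PASS] sent=5000 rcvd=4997 rts-ok=4997 failures=0 retries=2
<SNDR> [PASS] sent=5000 rcvd=4999 rts-ok=4999 failures=0 retries=2
<SNDR> [PASS] sent=5000 rcvd=5000 rts-ok=5000 failures=0 retries=2
<SNDR> [PASS] sent=5000 rcvd=5000 rts-ok=5000 failures=0 retries=3
<SNDR> [PASS] sent=5000 rcvd=5000 rts-ok=5000 failures=0 retries=1
<SNDR> [PASS] sent=5000 rcvd=4998 rts-ok=4998 failures=0 retries=2
"""


def cross_check(output):
    """Return True if every sender has rcvd == rts-ok and the senders' rcvd
    counts sum to the total acks reported by the receiver(s)."""
    # (rcvd, rts-ok) pair for each sender summary line
    senders = [(int(rcvd), int(ok)) for rcvd, ok in re.findall(
        r"<SNDR> \[\w+\] sent=\d+ rcvd=(\d+) rts-ok=(\d+)", output)]
    # acked count from each receiver summary line
    acked = sum(int(a) for a in re.findall(
        r"<RCVR> \[\w+\] .*?acked=(\d+)", output))
    if not senders:
        return False
    counts_match = all(rcvd == ok for rcvd, ok in senders)
    return counts_match and sum(rcvd for rcvd, _ in senders) == acked


result = cross_check(SAMPLE)
print(result)
```

For the run above the senders' rcvd counts sum to 99983, which is the acked
value reported by the receiver.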



Multiple Receiver test
Test run with 10 receivers and a single sender sending 10K messages. The histograms
and status messages were reorganised for easier reading here.

    ksh run_multi_test.ksh -r 10 -d 180 -n 10000

    <RCVR> mtype histogram: 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 0
    <RCVR> mtype histogram: 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 0
    <RCVR> mtype histogram: 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 0
    <RCVR> mtype histogram: 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 0
    <RCVR> mtype histogram: 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 0
    <RCVR> mtype histogram: 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 0
    <RCVR> mtype histogram: 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 0
    <RCVR> mtype histogram: 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 0
    <RCVR> mtype histogram: 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 0
    <RCVR> mtype histogram: 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 0

    <RCVR> [PASS] 10000 messages; good=10000 acked=1000 bad=0 bad-trace=0 bad-sub_id=0
    <SNDR> [PASS] sent=10000 rcvd=10000 rts-ok=10000 failures=0 retries=0
    <RCVR> [PASS] 10000 messages; good=10000 acked=1000 bad=0 bad-trace=0 bad-sub_id=0
    <RCVR> [PASS] 10000 messages; good=10000 acked=1000 bad=0 bad-trace=0 bad-sub_id=0
    <RCVR> [PASS] 10000 messages; good=10000 acked=1000 bad=0 bad-trace=0 bad-sub_id=0
    <RCVR> [PASS] 10000 messages; good=10000 acked=1000 bad=0 bad-trace=0 bad-sub_id=0
    <RCVR> [PASS] 10000 messages; good=10000 acked=1000 bad=0 bad-trace=0 bad-sub_id=0
    <RCVR> [PASS] 10000 messages; good=10000 acked=1000 bad=0 bad-trace=0 bad-sub_id=0
    <RCVR> [PASS] 10000 messages; good=10000 acked=1000 bad=0 bad-trace=0 bad-sub_id=0
    <RCVR> [PASS] 10000 messages; good=10000 acked=1000 bad=0 bad-trace=0 bad-sub_id=0
    <RCVR> [PASS] 10000 messages; good=10000 acked=1000 bad=0 bad-trace=0 bad-sub_id=0
    [PASS] sender rc=0 receiver rc=0

Important notes:
  + Histograms should show messages for all types, except type 10, which is never sent.

  + Only 1/10th of the messages delivered to the receivers (the type 5 messages) are
    sent back to the sender as acks, modulo a receiver giving up on an ack retry; as
    before, the sum of the receivers' ack counts should match the sender's received count.

  + The sender should fail if the received count does not match the rts-ok count, indicating
    that a return to sender was sent to the wrong spot (very unlikely here as there is
    only one sender).
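The expected numbers in this run can be worked out with a few lines of code. This
is a sketch under stated assumptions, not the test's own logic: the function name
is invented here, the sender is assumed to cycle through types 0-9 in order, and
every receiver is assumed to get every message (as the per-receiver counts above
suggest).

```python
ACK_TYPE = 5  # receivers ack only message type 5


def expected_histogram(num_msgs, num_types=10, bins=11):
    """Per-receiver histogram if the sender cycles through types 0..num_types-1.

    Hypothetical helper; the 11th bin (type 10) stays at 0 because that
    type is never sent.
    """
    hist = [0] * bins
    for i in range(num_msgs):
        hist[i % num_types] += 1
    return hist


receivers = 10
hist = expected_histogram(10000)

# Each receiver acks only its type 5 messages.
acks_per_receiver = hist[ACK_TYPE]

# The sender's rcvd count is the sum of the acks from all receivers.
sender_rcvd = receivers * acks_per_receiver

print(hist)
print(acks_per_receiver, sender_rcvd)
```

With 10000 messages and 10 receivers this gives 1000 per bin for types 0-9,
1000 acks per receiver, and 10000 acks at the sender, matching the output above.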



Retries
The retries counter for a sender is the number of times that a retry send loop had to be
entered in order to successfully send a message. The sender will never give up on a send
attempt, but retrying will affect the latency of that message. A count of less than 10 per
10000 messages is good, but it also depends on the rate that the sender is attempting. The
higher the rate, the more likely the need to retry, and thus the higher this counter will
be.
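To compare a sender's counter against that rule of thumb when the message count
is not 10000, the counter can be scaled first. The helper below is illustrative
only; its name is not part of the test tools.

```python
def retries_per_10k(retries, sent):
    """Scale a sender's retries counter to retries per 10000 messages sent."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return retries * 10000 / sent


# First sender line of the rts example above: sent=5000 retries=4
rts_rate = retries_per_10k(4, 5000)

# Sender line of the multiple receiver example above: sent=10000 retries=0
multi_rate = retries_per_10k(0, 10000)

print(rts_rate, multi_rate)
```

A scaled value under 10 falls within the range described above as good, keeping
in mind that higher send rates are expected to push it up.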