E. Scott Daniels | f7d4457 | 2019-05-16 17:04:34 +0000

In general, seeing a "PASS" from the sender(s) and receiver(s) for each execution
is a good indication that all was successful. Receivers will fail if the
simple checksum calculated for the payload and trace data doesn't match. Senders
will fail if a returned message doesn't have its matching tag (meaning it was
returned to the wrong sender). Both will report an error on a timeout, which
occurs when either no route information arrives or the receiver does not receive
the expected number of messages.

Receivers send an 'ack' only for message type 5, so for some tests the number of
ack messages sent will not be the same as the number of messages received. Senders
loop through message types 0-9 inclusive unless otherwise directed on the
command line (e.g. the rts test sends nothing but type 5 messages so that
all messages are ack'd).

Receivers will generate a final histogram of message types received. For example,

    <RCVR> mtype histogram: 0 0 0 0 0 100000 0 0 0 0 0

is generated for the rts test -- all messages are type 5 and thus all other message
type bins should be 0.

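The relationship between the sender's type loop, the receiver's histogram, and the
ack count can be sketched as below. This is an illustrative Python sketch, not code
from the test suite: the name `expected_counts` is hypothetical, and it assumes the
sender cycles through the message types in order, one type per message.

```python
def expected_counts(n_msgs, mtypes=range(10), ack_type=5, bins=11):
    """Return (histogram, acks) expected at a receiver for n_msgs messages.

    Assumption: the sender cycles through mtypes in order, one per message,
    and the receiver acks only messages of ack_type.
    """
    types = list(mtypes)
    hist = [0] * bins                      # bins 0..10, as in the printed histogram
    for i in range(n_msgs):
        hist[types[i % len(types)]] += 1   # count the type of the i-th message
    return hist, hist[ack_type]            # acks equal the type 5 bin


# Default loop over types 0-9: every bin but the last is filled evenly.
hist, acks = expected_counts(10000)

# rts style run: only type 5 is sent, so every message is ack'd.
rts_hist, rts_acks = expected_counts(100000, mtypes=[5])
```

With the default loop, 10000 messages give 1000 in each of bins 0-9 and 1000 acks;
with only type 5 sent, the whole count lands in bin 5.
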
By default, senders send 10 messages at a rate of about 1/sec. Receivers give up
after 20 seconds, so even though the rate and number of messages sent can be
adjusted from the command line, the tests will fail if the combination is such
that sending all of the messages takes more than 20 seconds.

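A quick way to sanity check a message count and rate against the receiver's limit
is sketched below. This is a rough Python illustration only: `fits_in_timeout` is a
hypothetical helper, it ignores startup and connection overhead, and its default of
20 seconds is simply the figure quoted above.

```python
def fits_in_timeout(n_msgs, rate_per_sec, timeout_sec=20):
    """True if n_msgs sent at rate_per_sec can finish within timeout_sec.

    Assumption: sending time is simply n_msgs / rate_per_sec; real runs
    also spend time on startup and route table delivery.
    """
    return n_msgs / rate_per_sec <= timeout_sec


default_ok = fits_in_timeout(10, 1)      # the default 10 messages at 1/sec
too_slow = fits_in_timeout(5000, 1)      # 5000 messages at 1/sec needs far longer
```
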
Specific examples
The output is chopped to the last few lines.

Return to sender test with 20 senders sending 5K messages each:
    ksh run_rts_test.ksh -s 20 -d 180 -n 5000

    <SNDR> [PASS] sent=5000 rcvd=4999 rts-ok=4999 failures=0 retries=4
    <RCVR> mtype histogram: 0 0 0 0 0 100000 0 0 0 0 0
    <RCVR> [PASS] 100000 messages; good=100000 acked=99983 bad=0 bad-trace=0 bad-sub_id=0
    <SNDR> [PASS] sent=5000 rcvd=5000 rts-ok=5000 failures=0 retries=4
    <SNDR> [PASS] sent=5000 rcvd=4998 rts-ok=4998 failures=0 retries=2
    <SNDR> [PASS] sent=5000 rcvd=4998 rts-ok=4998 failures=0 retries=2
    <SNDR> [PASS] sent=5000 rcvd=5000 rts-ok=5000 failures=0 retries=4
    <SNDR> [PASS] sent=5000 rcvd=4998 rts-ok=4998 failures=0 retries=2
    <SNDR> [PASS] sent=5000 rcvd=5000 rts-ok=5000 failures=0 retries=4
    <SNDR> [PASS] sent=5000 rcvd=5000 rts-ok=5000 failures=0 retries=2
    <SNDR> [PASS] sent=5000 rcvd=5000 rts-ok=5000 failures=0 retries=4
    <SNDR> [PASS] sent=5000 rcvd=4999 rts-ok=4999 failures=0 retries=2
    <SNDR> [PASS] sent=5000 rcvd=4999 rts-ok=4999 failures=0 retries=4
    <SNDR> [PASS] sent=5000 rcvd=4999 rts-ok=4999 failures=0 retries=5
    <SNDR> [PASS] sent=5000 rcvd=4999 rts-ok=4999 failures=0 retries=1
    <SNDR> [PASS] sent=5000 rcvd=5000 rts-ok=5000 failures=0 retries=4
    <SNDR> [PASS] sent=5000 rcvd=4997 rts-ok=4997 failures=0 retries=2
    <SNDR> [PASS] sent=5000 rcvd=4999 rts-ok=4999 failures=0 retries=2
    <SNDR> [PASS] sent=5000 rcvd=5000 rts-ok=5000 failures=0 retries=2
    <SNDR> [PASS] sent=5000 rcvd=5000 rts-ok=5000 failures=0 retries=3
    <SNDR> [PASS] sent=5000 rcvd=5000 rts-ok=5000 failures=0 retries=1
    <SNDR> [PASS] sent=5000 rcvd=4998 rts-ok=4998 failures=0 retries=2
    [PASS] sender rc=0 receiver rc=0

Important notes:
+ The receiver will only retry acks for a finite number of tries before
  giving up, thus the total acks sent may still be less than messages
  received. As a cross validation, the total acks sent by the receiver
  should match the rcvd count summed over all senders.

+ The rcvd and rts-ok counts for each sender should match. If they don't,
  the sender should mark the overall state as a failure as this indicates
  that a return to sender message was returned to the wrong place.

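Both cross checks can be run against captured output with a few lines of script.
The following is a minimal Python sketch, not part of the test suite: the helper
names `parse_fields` and `cross_check` are hypothetical, and the sketch assumes the
status lines have the `key=value` layout shown in the output above.

```python
import re


def parse_fields(line):
    """Return the key=value counters on a status line as a dict of ints."""
    return {k: int(v) for k, v in re.findall(r"([\w-]+)=(\d+)", line)}


def cross_check(lines):
    """Return (acks_match, rts_match) for a list of captured output lines.

    acks_match: total acked by receivers equals rcvd summed over senders.
    rts_match:  every sender's rcvd equals its rts-ok count.
    """
    lines = [ln.strip() for ln in lines]
    senders = [parse_fields(ln) for ln in lines if ln.startswith("<SNDR>")]
    # Histogram lines also start with <RCVR> but carry no acked= counter.
    receivers = [parse_fields(ln) for ln in lines
                 if ln.startswith("<RCVR>") and "acked=" in ln]
    total_acks = sum(r["acked"] for r in receivers)
    total_rcvd = sum(s["rcvd"] for s in senders)
    rts_match = all(s["rcvd"] == s["rts-ok"] for s in senders)
    return total_acks == total_rcvd, rts_match
```

For the run shown above, the receiver's acked=99983 equals the sum of the twenty
senders' rcvd counts, and each sender's rcvd matches its rts-ok, so both checks hold.
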


Multiple Receiver test
Test run with 10 receivers and sender sending 10K messages. The histograms
and status messages were reorganised for easier reading here.

    ksh run_multi_test.ksh -r 10 -d 180 -n 10000

    <RCVR> mtype histogram: 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 0
    <RCVR> mtype histogram: 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 0
    <RCVR> mtype histogram: 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 0
    <RCVR> mtype histogram: 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 0
    <RCVR> mtype histogram: 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 0
    <RCVR> mtype histogram: 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 0
    <RCVR> mtype histogram: 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 0
    <RCVR> mtype histogram: 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 0
    <RCVR> mtype histogram: 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 0
    <RCVR> mtype histogram: 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 0

    <RCVR> [PASS] 10000 messages; good=10000 acked=1000 bad=0 bad-trace=0 bad-sub_id=0
    <SNDR> [PASS] sent=10000 rcvd=10000 rts-ok=10000 failures=0 retries=0
    <RCVR> [PASS] 10000 messages; good=10000 acked=1000 bad=0 bad-trace=0 bad-sub_id=0
    <RCVR> [PASS] 10000 messages; good=10000 acked=1000 bad=0 bad-trace=0 bad-sub_id=0
    <RCVR> [PASS] 10000 messages; good=10000 acked=1000 bad=0 bad-trace=0 bad-sub_id=0
    <RCVR> [PASS] 10000 messages; good=10000 acked=1000 bad=0 bad-trace=0 bad-sub_id=0
    <RCVR> [PASS] 10000 messages; good=10000 acked=1000 bad=0 bad-trace=0 bad-sub_id=0
    <RCVR> [PASS] 10000 messages; good=10000 acked=1000 bad=0 bad-trace=0 bad-sub_id=0
    <RCVR> [PASS] 10000 messages; good=10000 acked=1000 bad=0 bad-trace=0 bad-sub_id=0
    <RCVR> [PASS] 10000 messages; good=10000 acked=1000 bad=0 bad-trace=0 bad-sub_id=0
    <RCVR> [PASS] 10000 messages; good=10000 acked=1000 bad=0 bad-trace=0 bad-sub_id=0
    [PASS] sender rc=0 receiver rc=0

Important notes:
+ Histograms should show messages for all types except type 10, which is never sent.

+ Each receiver acks only the type 5 messages, 1/10th of what it receives. Allowing
  for a receiver giving up on an ack retry, the sum of the receivers' ack counts
  should, as before, match the sender's rcvd count.

+ Sender should fail if the rcvd count does not match the rts-ok count, which indicates
  that a return to sender message was sent to the wrong spot (very unlikely here as
  there is only one sender).

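The histogram note lends itself to a mechanical check. The sketch below is
illustrative Python rather than anything shipped with the tests: `histogram_ok` is a
hypothetical helper that takes one histogram line in the format printed above and
assumes that bin 10 is the only type never sent.

```python
def histogram_ok(line, never_sent=(10,)):
    """True if every bin is non-zero except those listed in never_sent.

    Assumption: the line looks like
    '<RCVR> mtype histogram: n0 n1 ... n10' with one count per message type.
    """
    bins = [int(x) for x in line.split("histogram:")[1].split()]
    return all((count == 0) if mtype in never_sent else (count > 0)
               for mtype, count in enumerate(bins))


multi_line = "<RCVR> mtype histogram: 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 0"
multi_ok = histogram_ok(multi_line)
```

A histogram from the rts test would not satisfy this check, since that test
deliberately sends only type 5.
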


Retries
The retries counter for a sender is the number of times that a retry send loop had to be
entered in order to successfully send a message. The sender will never give up on a send
attempt, but retrying will affect the latency of that message. A count of fewer than 10
retries per 10000 messages is good, though what is acceptable also depends on the rate
the sender is attempting. The higher the rate, the more likely the need to retry, and
thus the higher this counter will be.

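
The rule of thumb above can be written as a small check. This is a hedged Python
sketch: `retry_rate_ok` is a hypothetical name, and the 10 per 10000 threshold is
only the guideline from the paragraph above, not a limit enforced by the tests.

```python
def retry_rate_ok(retries, sent, max_retries=10, per=10000):
    """True if retries/sent is strictly below max_retries/per.

    Uses integer cross multiplication to avoid floating point rounding.
    """
    return retries * per < max_retries * sent


low_rate = retry_rate_ok(4, 5000)      # 4 retries in 5000 sends is 8 per 10000
high_rate = retry_rate_ok(20, 10000)   # 20 per 10000 exceeds the guideline
```
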