test(e2e): Add return to sender app based test
Change-Id: Idb460369c4210a614dfba961efa3a8df8f5ee51a
Signed-off-by: E. Scott Daniels <daniels@research.att.com>
diff --git a/test/app_test/NOTES.txt b/test/app_test/NOTES.txt
new file mode 100644
index 0000000..a3b43e9
--- /dev/null
+++ b/test/app_test/NOTES.txt
@@ -0,0 +1,118 @@
+
+In general, seeing a "PASS" from the sender(s) and receiver(s) for each execution
+is a good indication that all was successful. Reeceivers will fail if the
+simple checksum calculated for the payload and trace data doesn't match. Senders
+will fail if a returned message doesn't have its matching tag (meaning it was
+returned to the wrong sender). Both will error on a timeout either no route
+information, or receiver did not receive the expected number of messages.
+
+Receivers send an 'ack' for message type 5, so for some tests the number of ack
+messages sent will not be the same as the number of messages received. Senders
+loop through message types 0-9 inclusive, unless otherwise directed on the
+command line (e.g. the rts test sends nothing but message type 5 messages so that
+all messages are ack'd).
+
+Receivers will generate a final histogram of message types received. For example
+
+<RCVR> mtype histogram: 0 0 0 0 0 100000 0 0 0 0 0
+
+is generated for the rts test -- all messages are type 5 and thus all other message
+type bins should be 0.
+
+By default, senders send 10 messages at a rate of about 1/sec. Receivers give up
+after 20 seconds, so even though the rate and number of messages sent can be
+adjusted from the command line, if the combination is such that the total number
+of messages sent requires more than 20 seconds to send the tests will fail.
+
+Specific examples
+The output is chopped to the last few lines.
+
+Return to sender test with 20 senders sending 5K messages each:
+ ksh run_rts_test.ksh -s 20 -d 180 -n 5000
+
+
+ <SNDR> [PASS] sent=5000 rcvd=4999 rts-ok=4999 failures=0 retries=4
+ <RCVR> mtype histogram: 0 0 0 0 0 100000 0 0 0 0 0
+ <RCVR> [PASS] 100000 messages; good=100000 acked=99983 bad=0 bad-trace=0 bad-sub_id=0
+ <SNDR> [PASS] sent=5000 rcvd=5000 rts-ok=5000 failures=0 retries=4
+ <SNDR> [PASS] sent=5000 rcvd=4998 rts-ok=4998 failures=0 retries=2
+ <SNDR> [PASS] sent=5000 rcvd=4998 rts-ok=4998 failures=0 retries=2
+ <SNDR> [PASS] sent=5000 rcvd=5000 rts-ok=5000 failures=0 retries=4
+ <SNDR> [PASS] sent=5000 rcvd=4998 rts-ok=4998 failures=0 retries=2
+ <SNDR> [PASS] sent=5000 rcvd=5000 rts-ok=5000 failures=0 retries=4
+ <SNDR> [PASS] sent=5000 rcvd=5000 rts-ok=5000 failures=0 retries=2
+ <SNDR> [PASS] sent=5000 rcvd=5000 rts-ok=5000 failures=0 retries=4
+ <SNDR> [PASS] sent=5000 rcvd=4999 rts-ok=4999 failures=0 retries=2
+ <SNDR> [PASS] sent=5000 rcvd=4999 rts-ok=4999 failures=0 retries=4
+ <SNDR> [PASS] sent=5000 rcvd=4999 rts-ok=4999 failures=0 retries=5
+ <SNDR> [PASS] sent=5000 rcvd=4999 rts-ok=4999 failures=0 retries=1
+ <SNDR> [PASS] sent=5000 rcvd=5000 rts-ok=5000 failures=0 retries=4
+ <SNDR> [PASS] sent=5000 rcvd=4997 rts-ok=4997 failures=0 retries=2
+ <SNDR> [PASS] sent=5000 rcvd=4999 rts-ok=4999 failures=0 retries=2
+ <SNDR> [PASS] sent=5000 rcvd=5000 rts-ok=5000 failures=0 retries=2
+ <SNDR> [PASS] sent=5000 rcvd=5000 rts-ok=5000 failures=0 retries=3
+ <SNDR> [PASS] sent=5000 rcvd=5000 rts-ok=5000 failures=0 retries=1
+ <SNDR> [PASS] sent=5000 rcvd=4998 rts-ok=4998 failures=0 retries=2
+ [PASS] sender rc=0 receiver rc=0
+
+Important notes
+ + The receiver will only retry acks for a finite number of tries before
+ giving up, thus the total acs sent may still be less than messages
+ received. As a cross validation, the total acks sent by the receiver
+ should match the recvd count sum over all senders.
+
+ + The recvd and rts-ok counts for each sender should match. If they don't
+ the receiver should mark the overall state as a failure as this indicates
+ that a return to sender message was returned to the wrong place.
+
+
+
+Multiple Receiver test
+Test run with 10 receivers and sender sending 10K messages. The histograms
+and status messages were reorganised for easier reading here.
+
+ ksh run_multi_test.ksh -r 10 -d 180 -n 10000
+ <RCVR> mtype histogram: 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 0
+ <RCVR> mtype histogram: 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 0
+ <RCVR> mtype histogram: 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 0
+ <RCVR> mtype histogram: 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 0
+ <RCVR> mtype histogram: 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 0
+ <RCVR> mtype histogram: 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 0
+ <RCVR> mtype histogram: 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 0
+ <RCVR> mtype histogram: 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 0
+ <RCVR> mtype histogram: 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 0
+ <RCVR> mtype histogram: 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 0
+
+ <RCVR> [PASS] 10000 messages; good=10000 acked=1000 bad=0 bad-trace=0 bad-sub_id=0
+ <SNDR> [PASS] sent=10000 rcvd=10000 rts-ok=10000 failures=0 retries=0
+ <RCVR> [PASS] 10000 messages; good=10000 acked=1000 bad=0 bad-trace=0 bad-sub_id=0
+ <RCVR> [PASS] 10000 messages; good=10000 acked=1000 bad=0 bad-trace=0 bad-sub_id=0
+ <RCVR> [PASS] 10000 messages; good=10000 acked=1000 bad=0 bad-trace=0 bad-sub_id=0
+ <RCVR> [PASS] 10000 messages; good=10000 acked=1000 bad=0 bad-trace=0 bad-sub_id=0
+ <RCVR> [PASS] 10000 messages; good=10000 acked=1000 bad=0 bad-trace=0 bad-sub_id=0
+ <RCVR> [PASS] 10000 messages; good=10000 acked=1000 bad=0 bad-trace=0 bad-sub_id=0
+ <RCVR> [PASS] 10000 messages; good=10000 acked=1000 bad=0 bad-trace=0 bad-sub_id=0
+ <RCVR> [PASS] 10000 messages; good=10000 acked=1000 bad=0 bad-trace=0 bad-sub_id=0
+ <RCVR> [PASS] 10000 messages; good=10000 acked=1000 bad=0 bad-trace=0 bad-sub_id=0
+ [PASS] sender rc=0 receiver rc=0
+
+Important notes:
+ + histograms should show messages for all types, except type 10 which are never sent.
+
+ + sender should receive only 1/10th of the number of messages sent back as acks;
+ modulo receiver giving up on an ack retry, so as before the sum of ack counts should
+ match the sender's received count.
+
+ + sender should fail if the received count does not match the rts-ok count indicating
+ that a return to sender was sent to the wrong spot (very unlikely here as there is
+ only one sender).
+
+
+
+Retries
+The retries counter for a sender is the number of times that a retry send loop had to be
+entered in order to successfully send a message. The sender will never give up on a send
+attempt, but retrying will affect latency of that message. A count of less than 10/10000
+messages is good, but it also depends on the rate that the sender is attempting. The
+higher the rate, the more likely the need to retry, and thus the higher this counter will
+be.