svm_fifo rework to avoid contention on cursize

Problems addressed:
- Contention on cursize between producer and consumer.
- Too many modulo operations.

Changes:
- Synchronization between producer and consumer now uses the head and
  tail indexes instead of cursize.
  Implication: the usable size of the fifo is reduced by 1.
- Head and tail are accessed with weaker memory ordering C++11 atomics,
  chosen according to the producer or consumer role.
- Head and tail indexes are unsigned 32-bit integers, so additions and
  subtractions on them are implicitly modulo 2^32 operations.
- Added weaker memory ordering variants of max_enq, max_deq, is_empty
  and is_full, and used them where appropriate.
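
Illustrative sketch (not the svm_fifo code): a minimal single-producer,
single-consumer byte ring showing how head and tail alone can carry the
synchronization. Each side reads its own index relaxed, reads the other
side's index with acquire, and publishes its own index with release.
The names ring_t, RING_SIZE and the ring_* helpers are hypothetical, and
the sketch uses C11 <stdatomic.h> as an assumption. One slot is kept
empty to tell full from empty, which is why the usable size drops by 1.

```c
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 8u		/* hypothetical; usable capacity is RING_SIZE - 1 */

typedef struct
{
  _Atomic uint32_t head;	/* written only by the consumer */
  _Atomic uint32_t tail;	/* written only by the producer */
  uint8_t data[RING_SIZE];
} ring_t;

static inline void
ring_init (ring_t * r)
{
  atomic_init (&r->head, 0);
  atomic_init (&r->tail, 0);
}

/* Bytes stored, computed from a head/tail snapshot without a modulo. */
static inline uint32_t
ring_cursize (uint32_t head, uint32_t tail)
{
  return tail >= head ? tail - head : RING_SIZE + tail - head;
}

/* Producer view: own index relaxed, foreign index acquire. */
static inline uint32_t
ring_max_enqueue_prod (ring_t * r)
{
  uint32_t tail = atomic_load_explicit (&r->tail, memory_order_relaxed);
  uint32_t head = atomic_load_explicit (&r->head, memory_order_acquire);
  return RING_SIZE - 1 - ring_cursize (head, tail);
}

/* Consumer view: own index relaxed, foreign index acquire. */
static inline uint32_t
ring_max_dequeue_cons (ring_t * r)
{
  uint32_t head = atomic_load_explicit (&r->head, memory_order_relaxed);
  uint32_t tail = atomic_load_explicit (&r->tail, memory_order_acquire);
  return ring_cursize (head, tail);
}

/* Producer only. Returns 0 on success, -1 if the ring is full. */
static inline int
ring_enqueue (ring_t * r, uint8_t byte)
{
  uint32_t tail = atomic_load_explicit (&r->tail, memory_order_relaxed);

  if (ring_max_enqueue_prod (r) == 0)
    return -1;
  r->data[tail] = byte;
  tail = (tail + 1 == RING_SIZE) ? 0 : tail + 1;
  /* Release: the data write above becomes visible before the new tail. */
  atomic_store_explicit (&r->tail, tail, memory_order_release);
  return 0;
}

/* Consumer only. Returns 0 on success, -1 if the ring is empty. */
static inline int
ring_dequeue (ring_t * r, uint8_t * byte)
{
  uint32_t head = atomic_load_explicit (&r->head, memory_order_relaxed);

  if (ring_max_dequeue_cons (r) == 0)
    return -1;
  *byte = r->data[head];
  head = (head + 1 == RING_SIZE) ? 0 : head + 1;
  /* Release: the slot is handed back only after it has been read. */
  atomic_store_explicit (&r->head, head, memory_order_release);
  return 0;
}
```

In this sketch neither side ever writes the other side's index, so
there is no shared read-modify-write counter like cursize to contend on.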

Performance improvement (iperf3 via Hoststack):

iperf3 Server: Marvell ThunderX2 (AArch64) - iperf3 Client: Skylake (x86)
   ~6% (256 rxd/txd) - ~11% (2048 rxd/txd)

Change-Id: I1d484e000e437430fdd5a819657d1c6b62443018
Signed-off-by: Sirshak Das <sirshak.das@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
diff --git a/src/tests/vnet/session/tcp_echo.c b/src/tests/vnet/session/tcp_echo.c
index d76e939..99b6765 100644
--- a/src/tests/vnet/session/tcp_echo.c
+++ b/src/tests/vnet/session/tcp_echo.c
@@ -507,7 +507,7 @@
 {
   int n_to_read, n_read;
 
-  n_to_read = svm_fifo_max_dequeue (s->rx_fifo);
+  n_to_read = svm_fifo_max_dequeue_cons (s->rx_fifo);
   if (!n_to_read)
     return;
 
@@ -1063,7 +1063,7 @@
    * message queue */
   svm_fifo_unset_event (s->rx_fifo);
 
-  max_dequeue = svm_fifo_max_dequeue (s->rx_fifo);
+  max_dequeue = svm_fifo_max_dequeue_cons (s->rx_fifo);
   if (PREDICT_FALSE (!max_dequeue))
     return;
 
diff --git a/src/tests/vnet/session/udp_echo.c b/src/tests/vnet/session/udp_echo.c
index 4fd6c86..91d5dcc 100644
--- a/src/tests/vnet/session/udp_echo.c
+++ b/src/tests/vnet/session/udp_echo.c
@@ -454,7 +454,7 @@
 	  else
 	    {
 	      /* We don't do anything with the data, drop it */
-	      actual_transfer = svm_fifo_max_dequeue (rx_fifo);
+	      actual_transfer = svm_fifo_max_dequeue_cons (rx_fifo);
 	      svm_fifo_dequeue_drop (rx_fifo, actual_transfer);
 	    }
 	}
@@ -724,7 +724,7 @@
   test_buf_offset = utm->bytes_sent % test_buf_len;
   bytes_this_chunk = clib_min (test_buf_len - test_buf_offset,
 			       utm->bytes_to_send);
-  enq_space = svm_fifo_max_enqueue (s->tx_fifo);
+  enq_space = svm_fifo_max_enqueue_prod (s->tx_fifo);
   bytes_this_chunk = clib_min (bytes_this_chunk, enq_space);
 
   written = app_send (s, test_data + test_buf_offset, bytes_this_chunk,
@@ -975,7 +975,8 @@
   rx_fifo = session->rx_fifo;
   tx_fifo = session->tx_fifo;
 
-  max_dequeue = svm_fifo_max_dequeue (rx_fifo);
+
+  max_dequeue = svm_fifo_max_dequeue_cons (rx_fifo);
   /* Allow enqueuing of a new event */
   svm_fifo_unset_event (rx_fifo);