svm_fifo rework to avoid contention on cursize

Problems Addressed:
- Contention of cursize by producer and consumer.
- Reduce the no of modulo operations.

Changes:
- Synchronization between producer and consumer changed from cursize
  to head and tail indexes
  Implications: reduces the usable size of fifo by 1.
- Using weaker memory ordering C++11 atomics to access head and tail
  based on producer and consumer role.
- Head and tail indexes are unsigned 32 bit integers. Additions and
  subtraction on them are implicit 32 bit Modulo operation.
- Adding weaker memory ordering variants of max_enq, max_deq, is_empty
  and is_full Using them appropriately in all places.

Perfomance improvement (iperf3 via Hoststack):

iperf3 Server: Marvell ThunderX2(AArch64) - iperf3 Client: Skylake(x86)
   ~6%(256 rxd/txd) - ~11%(2048 rxd/txd)

Change-Id: I1d484e000e437430fdd5a819657d1c6b62443018
Signed-off-by: Sirshak Das <sirshak.das@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
diff --git a/src/vnet/session-apps/echo_server.c b/src/vnet/session-apps/echo_server.c
index d165fb3..4249ed8 100644
--- a/src/vnet/session-apps/echo_server.c
+++ b/src/vnet/session-apps/echo_server.c
@@ -139,7 +139,7 @@
 echo_server_builtin_server_rx_callback_no_echo (session_t * s)
 {
   svm_fifo_t *rx_fifo = s->rx_fifo;
-  svm_fifo_dequeue_drop (rx_fifo, svm_fifo_max_dequeue (rx_fifo));
+  svm_fifo_dequeue_drop (rx_fifo, svm_fifo_max_dequeue_cons (rx_fifo));
   return 0;
 }
 
@@ -161,10 +161,10 @@
   ASSERT (rx_fifo->master_thread_index == thread_index);
   ASSERT (tx_fifo->master_thread_index == thread_index);
 
-  max_enqueue = svm_fifo_max_enqueue (tx_fifo);
+  max_enqueue = svm_fifo_max_enqueue_prod (tx_fifo);
   if (!esm->is_dgram)
     {
-      max_dequeue = svm_fifo_max_dequeue (rx_fifo);
+      max_dequeue = svm_fifo_max_dequeue_cons (rx_fifo);
     }
   else
     {
@@ -255,7 +255,7 @@
   if (n_written != max_transfer)
     clib_warning ("short trout! written %u read %u", n_written, max_transfer);
 
-  if (PREDICT_FALSE (svm_fifo_max_dequeue (rx_fifo)))
+  if (PREDICT_FALSE (svm_fifo_max_dequeue_cons (rx_fifo)))
     goto rx_event;
 
   return 0;