bond: performance harvesting

- Hash lookup works, but it is a bit too slow for the DP. Use direct array
indexing to retrieve the slave interface quickly.
- The flow hash algorithm works, but it is a bit too slow for the DP. Use
lb_hash_hash(), extracted from lb_hash.h, which ECMP already uses. It makes
use of the CRC32 intrinsic instructions.
- Shortcut the modulo arithmetic when the divisor is 2**x (x up to 4) to
avoid the division instruction.
- Special-case a link count of 1 in bond_tx_fn().
- Use clib_mem_unaligned() to access packet data and avoid alignment errors.
- Fix some typos in packet tracing.
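A minimal sketch of replacing a hash lookup with direct array indexing, assuming slaves are identified by a small, dense integer such as VPP's sw_if_index. The table `slave_to_bond`, the bound `MAX_SW_IF_INDEX` and the helper functions are hypothetical; real VPP code would more likely grow a vector on demand than use a fixed array.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical upper bound on interface indices for this sketch. */
#define MAX_SW_IF_INDEX 1024
/* Sentinel meaning "this interface is not a bond slave". */
#define NOT_A_SLAVE UINT32_MAX

/* slave_to_bond[sw_if_index] holds the owning bond index, or NOT_A_SLAVE. */
static uint32_t slave_to_bond[MAX_SW_IF_INDEX];

static void
slave_table_init (void)
{
  for (uint32_t i = 0; i < MAX_SW_IF_INDEX; i++)
    slave_to_bond[i] = NOT_A_SLAVE;
}

/* Control plane (slow path): record that sw_if_index belongs to bond_index. */
static void
slave_attach (uint32_t sw_if_index, uint32_t bond_index)
{
  assert (sw_if_index < MAX_SW_IF_INDEX);
  slave_to_bond[sw_if_index] = bond_index;
}

/* Data plane (fast path): one bounds check and one load, no hashing. */
static inline uint32_t
slave_lookup (uint32_t sw_if_index)
{
  return sw_if_index < MAX_SW_IF_INDEX ? slave_to_bond[sw_if_index]
                                       : NOT_A_SLAVE;
}
```

The trade-off is memory proportional to the largest index in exchange for a lookup that never computes a hash or walks a bucket.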
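lb_hash_hash() relies on the hardware crc32 instruction through an intrinsic. The sketch below is a portable, bit-at-a-time software version of what that instruction computes (CRC-32C), so it runs anywhere but is far slower than the intrinsic; it only illustrates the computation. `flow_hash5()` is a hypothetical five-key hash in the same spirit, not the VPP source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit-reflected CRC-32C (Castagnoli) polynomial, the one implemented by
   the x86 SSE4.2 crc32 instruction. */
#define CRC32C_POLY_REFLECTED 0x82F63B78u

/* Bit-at-a-time CRC-32C update over len bytes. No initial or final
   inversion is applied, mirroring the raw update the instruction does. */
static uint32_t
crc32c_update (uint32_t crc, const uint8_t *buf, size_t len)
{
  assert (buf != NULL || len == 0);
  for (size_t i = 0; i < len; i++)
    {
      crc ^= buf[i];
      for (int bit = 0; bit < 8; bit++)
        crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED & (0u - (crc & 1u)));
    }
  return crc;
}

/* Fold a 64-bit key into the CRC least-significant byte first, the order
   in which _mm_crc32_u64 consumes its operand. */
static uint32_t
crc32c_u64 (uint32_t crc, uint64_t key)
{
  uint8_t bytes[8];
  for (int i = 0; i < 8; i++)
    bytes[i] = (uint8_t) (key >> (8 * i));
  return crc32c_update (crc, bytes, sizeof bytes);
}

/* Hypothetical flow hash over five 64-bit keys (for example packed
   addresses, ports and protocol): chain the CRC starting from 0. */
static uint32_t
flow_hash5 (uint64_t k0, uint64_t k1, uint64_t k2, uint64_t k3, uint64_t k4)
{
  uint32_t h = 0;
  h = crc32c_u64 (h, k0);
  h = crc32c_u64 (h, k1);
  h = crc32c_u64 (h, k2);
  h = crc32c_u64 (h, k3);
  h = crc32c_u64 (h, k4);
  return h;
}
```

With the intrinsic, each crc32c_u64() call collapses to a single instruction, which is where the speedup over a generic multi-round hash comes from.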
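A sketch of the slave-selection arithmetic, assuming the divisor is the number of active slaves. For 2**x with x up to 4 (1, 2, 4, 8, 16), `hash % n` equals `hash & (n - 1)`, so those cases avoid the divide. With a single link there is nothing to balance, so the hash is skipped entirely. `fast_mod()`, `flow_hash_fn` and `bond_pick_slave()` are hypothetical names, not the code in bond_tx_fn().

```c
#include <assert.h>
#include <stdint.h>

/* Reduce hash into [0, n). Powers of two up to 16 use a mask instead of
   a divide; any other n falls back to the general modulo. */
static inline uint32_t
fast_mod (uint32_t hash, uint32_t n)
{
  assert (n != 0);
  switch (n)
    {
    case 1:
      return 0;
    case 2:
    case 4:
    case 8:
    case 16:
      return hash & (n - 1);
    default:
      return hash % n;
    }
}

/* Hypothetical callback that computes a flow hash for a packet. */
typedef uint32_t (*flow_hash_fn) (const void *pkt);

/* Pick the output slave. With one link the hash is never computed. */
static inline uint32_t
bond_pick_slave (const void *pkt, uint32_t n_slaves, flow_hash_fn hash)
{
  if (n_slaves == 1)
    return 0;
  return fast_mod (hash (pkt), n_slaves);
}
```

Checking the single-link case before calling the hash callback is what makes it free: the packet headers are not even read.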
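clib_mem_unaligned() is VPP's own helper for reading fields at addresses that may not be naturally aligned. The sketch below does not reproduce that macro; it shows the standard portable C way to get the same effect, copying through memcpy so the compiler emits a safe load and no misaligned pointer is dereferenced. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a 16-bit value from any address. memcpy avoids the undefined
   behaviour (and possible trap) of dereferencing a misaligned pointer;
   compilers typically lower it to a single load where that is legal.
   The result is in host byte order. */
static inline uint16_t
load_u16_unaligned (const void *p)
{
  uint16_t v;
  assert (p != NULL);
  memcpy (&v, p, sizeof v);
  return v;
}

/* Same idea for a 32-bit field, e.g. an IPv4 address inside a packet. */
static inline uint32_t
load_u32_unaligned (const void *p)
{
  uint32_t v;
  assert (p != NULL);
  memcpy (&v, p, sizeof v);
  return v;
}
```

Packet headers often sit at odd offsets (for example after a 14-byte Ethernet header), which is why a plain `*(uint32_t *)` cast on packet data can fault on strict-alignment targets.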

Change-Id: I8eae3ad497061c5473aa675ba894ee0211120d25
Signed-off-by: Steven <sluong@cisco.com>
diff --git a/src/vppinfra.am b/src/vppinfra.am
index 6555528..ec271e6 100644
--- a/src/vppinfra.am
+++ b/src/vppinfra.am
@@ -201,6 +201,7 @@
   vppinfra/clib_error.h \
   vppinfra/cpu.h \
   vppinfra/crc32.h \
+  vppinfra/lb_hash_hash.h \
   vppinfra/dlist.h \
   vppinfra/elf.h \
   vppinfra/elf_clib.h \