ipsec: Performance improvement of ipsec4_output_node using flow cache

Add flow cache support to improve outbound IPv4 IPSec SPD lookup
performance. Details about the flow cache:
  Mechanism:
  1. The first packet of a flow undergoes a linear search in the SPD
     table. Once a policy match is found, a new entry is added to the
     flow cache. From the second packet onwards, the policy lookup
     happens in the flow cache.
  2. The flow cache is implemented using bihash without collision
     handling. This avoids the logic needed to age out or recycle old
     flows in the flow cache: whenever a collision occurs, the old
     entry is overwritten by the new one (see the lookup sketch after
     this list). The worst case is when all 256 packets in a batch
     collide and fall back to linear search; the average and best
     cases are O(1).
  3. The size of the flow cache is fixed and chosen based on the
     number of flows to be supported. The default is 1 million flows;
     this can be made configurable as a next step.
  4. Whenever an SPD rule is added or deleted by the control plane,
     the flow cache entries are completely deleted (reset) in the
     control plane. The assumption here is that SPD rule add/del is
     not a frequent control plane operation. The flow cache reset is
     done by putting the data plane into fallback mode, bypassing the
     flow cache and doing a linear search until the SPD rule
     add/delete operation is complete. Once the rule is successfully
     added/deleted, the data plane is allowed to use the flow cache
     again. The flow cache is reset only after flushing out the
     in-flight packets from all the worker cores using
     vlib_worker_wait_one_loop().
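
  A minimal sketch of the per-packet lookup path described in items
  1, 2 and 4 above; every name, type and helper here is an
  illustrative assumption rather than the actual VPP code:

  /* Sketch only: cache-first SPD lookup with linear-search fallback. */
  #include <stdint.h>

  typedef struct
  {
    uint32_t src, dst;          /* IPv4 addresses */
    uint16_t sport, dport;      /* L4 ports */
    uint8_t proto;              /* L4 protocol */
  } flow_key_t;

  typedef struct
  {
    flow_key_t key;
    uint32_t policy_index;      /* cached SPD match, ~0u when empty */
  } flow_cache_entry_t;

  #define FLOW_CACHE_SIZE (1u << 20)    /* ~1M flows, power of two */
  #define FLOW_CACHE_EMPTY (~0u)

  extern flow_cache_entry_t flow_cache[FLOW_CACHE_SIZE];
  extern volatile int flow_cache_bypass; /* set while the SPD is changed */

  uint32_t flow_hash (const flow_key_t * k);          /* any 5-tuple hash */
  uint32_t spd_linear_search (const flow_key_t * k);  /* existing slow path */

  static int
  flow_key_equal (const flow_key_t * a, const flow_key_t * b)
  {
    return a->src == b->src && a->dst == b->dst && a->sport == b->sport
      && a->dport == b->dport && a->proto == b->proto;
  }

  static uint32_t
  spd_lookup (const flow_key_t * k)
  {
    if (flow_cache_bypass)
      /* Fallback mode while the control plane adds/deletes SPD rules. */
      return spd_linear_search (k);

    flow_cache_entry_t *e =
      &flow_cache[flow_hash (k) & (FLOW_CACHE_SIZE - 1)];
    if (e->policy_index != FLOW_CACHE_EMPTY && flow_key_equal (&e->key, k))
      return e->policy_index;   /* hit: O(1) */

    /* Miss or collision: do the linear SPD walk once, then (over)write
       the slot so later packets of this flow hit the cache. */
    uint32_t pi = spd_linear_search (k);
    e->key = *k;
    e->policy_index = pi;
    return pi;
  }

  The real change additionally avoids host-order endianness conversion
  during the lookup, as noted in the review updates further below.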

  Details about bihash usage:
  1. A new bihash template (16_8) is added to support the IPv4
     5-tuple. BIHASH_KVP_PER_PAGE and BIHASH_KVP_AT_BUCKET_LEVEL are
     set to 1 in the new template, which means only one KVP is
     supported per bucket.
  2. Collision handling is avoided by calling the
     BV (clib_bihash_add_or_overwrite_stale) function.
     Through the stale callback function pointer, the KVP entry
     is overwritten on collision (see the sketch after this list).
  3. Flow cache reset is done using the
     BV (clib_bihash_foreach_key_value_pair) function.
     Through the callback function pointer, the KVP value is reset
     to ~0ULL.
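
  A hedged sketch of the bihash_16_8 usage described above (a later
  review update, listed further below, replaced bihash with plain
  vectors). The clib_bihash_*_16_8 calls come from the vppinfra bihash
  template; callback conventions can differ slightly between VPP
  versions, and the surrounding names and sizing are assumptions:

  /* Sketch only: bihash_16_8-backed flow cache as described above. */
  #include <vppinfra/bihash_16_8.h>

  static clib_bihash_16_8_t flow_cache;

  static void
  flow_cache_init (void)
  {
    /* Bucket count and heap size are placeholders, not the values
       used by the actual change. */
    clib_bihash_init_16_8 (&flow_cache, "ipsec4 out spd flow cache",
                           1 << 20 /* nbuckets */, 1ULL << 30 /* bytes */);
  }

  /* Stale callback: report every resident entry as stale, so that a
     collision simply overwrites whatever occupies the bucket. */
  static int
  flow_cache_entry_is_stale (clib_bihash_kv_16_8_t * kv, void *arg)
  {
    return 1;
  }

  static void
  flow_cache_add (u64 key_lo, u64 key_hi, u32 policy_index)
  {
    clib_bihash_kv_16_8_t kv;
    kv.key[0] = key_lo;         /* packed IPv4 5-tuple, low half */
    kv.key[1] = key_hi;         /* packed IPv4 5-tuple, high half */
    kv.value = policy_index;
    clib_bihash_add_or_overwrite_stale_16_8 (&flow_cache, &kv,
                                             flow_cache_entry_is_stale, 0);
  }

  /* Reset callback: invalidate the cached policy in place; the lookup
     path treats a value of ~0ULL as "no policy cached". */
  static int
  flow_cache_reset_one (clib_bihash_kv_16_8_t * kv, void *arg)
  {
    kv->value = ~0ULL;
    return 0;                   /* keep walking */
  }

  static void
  flow_cache_reset (void)
  {
    clib_bihash_foreach_key_value_pair_16_8 (&flow_cache,
                                             flow_cache_reset_one, 0);
  }

  With only one KVP per bucket, add_or_overwrite_stale can only ever
  overwrite the single colliding slot, which is the behaviour item 2
  relies on.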

  MRR performance numbers with 1 core, 1 ESP tunnel, null-encrypt and
  64B packets, for different SPD policy match indices:

  SPD Policy index    : 1          10         100        1000
  Throughput          : MPPS/MPPS  MPPS/MPPS  MPPS/MPPS  KPPS/MPPS
  (Baseline/Optimized)
  ARM Neoverse N1     : 5.2/4.84   4.55/4.84  2.11/4.84  329.5/4.84
  ARM TX2             : 2.81/2.6   2.51/2.6   1.27/2.6   176.62/2.6
  INTEL SKX           : 4.93/4.48  4.29/4.46  2.05/4.48  336.79/4.47

  Next Steps:
  The following can be made configurable through startup conf at the
  IPSec level:
  1. Enable/disable flow cache.
  2. Bihash configuration such as the number of buckets and memory
     size.
  Further improvements:
  3. Dual/quad loop unrolling can be applied around the bihash lookup
     to further improve performance.
  4. The same flow cache logic can be applied to IPv6 and to the
     IPSec inbound direction. A deeper and wider flow cache using
     bihash_40_8 can replace the existing bihash_16_8, making it
     common to both IPv4 and IPv6 in both the outbound and inbound
     directions (see the key-layout sketch after this list).
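
  As an illustration of item 4, bihash_40_8 carries a 40-byte key
  (u64 key[5]) plus a u64 value, which is wide enough for an IPv6
  5-tuple. One purely hypothetical shared key layout:

  /* Illustrative layout only, not the actual VPP definition. */
  #include <stdint.h>

  typedef union
  {
    struct
    {
      uint8_t  src[16];         /* IPv6 source, or IPv4 in the low bytes */
      uint8_t  dst[16];         /* IPv6 destination, likewise */
      uint16_t sport, dport;    /* L4 ports */
      uint8_t  proto;           /* L4 protocol */
      uint8_t  is_ipv6;         /* address family discriminator */
      uint8_t  dir;             /* 0 = outbound, 1 = inbound */
      uint8_t  pad;
    };
    uint64_t as_u64[5];         /* matches the 40-byte bihash key width */
  } flow_cache_key_t;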

The following changes were made based on the review comments:
1. ON/OFF flow cache through startup conf. Default: OFF.
2. Flow cache stale entry detection using an epoch counter (see the
   sketch after this list).
3. Avoid host-order endianness conversion during flow cache lookup.
4. Move IPSec startup conf to a common file.
5. Added an SPD flow cache unit test case.
6. Replaced bihash with vectors to implement the flow cache.
7. The ipsec_add_del_policy API is not mp-safe. Cleaned up the
   in-flight packets check in the control plane.
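
A hedged sketch of the epoch-counter stale-entry detection from items
2 and 6, applied to a vector-backed cache; the struct layout and names
are assumptions, not the actual VPP code:

/* Illustrative only: one entry of a fixed-size, vector-backed cache. */
#include <stdint.h>

typedef struct
{
  uint64_t key[2];              /* packed IPv4 5-tuple */
  uint32_t policy_index;        /* cached SPD match */
  uint32_t epoch;               /* epoch at the time the entry was written */
} flow_cache_entry_t;

static uint32_t spd_epoch;      /* bumped on every SPD policy add/del */

/* Control plane: invalidate the whole cache in O(1) by moving to a
   new epoch instead of walking and clearing every entry. */
static inline void
spd_policy_add_del_done (void)
{
  spd_epoch++;
}

/* Data plane: an entry is usable only if it was written under the
   current epoch; anything older is treated as a miss. Epoch
   wrap-around is ignored here for brevity. */
static inline int
flow_cache_entry_is_current (const flow_cache_entry_t * e)
{
  return e->epoch == spd_epoch;
}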

Type: improvement
Signed-off-by: mgovind <govindarajan.Mohandoss@arm.com>
Signed-off-by: Zachary Leaf <zachary.leaf@arm.com>
Tested-by: Jieqiang Wang <jieqiang.wang@arm.com>
Change-Id: I62b4d6625fbc6caf292427a5d2046aa5672b2006

README.md

Vector Packet Processing

Introduction

The VPP platform is an extensible framework that provides out-of-the-box production quality switch/router functionality. It is the open source version of Cisco's Vector Packet Processing (VPP) technology: a high performance, packet-processing stack that can run on commodity CPUs.

The benefits of this implementation of VPP are its high performance, proven technology, its modularity and flexibility, and rich feature set.

For more information on VPP and its features please visit the FD.io website and What is VPP? pages.

Changes

Details of the changes leading up to this version of VPP can be found under @ref release_notes.

Directory layout

Directory name          Description
build-data              Build metadata
build-root              Build output directory
doxygen                 Documentation generator configuration
dpdk                    DPDK patches and build infrastructure
@ref extras/libmemif    Client library for memif
@ref src/examples       VPP example code
@ref src/plugins        VPP bundled plugins directory
@ref src/svm            Shared virtual memory allocation library
src/tests               Standalone tests (not part of test harness)
src/vat                 VPP API test program
@ref src/vlib           VPP application library
@ref src/vlibapi        VPP API library
@ref src/vlibmemory     VPP Memory management
@ref src/vnet           VPP networking
@ref src/vpp            VPP application
@ref src/vpp-api        VPP application API bindings
@ref src/vppinfra       VPP core library
@ref src/vpp/api        Not-yet-relocated API bindings
test                    Unit tests and Python test harness

Getting started

In general anyone interested in building, developing or running VPP should consult the VPP wiki for more complete documentation.

In particular, readers are recommended to take a look at [Pulling, Building, Running, Hacking, Pushing](https://wiki.fd.io/view/VPP/Pulling,_Building,_Running,_Hacking_and_Pushing_VPP_Code) which provides extensive step-by-step coverage of the topic.

For the impatient, some salient information is distilled below.

Quick-start: On an existing Linux host

To install system dependencies, build VPP and then install it, simply run the build script. This should be performed by a non-privileged user with sudo access from the project base directory:

./extras/vagrant/build.sh

If you want a more fine-grained approach because you intend to do some development work, the Makefile in the root directory of the source tree provides several convenience shortcuts as make targets that may be of interest. To see the available targets run:

make

Quick-start: Vagrant

The directory extras/vagrant contains a Vagrantfile and supporting scripts to bootstrap a working VPP inside a Vagrant-managed virtual machine. This VM can then be used to test concepts with VPP or as a development platform to extend VPP. Some obvious caveats apply when using a VM for VPP since its performance will never match that of bare metal; if your work is timing or performance sensitive, consider using bare metal in addition to, or instead of, the VM.

For this to work you will need a working installation of Vagrant. Instructions for this can be found [on the Setting up Vagrant wiki page](https://wiki.fd.io/view/DEV/Setting_Up_Vagrant).

More information

Several modules provide documentation; see @subpage user_doc for more end-user-oriented information. Also see @subpage dev_doc for developer notes.

Visit the VPP wiki for details on more advanced building strategies and other development notes.

Test Framework

There is PyDoc-generated documentation available for the VPP test framework. See @ref test_framework_doc for details.