:orphan:

.. _what-is-vector-packet-processing:

=================================
What is vector packet processing?
=================================

FD.io VPP is developed using vector packet processing, as opposed to scalar
packet processing. These concepts are explained in the following sections.

Vector packet processing is a common approach among high performance `Userspace
<https://en.wikipedia.org/wiki/User_space>`_ packet processing applications such
as those developed with FD.io VPP and `DPDK
<https://en.wikipedia.org/wiki/Data_Plane_Development_Kit>`_. The scalar-based
approach tends to be favoured by Operating System `Kernel
<https://en.wikipedia.org/wiki/Kernel_(operating_system)>`_ Network Stacks and
Userspace stacks that don't have strict performance requirements.

**Scalar Packet Processing**

A scalar packet processing network stack typically processes one packet at a
time: an interrupt handling function takes a single packet from a Network
Interface, and processes it through a set of functions: fooA calls fooB calls
fooC and so on.

.. code-block:: none

   +---> fooA(packet1) +---> fooB(packet1) +---> fooC(packet1)
   +---> fooA(packet2) +---> fooB(packet2) +---> fooC(packet2)
   ...
   +---> fooA(packet3) +---> fooB(packet3) +---> fooC(packet3)

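As a rough illustration only, the scalar model shown above might look like the
minimal C sketch below. The function names follow the diagram; the ``packet_t``
type and ``rx_interrupt_handler`` are hypothetical placeholders, not part of any
real stack.

.. code-block:: c

   /* Hypothetical scalar receive path: each packet traverses the full
      call chain (and its full instruction footprint) on its own. */

   typedef struct { unsigned char data[2048]; int len; } packet_t;  /* placeholder */

   static void fooC(packet_t *p) { (void) p; }   /* e.g. rewrite + transmit */
   static void fooB(packet_t *p) { fooC(p); }    /* e.g. IP lookup, then call fooC */
   static void fooA(packet_t *p) { fooB(p); }    /* e.g. ethernet input, then call fooB */

   /* Interrupt handler: one packet in, the whole call chain runs per packet. */
   void rx_interrupt_handler(packet_t *p)
   {
     fooA(p);
   }
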
Scalar packet processing is simple, but inefficient in these ways:

* When the code path length exceeds the size of the Microprocessor's instruction
  cache (I-cache), `thrashing
  <https://en.wikipedia.org/wiki/Thrashing_(computer_science)>`_ occurs as the
  Microprocessor is continually loading new instructions. In this model, each
  packet incurs an identical set of I-cache misses.
* The associated deep call stack will also add load-store-unit pressure as
  stack-locals fall out of the Microprocessor's Layer 1 Data Cache (D-cache).

**Vector Packet Processing**

In contrast, a vector packet processing network stack processes multiple packets
at a time, called a 'vector of packets' or simply a 'vector'. An interrupt
handling function takes the vector of packets from a Network Interface, and
processes the vector through a set of functions: fooA calls fooB calls fooC and
so on.

.. code-block:: none

   +---> fooA([packet1, +---> fooB([packet1, +---> fooC([packet1, +--->
               packet2,             packet2,             packet2,
               ...                  ...                  ...
               packet256])          packet256])          packet256])

This approach fixes:

* The I-cache thrashing problem described above, by amortizing the cost of
  I-cache loads across multiple packets.

* The inefficiencies associated with the deep call stack, by receiving vectors
  of up to 256 packets at a time from the Network Interface and processing them
  using a directed graph of nodes. The graph scheduler invokes one node dispatch
  function at a time, restricting stack depth to a few stack frames (a minimal
  dispatch sketch follows this list).

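The sketch below is a simplified, hypothetical illustration of that dispatch
model, not VPP's actual graph node API: each node function receives the whole
vector and loops over it, so its instructions are loaded into the I-cache once
per vector rather than once per packet, and the call stack never grows beyond
the scheduler plus one node.

.. code-block:: c

   /* Hypothetical vector dispatch: each node processes the whole vector
      (up to 256 packets) before the next node runs, amortizing I-cache
      loads across packets. */

   typedef struct { unsigned char data[2048]; int len; } packet_t;  /* placeholder */

   static void fooA_node(packet_t *pkts[], int n_pkts)   /* e.g. ethernet input */
   {
     for (int i = 0; i < n_pkts; i++)
       (void) pkts[i];   /* parse pkts[i]'s ethernet header here */
   }

   static void fooB_node(packet_t *pkts[], int n_pkts)   /* e.g. IP lookup */
   {
     for (int i = 0; i < n_pkts; i++)
       (void) pkts[i];   /* look up pkts[i]'s destination here */
   }

   static void fooC_node(packet_t *pkts[], int n_pkts)   /* e.g. rewrite + transmit */
   {
     for (int i = 0; i < n_pkts; i++)
       (void) pkts[i];   /* rewrite and enqueue pkts[i] here */
   }

   /* The scheduler calls one node dispatch function at a time on the whole
      vector, so stack depth stays at a few frames. */
   void dispatch_vector(packet_t *pkts[], int n_pkts)
   {
     fooA_node(pkts, n_pkts);
     fooB_node(pkts, n_pkts);
     fooC_node(pkts, n_pkts);
   }
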
The further optimizations that this approach enables are pipelining and
prefetching, which minimize read latency on table data and parallelize the
packet loads needed to process packets.

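To illustrate the prefetching idea, the hypothetical sketch below starts loading
the next packet's data into the D-cache while the current packet is being
processed, so the memory read overlaps with useful work. Real implementations
typically prefetch several packets ahead and pipeline the work more
aggressively; this is only a minimal sketch using the GCC/Clang
``__builtin_prefetch`` intrinsic.

.. code-block:: c

   /* Hypothetical prefetching inside a node's vector loop: begin loading
      packet i+1 while packet i is being processed. */

   typedef struct { unsigned char data[2048]; int len; } packet_t;  /* placeholder */

   static void process_one(packet_t *p)
   {
     (void) p;   /* e.g. header parse + table lookup for this packet */
   }

   void node_with_prefetch(packet_t *pkts[], int n_pkts)
   {
     for (int i = 0; i < n_pkts; i++)
       {
         if (i + 1 < n_pkts)
           __builtin_prefetch(pkts[i + 1]->data);   /* hide the read latency */

         process_one(pkts[i]);   /* this work overlaps the outstanding prefetch */
       }
   }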
73