.. _scalar_vector:

==================================
Scalar vs Vector packet processing
==================================

FD.io VPP is developed using vector packet processing, as opposed to
scalar packet processing.

Vector packet processing is a common approach among high performance packet
processing applications such as FD.io VPP and `DPDK <https://en.wikipedia.org/wiki/Data_Plane_Development_Kit>`_.
The scalar-based approach tends to be favoured by network stacks that
don't necessarily have strict performance requirements.

**Scalar Packet Processing**

A scalar packet processing network stack typically processes one packet at a
time: an interrupt handling function takes a single packet from a Network
Interface, and processes it through a set of functions: fooA calls fooB calls
fooC and so on.

.. code-block:: none

   +---> fooA(packet1) +---> fooB(packet1) +---> fooC(packet1)
   +---> fooA(packet2) +---> fooB(packet2) +---> fooC(packet2)
   ...
   +---> fooA(packet3) +---> fooB(packet3) +---> fooC(packet3)

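To make the picture concrete, here is a minimal C sketch of the scalar model.
The packet_t type, rx_interrupt_handler() and the fooA/fooB/fooC bodies are
hypothetical placeholders, not part of any real driver or VPP API:

.. code-block:: c

   /* Hypothetical scalar receive path: each packet traverses the whole
    * call chain before the next packet is handled. */
   typedef struct { unsigned char data[2048]; unsigned int len; } packet_t;

   static void fooC (packet_t *p) { /* e.g. rewrite and transmit */ }
   static void fooB (packet_t *p) { /* e.g. IP lookup */ fooC (p); }
   static void fooA (packet_t *p) { /* e.g. ethernet input */ fooB (p); }

   /* Interrupt handler: invoked once per received packet, so the full
    * code path (and its I-cache footprint) runs again for every packet. */
   void rx_interrupt_handler (packet_t *p)
   {
     fooA (p);
   }
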
Scalar packet processing is simple, but inefficient in these ways:

* When the code path length exceeds the size of the Microprocessor's instruction
  cache (I-cache), `thrashing
  <https://en.wikipedia.org/wiki/Thrashing_(computer_science)>`_ occurs as the
  Microprocessor is continually loading new instructions. In this model, each
  packet incurs an identical set of I-cache misses.
* The associated deep call stack will also add load-store-unit pressure as
  stack-locals fall out of the Microprocessor's Layer 1 Data Cache (D-cache).

**Vector Packet Processing**

In contrast, a vector packet processing network stack processes multiple packets
at a time, called 'vectors of packets' or simply a 'vector'. An interrupt
handling function takes the vector of packets from a Network Interface, and
processes the vector through a set of functions: fooA calls fooB calls fooC and
so on.

.. code-block:: none

   +---> fooA([packet1, +---> fooB([packet1, +---> fooC([packet1, +--->
              packet2,             packet2,             packet2,
              ...                  ...                  ...
              packet256])          packet256])          packet256])

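The same hypothetical C sketch, reworked for the vector model, has each function
loop over the whole vector before handing it to the next function, so each
function's instructions stay hot in the I-cache across up to 256 packets (all
names below remain illustrative, not a real API):

.. code-block:: c

   /* Hypothetical vector receive path. */
   #define VECTOR_SIZE 256

   typedef struct { unsigned char data[2048]; unsigned int len; } packet_t;

   static void fooC (packet_t **pkts, int n)
   {
     for (int i = 0; i < n; i++)
       { /* e.g. rewrite and transmit pkts[i] */ }
   }

   static void fooB (packet_t **pkts, int n)
   {
     for (int i = 0; i < n; i++)
       { /* e.g. IP lookup for pkts[i] */ }
     fooC (pkts, n);
   }

   static void fooA (packet_t **pkts, int n)
   {
     for (int i = 0; i < n; i++)
       { /* e.g. ethernet input for pkts[i] */ }
     fooB (pkts, n);
   }

   /* Interrupt handler: drains up to VECTOR_SIZE packets in one call. */
   void rx_interrupt_handler (packet_t **pkts, int n_rx)
   {
     fooA (pkts, n_rx < VECTOR_SIZE ? n_rx : VECTOR_SIZE);
   }
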
This approach fixes:

* The I-cache thrashing problem described above, by amortizing the cost of
  I-cache loads across multiple packets.

* The inefficiencies associated with the deep call stack, by receiving vectors
  of up to 256 packets at a time from the Network Interface and processing them
  using a directed graph of nodes. The graph scheduler invokes one node dispatch
  function at a time, restricting stack depth to a few stack frames (a minimal
  sketch of this dispatch pattern follows the list).

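The directed graph of nodes can be pictured with a deliberately simplified
dispatch loop like the one below. VPP's real graph scheduler and node API are
considerably richer; the node_t type and run_graph() function here are
hypothetical, shown only to illustrate "one node dispatch function at a time,
each consuming a whole vector":

.. code-block:: c

   /* Hypothetical graph scheduler: each node has a dispatch function that
    * consumes an entire vector of packets, so the call stack never grows
    * beyond the scheduler plus a single node. */
   typedef struct { unsigned char data[2048]; unsigned int len; } packet_t;

   typedef struct node
   {
     const char *name;
     /* Returns the next node to run, or NULL when the vector is finished. */
     struct node *(*dispatch) (packet_t **pkts, int n);
   } node_t;

   void run_graph (node_t *start, packet_t **pkts, int n)
   {
     node_t *node = start;
     while (node != NULL)
       node = node->dispatch (pkts, n);   /* one dispatch function at a time */
   }
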
This approach also enables further optimizations such as pipelining and
prefetching, which minimize read latency on table data and parallelize the
packet loads needed to process packets.

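As one way to picture the prefetching optimization, the sketch below issues a
prefetch for a packet a few iterations ahead while the current packet is being
processed, so its data is already in cache when its turn comes. It uses the
GCC/Clang __builtin_prefetch builtin directly; VPP wraps this kind of hint in
its own macros, and the stride of 4 is an arbitrary illustrative choice:

.. code-block:: c

   /* Hypothetical prefetching loop over a vector of packets. */
   typedef struct { unsigned char data[2048]; unsigned int len; } packet_t;

   void process_vector (packet_t **pkts, int n)
   {
     for (int i = 0; i < n; i++)
       {
         /* Hide memory latency: start loading packet i+4 while working on i. */
         if (i + 4 < n)
           __builtin_prefetch (pkts[i + 4]->data, 0 /* read */, 3 /* high locality */);

         /* ... per-packet work on pkts[i] goes here ... */
       }
   }
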
Press next for more on Packet Processing Graphs.