Blame - docs/overview/whatisvpp/what-is-vector-packet-processing.rst - fdio/vpp

blob: 994318e81c5ba5aa1e012abb1363345ded9597b8 [file] [log] [blame]

John DeNisco	06dcd45	2018-07-26 12:45:10 -0400	[diff] [blame]	1	:orphan:
				2
				3	.. _what-is-vector-packet-processing:
				4
				5	=================================
				6	What is vector packet processing?
				7	=================================
				8
				9	FD.io VPP is developed using vector packet processing concepts, as opposed to
				10	scalar packet processing, these concepts are explained in the following sections.
				11
				12	Vector packet processing is a common approach among high performance `Userspace
				13	<https://en.wikipedia.org/wiki/User_space>`_ packet processing applications such
				14	as developed with FD.io VPP and `DPDK
				15	<https://en.wikipedia.org/wiki/Data_Plane_Development_Kit>`_. The scalar based
				16	aproach tends to be favoured by Operating System `Kernel
				17	<https://en.wikipedia.org/wiki/Kernel_(operating_system)>`_ Network Stacks and
				18	Userspace stacks that don't have strict performance requirements.
				19
				20	Scalar Packet Processing
				21
				22	A scalar packet processing network stack typically processes one packet at a
				23	time: an interrupt handling function takes a single packet from a Network
				24	Inteface, and processes it through a set of functions: fooA calls fooB calls
				25	fooC and so on.
				26
				27	.. code-block:: none
				28
				29	+---> fooA(packet1) +---> fooB(packet1) +---> fooC(packet1)
				30	+---> fooA(packet2) +---> fooB(packet2) +---> fooC(packet2)
				31	...
				32	+---> fooA(packet3) +---> fooB(packet3) +---> fooC(packet3)
				33
				34
				35	Scalar packet processing is simple, but inefficent in these ways:
				36
				37	* When the code path length exceeds the size of the Microprocessor's instruction
				38	cache (I-cache), `thrashing
				39	<https://en.wikipedia.org/wiki/Thrashing_(computer_science)>`_ occurs as the
				40	Microprocessor is continually loading new instructions. In this model, each
				41	packet incurs an identical set of I-cache misses.
				42	* The associated deep call stack will also add load-store-unit pressure as
				43	stack-locals fall out of the Microprocessor's Layer 1 Data Cache (D-cache).
				44
				45	Vector Packet Processing
				46
				47	In contrast, a vector packet processing network stack processes multiple packets
				48	at a time, called 'vectors of packets' or simply a 'vector'. An interrupt
				49	handling function takes the vector of packets from a Network Inteface, and
				50	processes the vector through a set of functions: fooA calls fooB calls fooC and
				51	so on.
				52
				53	.. code-block:: none
				54
				55	+---> fooA([packet1, +---> fooB([packet1, +---> fooC([packet1, +--->
				56	packet2, packet2, packet2,
				57	... ... ...
				58	packet256]) packet256]) packet256])
				59
				60	This approach fixes:
				61
				62	* The I-cache thrashing problem described above, by ammoritizing the cost of
				63	I-cache loads across multiple packets.
				64
				65	* The ineffeciences associated with the deep call stack by recieving vectors
				66	of up to 256 packets at a time from the Network Interface, and processes them
				67	using a directed graph of node. The graph scheduler invokes one node dispatch
				68	function at a time, restricting stack depth to a few stack frames.
				69
				70	The further optimizations that this approaches enables are pipelining and
				71	prefetching to minimize read latency on table data and parallelize packet loads
				72	needed to process packets.
				73