VNET (VPP Network Stack)
========================

The files associated with the VPP network stack layer are located in the
*./src/vnet* folder. The Network Stack Layer is basically an
instantiation of the code in the other layers. This layer has a vnet
library that provides vectorized layer-2 and 3 networking graph nodes, a
packet generator, and a packet tracer.

In terms of building a packet processing application, vnet provides a
platform-independent subgraph to which one connects a couple of
device-driver nodes.

Typical RX connections include “ethernet-input” [full software
classification; feeds ipv4-input, ipv6-input, arp-input, etc.] and
“ipv4-input-no-checksum” [used when the hardware can classify and has
already verified the ipv4 header checksum].

Effective graph dispatch function coding
----------------------------------------

Over the past 15 years, multiple coding styles have emerged: a
single/dual/quad loop coding model (with variations) and a
fully-pipelined coding model.

Single/dual loops
-----------------

The single/dual/quad loop model variations conveniently solve problems
where the number of items to process is not known in advance: typical
hardware RX-ring processing. This coding style is also very effective
when a given node will not need to cover a complex set of dependent
reads.

Here is a quad/single loop which can leverage up-to-AVX512 SIMD vector
units to convert buffer indices to buffer pointers:

.. code:: c

   static uword
   simulated_ethernet_interface_tx (vlib_main_t * vm,
                                    vlib_node_runtime_t * node,
                                    vlib_frame_t * frame)
   {
     u32 n_left_from, *from;
     u32 next_index = 0;
     u32 n_bytes;
     u32 thread_index = vm->thread_index;
     vnet_main_t *vnm = vnet_get_main ();
     vnet_interface_main_t *im = &vnm->interface_main;
     vlib_buffer_t *bufs[VLIB_FRAME_SIZE], **b;
     u16 nexts[VLIB_FRAME_SIZE], *next;

     n_left_from = frame->n_vectors;
     from = vlib_frame_vector_args (frame);

     /*
      * Convert up to VLIB_FRAME_SIZE indices in "from" to
      * buffer pointers in bufs[]
      */
     vlib_get_buffers (vm, from, bufs, n_left_from);
     b = bufs;
     next = nexts;

     /*
      * While we have at least 4 vector elements (pkts) to process..
      */
     while (n_left_from >= 4)
       {
         /* Prefetch next quad-loop iteration. */
         if (PREDICT_TRUE (n_left_from >= 8))
           {
             vlib_prefetch_buffer_header (b[4], STORE);
             vlib_prefetch_buffer_header (b[5], STORE);
             vlib_prefetch_buffer_header (b[6], STORE);
             vlib_prefetch_buffer_header (b[7], STORE);
           }

         /*
          * $$$ Process 4x packets right here...
          * set next[0..3] to send the packets where they need to go
          */

         do_something_to (b[0]);
         do_something_to (b[1]);
         do_something_to (b[2]);
         do_something_to (b[3]);

         /* Advance to the next 4 packets */
         b += 4;
         next += 4;
         n_left_from -= 4;
       }
     /*
      * Clean up 0...3 remaining packets at the end of the incoming frame
      */
     while (n_left_from > 0)
       {
         /*
          * $$$ Process one packet right here...
          * set next[0] to send the packet where it needs to go
          */
         do_something_to (b[0]);

         /* Advance to the next packet */
         b += 1;
         next += 1;
         n_left_from -= 1;
       }

     /*
      * Send the packets along their respective next-node graph arcs
      * Considerable locality of reference is expected, most if not all
      * packets in the inbound vector will traverse the same next-node
      * arc
      */
     vlib_buffer_enqueue_to_next (vm, node, from, nexts, frame->n_vectors);

     return frame->n_vectors;
   }

Given a packet processing task to implement, it pays to scout around
looking for similar tasks, and think about using the same coding
pattern. It is not uncommon to recode a given graph node dispatch
function several times during performance optimization.

Creating Packets from Scratch
-----------------------------

At times, it’s necessary to create packets from scratch and send them.
Tasks like sending keepalives or actively opening connections come to
mind. It’s not difficult, but accurate buffer metadata setup is required.

Allocating Buffers
~~~~~~~~~~~~~~~~~~

Use vlib_buffer_alloc, which allocates a set of buffer indices. For
low-performance applications, it’s OK to allocate one buffer at a time.
Note that vlib_buffer_alloc(…) does NOT initialize buffer metadata. See
below.

In high-performance cases, allocate a vector of buffer indices, and hand
them out from the end of the vector; decrement \_vec_len(..) as buffer
indices are allocated. See tcp_alloc_tx_buffers(…) and
tcp_get_free_buffer_index(…) for an example.
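
For illustration, here is a minimal sketch of the “hand indices out from
the end of a cached vector” pattern. It is not the actual tcp code: the
cache vector, refill size, and function name are hypothetical, and in
real code the cache would live in per-thread state.

.. code:: c

   /* Hypothetical cache of free buffer indices */
   static u32 *cached_buffers;

   static int
   my_get_free_buffer_index (vlib_main_t * vm, u32 * bip)
   {
     u32 n_cached = vec_len (cached_buffers);

     if (PREDICT_FALSE (n_cached == 0))
       {
         /* Refill the cache. vlib_buffer_alloc returns the number of
            buffers actually allocated; remember that it does NOT
            initialize buffer metadata. */
         vec_validate (cached_buffers, 31);
         n_cached = vlib_buffer_alloc (vm, cached_buffers, 32);
         _vec_len (cached_buffers) = n_cached;
         if (n_cached == 0)
           return -1;
       }

     /* Hand out the last index and shrink the vector */
     *bip = cached_buffers[n_cached - 1];
     _vec_len (cached_buffers) = n_cached - 1;
     return 0;
   }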

Buffer Initialization Example
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The following example shows the **main points**, but is not to be
blindly cut-’n-pasted.

.. code:: c

   u32 bi0;
   vlib_buffer_t *b0;
   ip4_header_t *ip;
   udp_header_t *udp;
   u8 *data_dst;

   /* Allocate a buffer */
   if (vlib_buffer_alloc (vm, &bi0, 1) != 1)
     return -1;

   b0 = vlib_get_buffer (vm, bi0);

   /* At this point b0->current_data = 0, b0->current_length = 0 */

   /*
    * Copy data into the buffer. This example ASSUMES that data will fit
    * in a single buffer, and is e.g. an ip4 packet.
    */
   if (have_packet_rewrite)
     {
       clib_memcpy (b0->data, data, vec_len (data));
       b0->current_length = vec_len (data);
     }
   else
     {
       /* OR, build a udp-ip packet (for example) */
       ip = vlib_buffer_get_current (b0);
       udp = (udp_header_t *) (ip + 1);
       data_dst = (u8 *) (udp + 1);

       ip->ip_version_and_header_length = 0x45;
       ip->ttl = 254;
       ip->protocol = IP_PROTOCOL_UDP;
       ip->length = clib_host_to_net_u16 (sizeof (*ip) + sizeof (*udp) +
                                          vec_len (udp_data));
       ip->src_address.as_u32 = src_address->as_u32;
       ip->dst_address.as_u32 = dst_address->as_u32;
       /* Set the ip4 header checksum */
       ip->checksum = ip4_header_checksum (ip);

       udp->src_port = clib_host_to_net_u16 (src_port);
       udp->dst_port = clib_host_to_net_u16 (dst_port);
       /* The udp length field covers the udp header plus the payload */
       udp->length = clib_host_to_net_u16 (sizeof (*udp) + vec_len (udp_data));
       clib_memcpy (data_dst, udp_data, vec_len (udp_data));

       if (compute_udp_checksum)
         {
           /* RFC 7011 section 10.3.2. */
           udp->checksum = ip4_tcp_udp_compute_checksum (vm, b0, ip);
           if (udp->checksum == 0)
             udp->checksum = 0xffff;
         }
       /* Total octets written into the buffer */
       b0->current_length = sizeof (*ip) + sizeof (*udp) + vec_len (udp_data);
     }
   b0->flags |= VLIB_BUFFER_TOTAL_LENGTH_VALID;

   /* sw_if_index 0 is the "local" interface, which always exists */
   vnet_buffer (b0)->sw_if_index[VLIB_RX] = 0;

   /* Use the default FIB index for tx lookup. Set non-zero to use another fib */
   vnet_buffer (b0)->sw_if_index[VLIB_TX] = 0;

If your use-case calls for large packet transmission, use
vlib_buffer_chain_append_data_with_alloc(…) to create the requisite
buffer chain.
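
As a rough sketch - assuming the signature
vlib_buffer_chain_append_data_with_alloc (vm, first, &last, data,
data_len), which copies data into the chain, allocates additional
buffers as needed, and returns the number of bytes actually appended -
usage might look like this:

.. code:: c

   /* Hypothetical: append a payload too large for a single buffer */
   vlib_buffer_t *first = b0;
   vlib_buffer_t *last = b0;
   u16 n_appended;

   n_appended = vlib_buffer_chain_append_data_with_alloc
     (vm, first, &last, payload, vec_len (payload));
   if (n_appended < vec_len (payload))
     {
       /* Buffer allocation failed part-way; handle the error */
     }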

Enqueueing packets for lookup and transmission
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The simplest way to send a set of packets is to use
vlib_get_frame_to_node(…) to allocate fresh frame(s) to ip4_lookup_node
or ip6_lookup_node, add the constructed buffer indices, and dispatch the
frame using vlib_put_frame_to_node(…).

.. code:: c

   vlib_frame_t *f;
   u32 *to_next;
   int i;

   f = vlib_get_frame_to_node (vm, ip4_lookup_node.index);
   f->n_vectors = vec_len (buffer_indices_to_send);
   to_next = vlib_frame_vector_args (f);

   for (i = 0; i < vec_len (buffer_indices_to_send); i++)
     to_next[i] = buffer_indices_to_send[i];

   vlib_put_frame_to_node (vm, ip4_lookup_node.index, f);

It is inefficient to allocate and schedule single-packet frames. That’s
acceptable if you need to send one packet per second, but it should
**not** happen in a for-loop!

Packet tracer
-------------

Vlib includes a frame element [packet] trace facility, with a simple
debug CLI interface. The CLI is straightforward: “trace add
input-node-name count” to start capturing packet traces.

To trace 100 packets on a typical x86_64 system running the dpdk plugin:
“trace add dpdk-input 100”. When using the packet generator: “trace add
pg-input 100”

To display the packet trace: “show trace”

Each graph node has the opportunity to capture its own trace data. It is
almost always a good idea to do so. The trace capture APIs are simple.

The packet capture APIs snapshot binary data, to minimize processing at
capture time. Each participating graph node initialization provides a
vppinfra format-style user function to pretty-print data when required
by the VLIB “show trace” command.

Set the VLIB node registration “.format_trace” member to the name of the
per-graph node format function.

Here’s a simple example:

.. code:: c

   u8 * my_node_format_trace (u8 * s, va_list * args)
   {
     vlib_main_t *vm = va_arg (*args, vlib_main_t *);
     vlib_node_t *node = va_arg (*args, vlib_node_t *);
     my_node_trace_t *t = va_arg (*args, my_node_trace_t *);

     s = format (s, "My trace data was: %d", t-><whatever>);

     return s;
   }

The trace framework hands the per-node format function the data it
captured as the packet whizzed by. The format function pretty-prints the
data as desired.
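
As a concrete sketch - the my_node_trace_t type and the captured fields
are hypothetical - a dispatch function typically records its trace data
like this, with the node registration setting “.format_trace =
my_node_format_trace”:

.. code:: c

   typedef struct
   {
     u32 next_index;
     u32 sw_if_index;
   } my_node_trace_t;

   /* ... inside the dispatch loop, per packet ... */
   if (PREDICT_FALSE (b[0]->flags & VLIB_BUFFER_IS_TRACED))
     {
       my_node_trace_t *t = vlib_add_trace (vm, node, b[0], sizeof (*t));
       t->next_index = next[0];
       t->sw_if_index = vnet_buffer (b[0])->sw_if_index[VLIB_RX];
     }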

Graph Dispatcher Pcap Tracing
-----------------------------

The vpp graph dispatcher knows how to capture vectors of packets in pcap
format as they’re dispatched. The pcap captures are as follows:

::

   VPP graph dispatch trace record description:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Major Version | Minor Version | NStrings      | ProtoHint     |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Buffer index (big endian)                                     |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      + VPP graph node name ...     ...               | NULL octet    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Buffer Metadata ... ...                       | NULL octet    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Buffer Opaque ... ...                         | NULL octet    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Buffer Opaque 2 ... ...                       | NULL octet    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | VPP ASCII packet trace (if NStrings > 4)      | NULL octet    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Packet data (up to 16K)                                       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Graph dispatch records comprise a version stamp, an indication of how
many NULL-terminated strings will follow the record header and precede
packet data, and a protocol hint.

The buffer index is an opaque 32-bit cookie which allows consumers of
these data to easily filter/track single packets as they traverse the
forwarding graph.
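
For a consumer written in C, the fixed portion of each record (the
fields before the NULL-terminated strings) can be viewed as the
following packed structure; the type name is hypothetical and simply
mirrors the layout shown above:

.. code:: c

   typedef struct
   {
     u8 major_version;  /* currently 1 */
     u8 minor_version;  /* currently 0 */
     u8 nstrings;       /* number of NULL-terminated strings, 4 or 5 */
     u8 protohint;      /* see vlib_node_proto_hint_t below */
     u32 buffer_index;  /* big endian; byte-swap on little-endian hosts */
   } __attribute__ ((packed)) dispatch_trace_record_header_t;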

Multiple records per packet are normal, and to be expected. Packets will
appear multiple times as they traverse the vpp forwarding graph. In this
way, vpp graph dispatch traces are significantly different from regular
network packet captures from an end-station. This property complicates
stateful packet analysis.

Restricting stateful analysis to records from a single vpp graph node
such as “ethernet-input” seems likely to improve the situation.

As of this writing: major version = 1, minor version = 0. NStrings
SHOULD be 4 or 5. Consumers SHOULD be wary of values less than 4 or
greater than 5. They MAY attempt to display the claimed number of
strings, or they MAY treat the condition as an error.

Here is the current set of protocol hints:

.. code:: c

   typedef enum
   {
     VLIB_NODE_PROTO_HINT_NONE = 0,
     VLIB_NODE_PROTO_HINT_ETHERNET,
     VLIB_NODE_PROTO_HINT_IP4,
     VLIB_NODE_PROTO_HINT_IP6,
     VLIB_NODE_PROTO_HINT_TCP,
     VLIB_NODE_PROTO_HINT_UDP,
     VLIB_NODE_N_PROTO_HINTS,
   } vlib_node_proto_hint_t;

Example: VLIB_NODE_PROTO_HINT_IP6 means that the first octet of packet
data SHOULD be 0x60, and should begin an ipv6 packet header.

Downstream consumers of these data SHOULD pay attention to the protocol
hint. They MUST tolerate inaccurate hints, which MAY occur from time to
time.

Dispatch Pcap Trace Debug CLI
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To start a dispatch trace capture of up to 10,000 trace records:

::

   pcap dispatch trace on max 10000 file dispatch.pcap

To start a dispatch trace which will also include standard vpp packet
tracing for packets which originate in dpdk-input:

::

   pcap dispatch trace on max 10000 file dispatch.pcap buffer-trace dpdk-input 1000

To save the pcap trace, e.g. in /tmp/dispatch.pcap:

::

   pcap dispatch trace off

Wireshark dissection of dispatch pcap traces
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It almost goes without saying that we built a companion wireshark
dissector to display these traces. As of this writing, we have
upstreamed the wireshark dissector.

Since it will be a while before wireshark/master/latest makes it into
all of the popular Linux distros, please see the “How to build a vpp
dispatch trace aware Wireshark” page for build info.

Here is a sample packet dissection, with some fields omitted for
clarity. The point is that the wireshark dissector accurately displays
**all** of the vpp buffer metadata, and the name of the graph node in
question.

::

   Frame 1: 2216 bytes on wire (17728 bits), 2216 bytes captured (17728 bits)
       Encapsulation type: USER 13 (58)
       [Protocols in frame: vpp:vpp-metadata:vpp-opaque:vpp-opaque2:eth:ethertype:ip:tcp:data]
   VPP Dispatch Trace
       BufferIndex: 0x00036663
       NodeName: ethernet-input
   VPP Buffer Metadata
       Metadata: flags:
       Metadata: current_data: 0, current_length: 102
       Metadata: current_config_index: 0, flow_id: 0, next_buffer: 0
       Metadata: error: 0, n_add_refs: 0, buffer_pool_index: 0
       Metadata: trace_index: 0, recycle_count: 0, len_not_first_buf: 0
       Metadata: free_list_index: 0
       Metadata:
   VPP Buffer Opaque
       Opaque: raw: 00000007 ffffffff 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
       Opaque: sw_if_index[VLIB_RX]: 7, sw_if_index[VLIB_TX]: -1
       Opaque: L2 offset 0, L3 offset 0, L4 offset 0, feature arc index 0
       Opaque: ip.adj_index[VLIB_RX]: 0, ip.adj_index[VLIB_TX]: 0
       Opaque: ip.flow_hash: 0x0, ip.save_protocol: 0x0, ip.fib_index: 0
       Opaque: ip.save_rewrite_length: 0, ip.rpf_id: 0
       Opaque: ip.icmp.type: 0 ip.icmp.code: 0, ip.icmp.data: 0x0
       Opaque: ip.reass.next_index: 0, ip.reass.estimated_mtu: 0
       Opaque: ip.reass.fragment_first: 0 ip.reass.fragment_last: 0
       Opaque: ip.reass.range_first: 0 ip.reass.range_last: 0
       Opaque: ip.reass.next_range_bi: 0x0, ip.reass.ip6_frag_hdr_offset: 0
       Opaque: mpls.ttl: 0, mpls.exp: 0, mpls.first: 0, mpls.save_rewrite_length: 0, mpls.bier.n_bytes: 0
       Opaque: l2.feature_bitmap: 00000000, l2.bd_index: 0, l2.l2_len: 0, l2.shg: 0, l2.l2fib_sn: 0, l2.bd_age: 0
       Opaque: l2.feature_bitmap_input: none configured, L2.feature_bitmap_output: none configured
       Opaque: l2t.next_index: 0, l2t.session_index: 0
       Opaque: l2_classify.table_index: 0, l2_classify.opaque_index: 0, l2_classify.hash: 0x0
       Opaque: policer.index: 0
       Opaque: ipsec.flags: 0x0, ipsec.sad_index: 0
       Opaque: map.mtu: 0
       Opaque: map_t.v6.saddr: 0x0, map_t.v6.daddr: 0x0, map_t.v6.frag_offset: 0, map_t.v6.l4_offset: 0
       Opaque: map_t.v6.l4_protocol: 0, map_t.checksum_offset: 0, map_t.mtu: 0
       Opaque: ip_frag.mtu: 0, ip_frag.next_index: 0, ip_frag.flags: 0x0
       Opaque: cop.current_config_index: 0
       Opaque: lisp.overlay_afi: 0
       Opaque: tcp.connection_index: 0, tcp.seq_number: 0, tcp.seq_end: 0, tcp.ack_number: 0, tcp.hdr_offset: 0, tcp.data_offset: 0
       Opaque: tcp.data_len: 0, tcp.flags: 0x0
       Opaque: sctp.connection_index: 0, sctp.sid: 0, sctp.ssn: 0, sctp.tsn: 0, sctp.hdr_offset: 0
       Opaque: sctp.data_offset: 0, sctp.data_len: 0, sctp.subconn_idx: 0, sctp.flags: 0x0
       Opaque: snat.flags: 0x0
       Opaque:
   VPP Buffer Opaque2
       Opaque2: raw: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
       Opaque2: qos.bits: 0, qos.source: 0
       Opaque2: loop_counter: 0
       Opaque2: gbp.flags: 0, gbp.src_epg: 0
       Opaque2: pg_replay_timestamp: 0
       Opaque2:
   Ethernet II, Src: 06:d6:01:41:3b:92 (06:d6:01:41:3b:92), Dst: IntelCor_3d:f6
   Transmission Control Protocol, Src Port: 22432, Dst Port: 54084, Seq: 1, Ack: 1, Len: 36
       Source Port: 22432
       Destination Port: 54084
       TCP payload (36 bytes)
   Data (36 bytes)

       0000  cf aa 8b f5 53 14 d4 c7 29 75 3e 56 63 93 9d 11   ....S...)u>Vc...
       0010  e5 f2 92 27 86 56 4c 21 ce c5 23 46 d7 eb ec 0d   ...'.VL!..#F....
       0020  a8 98 36 5a                                        ..6Z
       Data: cfaa8bf55314d4c729753e5663939d11e5f2922786564c21…
       [Length: 36]

It’s a matter of a couple of mouse-clicks in Wireshark to filter the
trace to a specific buffer index. With that specific kind of filtration,
one can watch a packet walk through the forwarding graph, noting any/all
metadata changes, header checksum changes, and so forth.

This should be of significant value when developing new vpp graph nodes.
If new code mispositions b->current_data, it will be completely obvious
from looking at the dispatch trace in wireshark.

pcap rx, tx, and drop tracing
-----------------------------

vpp also supports rx, tx, and drop packet capture in pcap format,
through the “pcap trace” debug CLI command.

This command is used to start or stop a packet capture, or show the
status of packet capture. Each of “pcap trace rx”, “pcap trace tx”, and
“pcap trace drop” is implemented. Supply one or more of “rx”, “tx”, and
“drop” to enable multiple simultaneous capture types.

These commands have the following optional parameters:

- rx - trace received packets.

- tx - trace transmitted packets.

- drop - trace dropped packets.

- max *nnnn* - depth of the capture, i.e. the number of packets to
  capture. Once *nnnn* packets have been received, the trace buffer is
  flushed to the indicated file. Defaults to 1000. Can only be updated
  if packet capture is off.

- max-bytes-per-pkt *nnnn* - maximum number of bytes to trace on a
  per-packet basis. Must be >32 and less than 9000. Default value: 512.

- filter - Use the pcap trace rx / tx / drop filter, which must be
  configured. Use “classify filter pcap…” to configure the filter. The
  filter will only be executed if the per-interface or any-interface
  tests fail.

- intfc *interface* \| *any* - Used to specify a given interface, or
  use ‘any’ to run packet capture on all interfaces. ‘any’ is the
  default if not provided. Settings from a previous packet capture are
  preserved, so ‘any’ can be used to reset the interface setting.

- file *filename* - Used to specify the output filename. The file
  will be placed in the ‘/tmp’ directory. If *filename* already exists,
  the file will be overwritten. If no filename is provided, ‘/tmp/rx.pcap’
  or ‘/tmp/tx.pcap’ will be used, depending on capture direction. Can
  only be updated when pcap capture is off.

- status - Displays the current status and configured attributes
  associated with a packet capture. If packet capture is in progress,
  ‘status’ also will return the number of packets currently in the
  buffer. Any additional attributes entered on the command line with a
  ‘status’ request will be ignored.

- filter - Capture packets which match the current packet trace filter
  set. See the next section. Configure the capture filter first.
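
For example, a plausible invocation assembled from the parameters above
(the interface and file names are purely illustrative) captures the
first 1000 received and transmitted packets on one interface, and
“pcap trace status” reports progress:

::

   pcap trace rx tx max 1000 intfc GigabitEthernet3/0/0 file rxtx.pcap
   pcap trace status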

packet trace capture filtering
------------------------------

The “classify filter pcap \| <interface-name> \| trace” debug CLI command
constructs an arbitrary set of packet classifier tables for use with
“pcap trace rx \| tx \| drop,” and with the vpp packet tracer on a
per-interface or system-wide basis.

Packets which match a rule in the classifier table chain will be traced.
The tables are automatically ordered so that matches in the most
specific table are tried first.

It’s reasonably likely that folks will configure a single table with one
or two matches. As a result, we configure 8 hash buckets and 128K of
match rule space by default. One can override the defaults by specifying
“buckets <nnn>” and “memory-size <xxx>” as desired.

To build up complex filter chains, repeatedly issue the classify filter
debug CLI command. Each command must specify the desired mask and match
values. If a classifier table with a suitable mask already exists, the
CLI command adds a match rule to the existing table. If not, the CLI
command adds a new table with the indicated mask, plus the match rule.

Configure a simple pcap classify filter
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

   classify filter pcap mask l3 ip4 src match l3 ip4 src 192.168.1.11
   pcap trace rx max 100 filter

Configure a simple per-interface capture filter
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

   classify filter GigabitEthernet3/0/0 mask l3 ip4 src match l3 ip4 src 192.168.1.11
   pcap trace rx max 100 intfc GigabitEthernet3/0/0

Note that per-interface capture filters are *always* applied.

Clear per-interface capture filters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

   classify filter GigabitEthernet3/0/0 del

Configure another fairly simple pcap classify filter
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

   classify filter pcap mask l3 ip4 src dst match l3 ip4 src 192.168.1.10 dst 192.168.2.10
   pcap trace tx max 100 filter

Configure a vpp packet tracer filter
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

   classify filter trace mask l3 ip4 src dst match l3 ip4 src 192.168.1.10 dst 192.168.2.10
   trace add dpdk-input 100 filter

Clear all current classifier filters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

   classify filter [pcap | <interface> | trace] del

To inspect the classifier tables
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

   show classify table [verbose]

The verbose form displays all of the match rules, with hit-counters.

Terse description of the “mask <xxx>” syntax:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

   l2 src dst proto tag1 tag2 ignore-tag1 ignore-tag2 cos1 cos2 dot1q dot1ad
   l3 ip4 <ip4-mask> ip6 <ip6-mask>
       <ip4-mask> version hdr_length src[/width] dst[/width]
                  tos length fragment_id ttl protocol checksum
       <ip6-mask> version traffic-class flow-label src dst proto
                  payload_length hop_limit protocol
   l4 tcp <tcp-mask> udp <udp_mask> src_port dst_port
       <tcp-mask> src dst # ports
       <udp-mask> src_port dst_port

To construct **matches**, add the values to match after the indicated
keywords in the mask syntax. For example: “… mask l3 ip4 src” -> “…
match l3 ip4 src 192.168.1.11”

VPP Packet Generator
--------------------

We use the VPP packet generator to inject packets into the forwarding
graph. The packet generator can replay pcap traces, and generate packets
out of whole cloth at respectably high performance.

The VPP pg enables quite a variety of use-cases, ranging from functional
testing of new data-plane nodes to regression testing to performance
tuning.

PG setup scripts
----------------

PG setup scripts describe traffic in detail, and leverage vpp debug CLI
mechanisms. It’s reasonably unusual to construct a pg setup script which
doesn’t include a certain amount of interface and FIB configuration.

For example:

::

   loop create
   set int ip address loop0 192.168.1.1/24
   set int state loop0 up

   packet-generator new {
       name pg0
       limit 100
       rate 1e6
       size 300-300
       interface loop0
       node ethernet-input
       data { IP4: 1.2.3 -> 4.5.6
              UDP: 192.168.1.10 - 192.168.1.254 -> 192.168.2.10
              UDP: 1234 -> 2345
              incrementing 286
       }
   }

A packet generator stream definition includes two major sections:

- Stream Parameter Setup
- Packet Data

Stream Parameter Setup
~~~~~~~~~~~~~~~~~~~~~~

Given the example above, let’s look at how to set up stream parameters:

- **name pg0** - Name of the stream, in this case “pg0”

- **limit 100** - Number of packets to send when the stream is
  enabled. “limit 0” means send packets continuously.

- **maxframe <nnn>** - Maximum frame size. Handy for injecting multiple
  frames no larger than <nnn>. Useful for checking dual / quad loop
  codes.

- **rate 1e6** - Packet injection rate, in this case 1 MPPS. When not
  specified, the packet generator injects packets as fast as possible.

- **size 300-300** - Packet size range, in this case send 300-byte
  packets.

- **interface loop0** - Packets appear as if they were received on the
  specified interface. This datum is used in multiple ways: to select
  graph arc feature configuration, and to select IP FIBs. Configure
  features e.g. on loop0 to exercise those features.

- **tx-interface <name>** - Packets will be transmitted on the
  indicated interface. Typically required only when injecting packets
  into post-IP-rewrite graph nodes.

- **pcap <filename>** - Replay packets from the indicated pcap capture
  file. “make test” makes extensive use of this feature: generate
  packets using scapy, save them in a .pcap file, then inject them into
  the vpp graph via a vpp pg “pcap <filename>” stream definition.

- **worker <nn>** - Generate packets for the stream using the indicated
  vpp worker thread. The vpp pg generates and injects O(10 MPPS /
  core). Use multiple stream definitions and worker threads to generate
  and inject enough traffic to easily fill a 40 gbit pipe with small
  packets.

Data definition
~~~~~~~~~~~~~~~

Packet generator data definitions make use of a layered implementation
strategy. Networking layers are specified in order, and the notation can
seem a bit counter-intuitive. In the example above, the data definition
stanza constructs a set of L2-L4 header layers, and uses an
incrementing fill pattern to round out the requested 300-byte packets.

- **IP4: 1.2.3 -> 4.5.6** - Construct an L2 (MAC) header with the ip4
  ethertype (0x800), src MAC address of 00:01:00:02:00:03 and dst MAC
  address of 00:04:00:05:00:06. MAC addresses may be specified in
  either *xxxx.xxxx.xxxx* format or *xx:xx:xx:xx:xx:xx* format.

- **UDP: 192.168.1.10 - 192.168.1.254 -> 192.168.2.10** - Construct an
  incrementing set of L3 (IPv4) headers for successive packets with
  source addresses ranging from .10 to .254. All packets in the stream
  have a constant dest address of 192.168.2.10. Set the protocol field
  to 17, UDP.

- **UDP: 1234 -> 2345** - Set the UDP source and destination ports to
  1234 and 2345, respectively.

- **incrementing 286** - Insert up to 286 incrementing data bytes.

Obvious variations involve “s/IP4/IP6/” in the above, along with
changing from IPv4 to IPv6 address notation.

The vpp pg can set any / all IPv4 header fields, including tos, packet
length, mf / df / fragment id and offset, ttl, protocol, checksum, and
src/dst addresses. Take a look at ../src/vnet/ip/ip[46]_pg.c for
details.

If all else fails, specify the entire packet data in hex:

- **hex 0xabcd…** - copy hex data verbatim into the packet

When replaying pcap files (“**pcap <filename>**”), do not specify a data
stanza.
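
For example, a hypothetical stream definition which replays a
previously captured file (the stream and file names are illustrative)
might look like:

::

   packet-generator new {
       name replay
       interface loop0
       node ethernet-input
       pcap /tmp/replay.pcap
   }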

Diagnosing “packet-generator new” parse failures
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you want to inject packets into a brand-new graph node, remember to
tell the packet generator debug CLI how to parse the packet data stanza.

If the node expects L2 Ethernet MAC headers, specify “.unformat_buffer =
unformat_ethernet_header”:

.. code:: c

   VLIB_REGISTER_NODE (ethernet_input_node) =
   {
     <snip>
     .unformat_buffer = unformat_ethernet_header,
     <snip>
   };

Beyond that, it may be necessary to set breakpoints in
…/src/vnet/pg/cli.c. A debug image is suggested.

When debugging new nodes, it may be far simpler to directly inject
ethernet frames - and add a corresponding vlib_buffer_advance in the new
node - than to modify the packet generator.
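
For instance, if the new node logically expects ip4 packets but you
inject full ethernet frames for convenience, a small adjustment at the
top of the per-packet processing can skip the L2 header (a hypothetical
sketch, assuming untagged ethernet frames):

.. code:: c

   /* Skip the injected ethernet header so processing starts at the ip4 header */
   ethernet_header_t *e = vlib_buffer_get_current (b0);

   if (clib_net_to_host_u16 (e->type) == ETHERNET_TYPE_IP4)
     vlib_buffer_advance (b0, sizeof (ethernet_header_t));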

Debug CLI
---------

The preceding sections describe the “packet-generator new” debug CLI in
detail.

Additional debug CLI commands include:

::

   vpp# packet-generator enable [<stream-name>]

which enables the named stream, or all streams.

::

   vpp# packet-generator disable [<stream-name>]

which disables the named stream, or all streams.

::

   vpp# packet-generator delete <stream-name>

which deletes the named stream.

::

   vpp# packet-generator configure <stream-name> [limit <nnn>]
        [rate <f64-pps>] [size <nn>-<nn>]

which changes stream parameters without having to recreate the entire
stream definition. Note that re-issuing a “packet-generator new” command
will correctly recreate the named stream.