VNET (VPP Network Stack)
========================

The files associated with the VPP network stack layer are located in the
*./src/vnet* folder. The Network Stack Layer is basically an
instantiation of the code in the other layers. This layer has a vnet
library that provides vectorized layer-2 and layer-3 networking graph
nodes, a packet generator, and a packet tracer.

In terms of building a packet processing application, vnet provides a
platform-independent subgraph to which one connects a couple of
device-driver nodes.

Typical RX connections include “ethernet-input” [full software
classification, feeds ipv4-input, ipv6-input, arp-input etc.] and
“ipv4-input-no-checksum” [if hardware can classify, perform ipv4 header
checksum].

Effective graph dispatch function coding
----------------------------------------

Over the past 15 years, multiple coding styles have emerged: a
single/dual/quad loop coding model (with variations) and a
fully-pipelined coding model.

Single/dual loops
-----------------

The single/dual/quad loop model variations conveniently solve problems
where the number of items to process is not known in advance: typical
hardware RX-ring processing. This coding style is also very effective
when a given node will not need to cover a complex set of dependent
reads.

Here is a quad/single loop which can leverage up-to-avx512 SIMD vector
units to convert buffer indices to buffer pointers:

.. code:: c

    static uword
    simulated_ethernet_interface_tx (vlib_main_t * vm,
                                     vlib_node_runtime_t * node,
                                     vlib_frame_t * frame)
    {
      u32 n_left_from, *from;
      u32 next_index = 0;
      u32 n_bytes;
      u32 thread_index = vm->thread_index;
      vnet_main_t *vnm = vnet_get_main ();
      vnet_interface_main_t *im = &vnm->interface_main;
      vlib_buffer_t *bufs[VLIB_FRAME_SIZE], **b;
      u16 nexts[VLIB_FRAME_SIZE], *next;

      n_left_from = frame->n_vectors;
      from = vlib_frame_vector_args (frame);

      /*
       * Convert up to VLIB_FRAME_SIZE indices in "from" to
       * buffer pointers in bufs[]
       */
      vlib_get_buffers (vm, from, bufs, n_left_from);
      b = bufs;
      next = nexts;

      /*
       * While we have at least 4 vector elements (pkts) to process..
       */
      while (n_left_from >= 4)
        {
          /* Prefetch next quad-loop iteration. */
          if (PREDICT_TRUE (n_left_from >= 8))
            {
              vlib_prefetch_buffer_header (b[4], STORE);
              vlib_prefetch_buffer_header (b[5], STORE);
              vlib_prefetch_buffer_header (b[6], STORE);
              vlib_prefetch_buffer_header (b[7], STORE);
            }

          /*
           * $$$ Process 4x packets right here...
           * set next[0..3] to send the packets where they need to go
           */

          do_something_to (b[0]);
          do_something_to (b[1]);
          do_something_to (b[2]);
          do_something_to (b[3]);

          /* Advance to the next 4 packets */
          b += 4;
          next += 4;
          n_left_from -= 4;
        }
      /*
       * Clean up 0...3 remaining packets at the end of the incoming frame
       */
      while (n_left_from > 0)
        {
          /*
           * $$$ Process one packet right here...
           * set next[0] to send the packet where it needs to go
           */
          do_something_to (b[0]);

          /* Advance to the next packet */
          b += 1;
          next += 1;
          n_left_from -= 1;
        }

      /*
       * Send the packets along their respective next-node graph arcs
       * Considerable locality of reference is expected, most if not all
       * packets in the inbound vector will traverse the same next-node
       * arc
       */
      vlib_buffer_enqueue_to_next (vm, node, from, nexts, frame->n_vectors);

      return frame->n_vectors;
    }

Given a packet processing task to implement, it pays to scout around
looking for similar tasks, and think about using the same coding
pattern. It is not uncommon to recode a given graph node dispatch
function several times during performance optimization.
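
The loop shape itself does not depend on vlib. The following is a minimal
stand-alone sketch of the quad-then-single pattern over plain arrays. The
names ``pick_next`` and ``classify_lengths`` and the length-based rule are
hypothetical and exist only to illustrate the control flow; they are not
part of VPP.

```c
#include <stdint.h>

/* Hypothetical per-packet work: choose a next index from a packet length. */
static inline uint16_t
pick_next (uint16_t len)
{
  return len < 64 ? 0 /* "drop" */ : 1 /* "forward" */;
}

/* Quad loop followed by a single loop; returns the number processed. */
uint32_t
classify_lengths (const uint16_t *len, uint16_t *next, uint32_t n)
{
  uint32_t n_left = n;

  /* While at least 4 elements remain, do 4 independent operations. */
  while (n_left >= 4)
    {
      next[0] = pick_next (len[0]);
      next[1] = pick_next (len[1]);
      next[2] = pick_next (len[2]);
      next[3] = pick_next (len[3]);

      len += 4;
      next += 4;
      n_left -= 4;
    }

  /* Clean up the 0..3 remaining elements one at a time. */
  while (n_left > 0)
    {
      next[0] = pick_next (len[0]);

      len += 1;
      next += 1;
      n_left -= 1;
    }

  return n;
}
```

Because the four operations in the quad loop do not depend on each other,
the compiler and CPU are free to overlap them; the single loop only
handles the tail.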

Creating Packets from Scratch
-----------------------------

At times, it’s necessary to create packets from scratch and send them.
Tasks like sending keepalives or actively opening connections come to
mind. It’s not difficult, but accurate buffer metadata setup is required.

Allocating Buffers
~~~~~~~~~~~~~~~~~~

Use vlib_buffer_alloc, which allocates a set of buffer indices. For
low-performance applications, it’s OK to allocate one buffer at a time.
Note that vlib_buffer_alloc(…) does NOT initialize buffer metadata. See
below.

In high-performance cases, allocate a vector of buffer indices, and hand
them out from the end of the vector; decrement \_vec_len(..) as buffer
indices are allocated. See tcp_alloc_tx_buffers(…) and
tcp_get_free_buffer_index(…) for an example.
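
The "hand them out from the end" idea can be sketched without any vlib
types. In the sketch below, ``index_cache_t``, ``index_cache_refill`` and
``index_cache_get`` are hypothetical names; the refill function merely
stands in for a bulk allocation, and ``len`` plays the role that the
vector length plays in the real code.

```c
#include <stddef.h>
#include <stdint.h>

#define INDEX_CACHE_SIZE 64

/* Hypothetical cache of pre-allocated buffer indices. */
typedef struct
{
  uint32_t indices[INDEX_CACHE_SIZE];
  size_t len; /* number of valid entries still available */
} index_cache_t;

/* Stand-in for a bulk allocation: fill the cache with n indices. */
void
index_cache_refill (index_cache_t *c, uint32_t first, size_t n)
{
  size_t i;

  if (n > INDEX_CACHE_SIZE)
    n = INDEX_CACHE_SIZE;
  for (i = 0; i < n; i++)
    c->indices[i] = first + (uint32_t) i;
  c->len = n;
}

/* Hand out one index from the end. Returns 0 on success, -1 if empty. */
int
index_cache_get (index_cache_t *c, uint32_t *bi)
{
  if (c->len == 0)
    return -1;
  c->len -= 1; /* shrink the vector by one element */
  *bi = c->indices[c->len];
  return 0;
}
```

Taking elements from the end means that handing out an index is a single
decrement and load, with no shifting of the remaining entries.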

Buffer Initialization Example
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The following example shows the **main points**, but is not to be
blindly cut-’n-pasted.

.. code:: c

    u32 bi0;
    vlib_buffer_t *b0;
    ip4_header_t *ip;
    udp_header_t *udp;
    u8 *data_dst;

    /* Allocate a buffer */
    if (vlib_buffer_alloc (vm, &bi0, 1) != 1)
      return -1;

    b0 = vlib_get_buffer (vm, bi0);

    /* At this point b0->current_data = 0, b0->current_length = 0 */

    /*
     * Copy data into the buffer. This example ASSUMES that data will fit
     * in a single buffer, and is e.g. an ip4 packet.
     */
    if (have_packet_rewrite)
      {
        clib_memcpy (b0->data, data, vec_len (data));
        b0->current_length = vec_len (data);
      }
    else
      {
        /* OR, build a udp-ip packet (for example) */
        ip = vlib_buffer_get_current (b0);
        udp = (udp_header_t *) (ip + 1);
        data_dst = (u8 *) (udp + 1);

        /* Buffer data is not zeroed on allocation: clear both headers */
        clib_memset (ip, 0, sizeof (*ip) + sizeof (*udp));

        ip->ip_version_and_header_length = 0x45;
        ip->ttl = 254;
        ip->protocol = IP_PROTOCOL_UDP;
        ip->length = clib_host_to_net_u16 (sizeof (*ip) + sizeof (*udp) +
                                           vec_len (udp_data));
        ip->src_address.as_u32 = src_address->as_u32;
        ip->dst_address.as_u32 = dst_address->as_u32;
        ip->checksum = ip4_header_checksum (ip);
        udp->src_port = clib_host_to_net_u16 (src_port);
        udp->dst_port = clib_host_to_net_u16 (dst_port);
        /* The UDP length field covers the UDP header plus the payload */
        udp->length = clib_host_to_net_u16 (sizeof (*udp) +
                                            vec_len (udp_data));
        clib_memcpy (data_dst, udp_data, vec_len (udp_data));

        /* Set the buffer length before computing the checksum */
        b0->current_length = sizeof (*ip) + sizeof (*udp) +
                             vec_len (udp_data);

        if (compute_udp_checksum)
          {
            /* RFC 7011 section 10.3.2. */
            udp->checksum = ip4_tcp_udp_compute_checksum (vm, b0, ip);
            if (udp->checksum == 0)
              udp->checksum = 0xffff;
          }
      }
    b0->flags |= VLIB_BUFFER_TOTAL_LENGTH_VALID;

    /* sw_if_index 0 is the "local" interface, which always exists */
    vnet_buffer (b0)->sw_if_index[VLIB_RX] = 0;

    /* Use the default FIB index for tx lookup. Set non-zero to use another fib */
    vnet_buffer (b0)->sw_if_index[VLIB_TX] = 0;

If your use-case calls for large packet transmission, use
vlib_buffer_chain_append_data_with_alloc(…) to create the requisite
buffer chain.
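
Getting the length fields right is the most error-prone part of building
a UDP-in-IPv4 packet by hand. The following stand-alone sketch shows the
arithmetic, assuming a 20-byte IPv4 header without options (header byte
0x45) and an 8-byte UDP header. The function names are hypothetical, and
the values are in host byte order, so they still need a
host-to-network conversion before being stored in a header.

```c
#include <stdint.h>

enum
{
  IP4_HEADER_BYTES = 20, /* IPv4 header without options */
  UDP_HEADER_BYTES = 8,
};

/* IPv4 total length: IP header + UDP header + payload. */
uint16_t
ip4_total_length (uint16_t payload_bytes)
{
  return (uint16_t) (IP4_HEADER_BYTES + UDP_HEADER_BYTES + payload_bytes);
}

/* UDP length: UDP header + payload (the IP header is not counted). */
uint16_t
udp_length (uint16_t payload_bytes)
{
  return (uint16_t) (UDP_HEADER_BYTES + payload_bytes);
}

/* A computed UDP checksum of 0 is sent as 0xffff, since 0 means "none". */
uint16_t
udp_checksum_fixup (uint16_t computed)
{
  return computed == 0 ? 0xffff : computed;
}
```

For a 100-byte payload this gives an IPv4 total length of 128 and a UDP
length of 108; the buffer's current length in that case matches the IPv4
total length.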

Enqueueing packets for lookup and transmission
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The simplest way to send a set of packets is to use
vlib_get_frame_to_node(…) to allocate fresh frame(s) to ip4_lookup_node
or ip6_lookup_node, add the constructed buffer indices, and dispatch the
frame using vlib_put_frame_to_node(…).

.. code:: c

    vlib_frame_t *f;
    u32 *to_next;
    u32 i;

    /* Assumes vec_len (buffer_indices_to_send) <= VLIB_FRAME_SIZE */
    f = vlib_get_frame_to_node (vm, ip4_lookup_node.index);
    f->n_vectors = vec_len (buffer_indices_to_send);
    to_next = vlib_frame_vector_args (f);

    for (i = 0; i < vec_len (buffer_indices_to_send); i++)
      to_next[i] = buffer_indices_to_send[i];

    vlib_put_frame_to_node (vm, ip4_lookup_node.index, f);

It is inefficient to allocate and schedule single-packet frames. Doing
so is acceptable when you need to send, say, one packet per second, but
it should **not** occur in a for-loop!

Packet tracer
-------------

Vlib includes a frame element [packet] trace facility, with a simple
debug CLI interface. The CLI is straightforward: “trace add
input-node-name count” to start capturing packet traces.

To trace 100 packets on a typical x86_64 system running the dpdk plugin:
“trace add dpdk-input 100”. When using the packet generator: “trace add
pg-input 100”

To display the packet trace: “show trace”

Each graph node has the opportunity to capture its own trace data. It is
almost always a good idea to do so. The trace capture APIs are simple.

The packet capture APIs snapshot binary data, to minimize processing at
capture time. Each participating graph node registration provides a
vppinfra format-style user function to pretty-print data when required
by the VLIB “show trace” command.

Set the VLIB node registration “.format_trace” member to the name of the
per-graph node format function.

Here’s a simple example:

.. code:: c

    u8 * my_node_format_trace (u8 * s, va_list * args)
    {
      vlib_main_t * vm = va_arg (*args, vlib_main_t *);
      vlib_node_t * node = va_arg (*args, vlib_node_t *);
      my_node_trace_t * t = va_arg (*args, my_node_trace_t *);

      s = format (s, "My trace data was: %d", t-><whatever>);

      return s;
    }

The trace framework hands the per-node format function the data it
captured as the packet whizzed by. The format function pretty-prints the
data as desired.

Graph Dispatcher Pcap Tracing
-----------------------------

The vpp graph dispatcher knows how to capture vectors of packets in pcap
format as they’re dispatched. Each pcap record is laid out as follows:

::

    VPP graph dispatch trace record description:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | Major Version | Minor Version | NStrings      | ProtoHint     |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | Buffer index (big endian)                                     |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | VPP graph node name ...     ...               | NULL octet    |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | Buffer Metadata ...         ...               | NULL octet    |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | Buffer Opaque ...           ...               | NULL octet    |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | Buffer Opaque 2 ...         ...               | NULL octet    |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | VPP ASCII packet trace (if NStrings > 4)      | NULL octet    |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | Packet data (up to 16K)                                       |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Graph dispatch records comprise a version stamp, an indication of how
many NULL-terminated strings will follow the record header and precede
packet data, and a protocol hint.

The buffer index is an opaque 32-bit cookie which allows consumers of
these data to easily filter/track single packets as they traverse the
forwarding graph.

Multiple records per packet are normal, and to be expected. Packets will
appear multiple times as they traverse the vpp forwarding graph. In this
way, vpp graph dispatch traces are significantly different from regular
network packet captures from an end-station. This property complicates
stateful packet analysis.

Restricting stateful analysis to records from a single vpp graph node
such as “ethernet-input” seems likely to improve the situation.

As of this writing: major version = 1, minor version = 0. NStrings
SHOULD be 4 or 5. Consumers SHOULD be wary of values less than 4 or
greater than 5. They MAY attempt to display the claimed number of
strings, or they MAY treat the condition as an error.
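
A consumer can walk a record using only the layout described above. The
following is a minimal sketch, not the Wireshark dissector: it reads the
four header octets and the big-endian buffer index, then skips NStrings
NULL-terminated strings to find where the packet data starts. The type
``dispatch_record_t`` and the function ``dispatch_record_parse`` are
hypothetical names, and the sketch deliberately does not enforce the
"4 or 5 strings" recommendation.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct
{
  uint8_t major, minor, n_strings, proto_hint;
  uint32_t buffer_index; /* converted from big endian */
  size_t packet_offset;  /* offset of packet data within the record */
} dispatch_record_t;

/* Returns 0 on success, -1 if the record is truncated. */
int
dispatch_record_parse (const uint8_t *p, size_t len, dispatch_record_t *r)
{
  size_t off = 8; /* 4 header octets + 4 buffer index octets */
  uint8_t i;

  if (len < 8)
    return -1;

  r->major = p[0];
  r->minor = p[1];
  r->n_strings = p[2];
  r->proto_hint = p[3];
  r->buffer_index = ((uint32_t) p[4] << 24) | ((uint32_t) p[5] << 16) |
                    ((uint32_t) p[6] << 8) | (uint32_t) p[7];

  /* Skip n_strings NULL-terminated strings. */
  for (i = 0; i < r->n_strings; i++)
    {
      while (off < len && p[off] != 0)
        off++;
      if (off == len)
        return -1; /* ran off the end before finding the terminator */
      off++;       /* step over the NULL octet */
    }

  r->packet_offset = off;
  return 0;
}
```

A caller that wants to follow the recommendation can inspect
``n_strings`` after a successful parse and decide whether to display or
reject records outside the 4..5 range.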

Here is the current set of protocol hints:

.. code:: c

    typedef enum
    {
      VLIB_NODE_PROTO_HINT_NONE = 0,
      VLIB_NODE_PROTO_HINT_ETHERNET,
      VLIB_NODE_PROTO_HINT_IP4,
      VLIB_NODE_PROTO_HINT_IP6,
      VLIB_NODE_PROTO_HINT_TCP,
      VLIB_NODE_PROTO_HINT_UDP,
      VLIB_NODE_N_PROTO_HINTS,
    } vlib_node_proto_hint_t;

Example: VLIB_NODE_PROTO_HINT_IP6 means that the first octet of packet
data SHOULD be 0x60, and should begin an ipv6 packet header.

Downstream consumers of these data SHOULD pay attention to the protocol
hint. They MUST tolerate inaccurate hints, which MAY occur from time to
time.
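
One way for a consumer to tolerate inaccurate hints is to sanity-check
the hint against the first octet of packet data before trusting it. The
sketch below is an assumption about how such a check might look, not
what any existing dissector does. It mirrors the enum values above under
hypothetical ``PROTO_HINT_*`` names and only tests the IP version nibble,
so an IPv6 packet with a non-zero traffic class (first octet 0x6X rather
than exactly 0x60) still passes.

```c
#include <stdint.h>

/* Local mirror of the hint values listed above (hypothetical names). */
typedef enum
{
  PROTO_HINT_NONE = 0,
  PROTO_HINT_ETHERNET,
  PROTO_HINT_IP4,
  PROTO_HINT_IP6,
  PROTO_HINT_TCP,
  PROTO_HINT_UDP,
} proto_hint_t;

/*
 * Returns 1 if the first octet is plausible for the hint, 0 if the hint
 * looks wrong. Hints with no version field to check always pass.
 */
int
proto_hint_plausible (proto_hint_t hint, uint8_t first_octet)
{
  switch (hint)
    {
    case PROTO_HINT_IP4:
      return (first_octet >> 4) == 4;
    case PROTO_HINT_IP6:
      return (first_octet >> 4) == 6;
    default:
      return 1;
    }
}
```

When the check fails, a consumer can fall back to displaying the packet
data as raw bytes instead of rejecting the record.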

Dispatch Pcap Trace Debug CLI
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To start a dispatch trace capture of up to 10,000 trace records:

::

    pcap dispatch trace on max 10000 file dispatch.pcap

To start a dispatch trace which will also include standard vpp packet
tracing for packets which originate in dpdk-input:

::

    pcap dispatch trace on max 10000 file dispatch.pcap buffer-trace dpdk-input 1000

To save the pcap trace, e.g. in /tmp/dispatch.pcap:

::

    pcap dispatch trace off

Wireshark dissection of dispatch pcap traces
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It almost goes without saying that we built a companion wireshark
dissector to display these traces. As of this writing, we have
upstreamed the wireshark dissector.

Since it will be a while before wireshark/master/latest makes it into
all of the popular Linux distros, please see the “How to build a vpp
dispatch trace aware Wireshark” page for build info.

Here is a sample packet dissection, with some fields omitted for
clarity. The point is that the wireshark dissector accurately displays
**all** of the vpp buffer metadata, and the name of the graph node in
question.

::

    Frame 1: 2216 bytes on wire (17728 bits), 2216 bytes captured (17728 bits)
        Encapsulation type: USER 13 (58)
        [Protocols in frame: vpp:vpp-metadata:vpp-opaque:vpp-opaque2:eth:ethertype:ip:tcp:data]
    VPP Dispatch Trace
        BufferIndex: 0x00036663
        NodeName: ethernet-input
    VPP Buffer Metadata
        Metadata: flags:
        Metadata: current_data: 0, current_length: 102
        Metadata: current_config_index: 0, flow_id: 0, next_buffer: 0
        Metadata: error: 0, n_add_refs: 0, buffer_pool_index: 0
        Metadata: trace_index: 0, recycle_count: 0, len_not_first_buf: 0
        Metadata: free_list_index: 0
        Metadata:
    VPP Buffer Opaque
        Opaque: raw: 00000007 ffffffff 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
        Opaque: sw_if_index[VLIB_RX]: 7, sw_if_index[VLIB_TX]: -1
        Opaque: L2 offset 0, L3 offset 0, L4 offset 0, feature arc index 0
        Opaque: ip.adj_index[VLIB_RX]: 0, ip.adj_index[VLIB_TX]: 0
        Opaque: ip.flow_hash: 0x0, ip.save_protocol: 0x0, ip.fib_index: 0
        Opaque: ip.save_rewrite_length: 0, ip.rpf_id: 0
        Opaque: ip.icmp.type: 0 ip.icmp.code: 0, ip.icmp.data: 0x0
        Opaque: ip.reass.next_index: 0, ip.reass.estimated_mtu: 0
        Opaque: ip.reass.fragment_first: 0 ip.reass.fragment_last: 0
        Opaque: ip.reass.range_first: 0 ip.reass.range_last: 0
        Opaque: ip.reass.next_range_bi: 0x0, ip.reass.ip6_frag_hdr_offset: 0
        Opaque: mpls.ttl: 0, mpls.exp: 0, mpls.first: 0, mpls.save_rewrite_length: 0, mpls.bier.n_bytes: 0
        Opaque: l2.feature_bitmap: 00000000, l2.bd_index: 0, l2.l2_len: 0, l2.shg: 0, l2.l2fib_sn: 0, l2.bd_age: 0
        Opaque: l2.feature_bitmap_input: none configured, L2.feature_bitmap_output: none configured
        Opaque: l2t.next_index: 0, l2t.session_index: 0
        Opaque: l2_classify.table_index: 0, l2_classify.opaque_index: 0, l2_classify.hash: 0x0
        Opaque: policer.index: 0
        Opaque: ipsec.flags: 0x0, ipsec.sad_index: 0
        Opaque: map.mtu: 0
        Opaque: map_t.v6.saddr: 0x0, map_t.v6.daddr: 0x0, map_t.v6.frag_offset: 0, map_t.v6.l4_offset: 0
        Opaque: map_t.v6.l4_protocol: 0, map_t.checksum_offset: 0, map_t.mtu: 0
        Opaque: ip_frag.mtu: 0, ip_frag.next_index: 0, ip_frag.flags: 0x0
        Opaque: cop.current_config_index: 0
        Opaque: lisp.overlay_afi: 0
        Opaque: tcp.connection_index: 0, tcp.seq_number: 0, tcp.seq_end: 0, tcp.ack_number: 0, tcp.hdr_offset: 0, tcp.data_offset: 0
        Opaque: tcp.data_len: 0, tcp.flags: 0x0
        Opaque: sctp.connection_index: 0, sctp.sid: 0, sctp.ssn: 0, sctp.tsn: 0, sctp.hdr_offset: 0
        Opaque: sctp.data_offset: 0, sctp.data_len: 0, sctp.subconn_idx: 0, sctp.flags: 0x0
        Opaque: snat.flags: 0x0
        Opaque:
    VPP Buffer Opaque2
        Opaque2: raw: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
        Opaque2: qos.bits: 0, qos.source: 0
        Opaque2: loop_counter: 0
        Opaque2: gbp.flags: 0, gbp.src_epg: 0
        Opaque2: pg_replay_timestamp: 0
        Opaque2:
    Ethernet II, Src: 06:d6:01:41:3b:92 (06:d6:01:41:3b:92), Dst: IntelCor_3d:f6
    Transmission Control Protocol, Src Port: 22432, Dst Port: 54084, Seq: 1, Ack: 1, Len: 36
        Source Port: 22432
        Destination Port: 54084
        TCP payload (36 bytes)
    Data (36 bytes)

    0000  cf aa 8b f5 53 14 d4 c7 29 75 3e 56 63 93 9d 11   ....S...)u>Vc...
    0010  e5 f2 92 27 86 56 4c 21 ce c5 23 46 d7 eb ec 0d   ...'.VL!..#F....
    0020  a8 98 36 5a                                       ..6Z
        Data: cfaa8bf55314d4c729753e5663939d11e5f2922786564c21…
        [Length: 36]

It’s a matter of a couple of mouse-clicks in Wireshark to filter the
trace to a specific buffer index. With that specific kind of filtration,
one can watch a packet walk through the forwarding graph; noting any/all
metadata changes, header checksum changes, and so forth.

This should be of significant value when developing new vpp graph nodes.
If new code mispositions b->current_data, it will be completely obvious
from looking at the dispatch trace in wireshark.

pcap rx, tx, and drop tracing
-----------------------------

vpp also supports rx, tx, and drop packet capture in pcap format,
through the “pcap trace” debug CLI command.

This command is used to start or stop a packet capture, or show the
status of packet capture. Each of “pcap trace rx”, “pcap trace tx”, and
“pcap trace drop” is implemented. Supply one or more of “rx”, “tx”, and
“drop” to enable multiple simultaneous capture types.

These commands have the following optional parameters:

- rx - trace received packets.

- tx - trace transmitted packets.

- drop - trace dropped packets.

- max *nnnn*\ - file size, number of packet captures. Once that many
  packets have been captured, the trace buffer is flushed to the
  indicated file. Defaults to 1000. Can only be updated if packet
  capture is off.

- max-bytes-per-pkt *nnnn*\ - maximum number of bytes to trace on a
  per-packet basis. Must be >32 and less than 9000. Default value: 512.

- filter - Capture packets which match the current pcap rx / tx / drop
  trace filter, which must be configured first. Use classify filter
  pcap… to configure the filter; see the next section. The filter will
  only be executed if the per-interface or any-interface tests fail.

- intfc *interface* \| *any*\ - Used to specify a given interface, or
  use ‘any’ to run packet capture on all interfaces. ‘any’ is the
  default if not provided. Settings from a previous packet capture are
  preserved, so ‘any’ can be used to reset the interface setting.

- file *filename*\ - Used to specify the output filename. The file
  will be placed in the ‘/tmp’ directory. If *filename* already exists,
  the file will be overwritten. If no filename is provided, ‘/tmp/rx.pcap
  or tx.pcap’ will be used, depending on capture direction. Can only be
  updated when pcap capture is off.

- status - Displays the current status and configured attributes
  associated with a packet capture. If packet capture is in progress,
  ‘status’ also will return the number of packets currently in the
  buffer. Any additional attributes entered on the command line with a
  ‘status’ request will be ignored.

packet trace capture filtering
------------------------------

The “classify filter pcap \| <interface> \| trace” debug CLI command
constructs an arbitrary set of packet classifier tables for use with
“pcap trace rx \| tx \| drop,” and with the vpp packet tracer on a
per-interface or system-wide basis.

Packets which match a rule in the classifier table chain will be traced.
The tables are automatically ordered so that matches in the most
specific table are tried first.

It’s reasonably likely that folks will configure a single table with one
or two matches. As a result, we configure 8 hash buckets and 128K of
match rule space by default. One can override the defaults by specifying
“buckets <nnn>” and “memory-size <nnn>” as desired.

To build up complex filter chains, repeatedly issue the classify filter
debug CLI command. Each command must specify the desired mask and match
values. If a classifier table with a suitable mask already exists, the
CLI command adds a match rule to the existing table. If not, the CLI
command adds a new table and the indicated mask rule.
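
Conceptually, a mask selects which packet bytes take part in the
comparison and a match supplies the values those bytes must have. The
sketch below shows that idea byte by byte. It is a simplification for
illustration only: the function name ``masked_match`` is hypothetical,
and the real classifier looks rules up through hash tables rather than
comparing against every rule in turn.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if (pkt & mask) equals (match & mask) over n bytes, else 0. */
int
masked_match (const uint8_t *pkt, const uint8_t *mask,
              const uint8_t *match, size_t n)
{
  size_t i;

  for (i = 0; i < n; i++)
    if ((pkt[i] & mask[i]) != (match[i] & mask[i]))
      return 0;
  return 1;
}
```

For instance, with an all-ones mask over the four IPv4 source address
bytes and zeros elsewhere, only the source address decides whether a
packet matches, which corresponds to a “mask l3 ip4 src” filter.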

Configure a simple pcap classify filter
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

    classify filter pcap mask l3 ip4 src match l3 ip4 src 192.168.1.11
    pcap trace rx max 100 filter

Configure a simple per-interface capture filter
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

    classify filter GigabitEthernet3/0/0 mask l3 ip4 src match l3 ip4 src 192.168.1.11
    pcap trace rx max 100 intfc GigabitEthernet3/0/0

Note that per-interface capture filters are *always* applied.

Clear per-interface capture filters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

    classify filter GigabitEthernet3/0/0 del

Configure another fairly simple pcap classify filter
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

    classify filter pcap mask l3 ip4 src dst match l3 ip4 src 192.168.1.10 dst 192.168.2.10
    pcap trace tx max 100 filter

Configure a vpp packet tracer filter
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

    classify filter trace mask l3 ip4 src dst match l3 ip4 src 192.168.1.10 dst 192.168.2.10
    trace add dpdk-input 100 filter

Clear all current classifier filters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

    classify filter [pcap | <interface> | trace] del

To inspect the classifier tables
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

    show classify table [verbose]

The verbose form displays all of the match rules, with hit-counters.

Terse description of the “mask” syntax:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

    l2 src dst proto tag1 tag2 ignore-tag1 ignore-tag2 cos1 cos2 dot1q dot1ad
    l3 ip4 <ip4-mask> ip6 <ip6-mask>
    <ip4-mask> version hdr_length src[/width] dst[/width]
               tos length fragment_id ttl protocol checksum
    <ip6-mask> version traffic-class flow-label src dst proto
               payload_length hop_limit protocol
    l4 tcp <tcp-mask> udp <udp-mask> src_port dst_port
    <tcp-mask> src dst  # ports
    <udp-mask> src_port dst_port

To construct **matches**, add the values to match after the indicated
keywords in the mask syntax. For example: “… mask l3 ip4 src” -> “…
match l3 ip4 src 192.168.1.11”

VPP Packet Generator
--------------------

We use the VPP packet generator to inject packets into the forwarding
graph. The packet generator can replay pcap traces, and generate packets
out of whole cloth at respectably high performance.

The VPP pg enables quite a variety of use-cases, ranging from functional
testing of new data-plane nodes to regression testing to performance
tuning.

PG setup scripts
----------------

PG setup scripts describe traffic in detail, and leverage vpp debug CLI
mechanisms. It’s reasonably unusual to construct a pg setup script which
doesn’t include a certain amount of interface and FIB configuration.

For example:

::

    loop create
    set int ip address loop0 192.168.1.1/24
    set int state loop0 up

    packet-generator new {
        name pg0
        limit 100
        rate 1e6
        size 300-300
        interface loop0
        node ethernet-input
        data { IP4: 1.2.3 -> 4.5.6
               UDP: 192.168.1.10 - 192.168.1.254 -> 192.168.2.10
               UDP: 1234 -> 2345
               incrementing 286
        }
    }

A packet generator stream definition includes two major sections:

- Stream Parameter Setup

- Packet Data

Stream Parameter Setup
~~~~~~~~~~~~~~~~~~~~~~

Given the example above, let’s look at how to set up stream parameters:

- **name pg0** - Name of the stream, in this case “pg0”

- **limit 100** - Number of packets to send when the stream is
  enabled. “limit 0” means send packets continuously.

- **maxframe <nnn>** - Maximum frame size. Handy for injecting multiple
  frames no larger than <nnn>. Useful for checking dual / quad loop
  codes

- **rate 1e6** - Packet injection rate, in this case 1 MPPS. When not
  specified, the packet generator injects packets as fast as possible

- **size 300-300** - Packet size range, in this case send 300-byte
  packets

- **interface loop0** - Packets appear as if they were received on the
  specified interface. This datum is used in multiple ways: to select
  graph arc feature configuration, to select IP FIBs. Configure
  features e.g. on loop0 to exercise those features.

- **tx-interface <name>** - Packets will be transmitted on the
  indicated interface. Typically required only when injecting packets
  into post-IP-rewrite graph nodes.

- **pcap <filename>** - Replay packets from the indicated pcap capture
  file. “make test” makes extensive use of this feature: generate
  packets using scapy, save them in a .pcap file, then inject them into
  the vpp graph via a vpp pg “pcap <filename>” stream definition

- **worker <nn>** - Generate packets for the stream using the indicated
  vpp worker thread. The vpp pg generates and injects O(10 MPPS /
  core). Use multiple stream definitions and worker threads to generate
  and inject enough traffic to easily fill a 40 gbit pipe with small
  packets.
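
To estimate how many streams and workers it takes to fill a link, it
helps to work out the packet rate at line rate. The following sketch
does that arithmetic under stated assumptions: every Ethernet frame
costs an extra 20 bytes on the wire (8 bytes of preamble and
start-of-frame delimiter plus a 12-byte inter-frame gap), and each
worker sustains the rough 10 MPPS figure quoted above. The function and
macro names are hypothetical.

```c
#include <stdint.h>

/* Assumed per-frame wire overhead: 8-byte preamble/SFD + 12-byte gap. */
#define ETH_WIRE_OVERHEAD_BYTES 20

/* Packets per second needed to saturate a link with fixed-size frames. */
uint64_t
line_rate_pps (uint64_t link_bps, uint32_t frame_bytes)
{
  return link_bps / (8ULL * (frame_bytes + ETH_WIRE_OVERHEAD_BYTES));
}

/* Workers needed to reach target_pps, rounded up. */
uint32_t
workers_needed (uint64_t target_pps, uint64_t per_worker_pps)
{
  return (uint32_t) ((target_pps + per_worker_pps - 1) / per_worker_pps);
}
```

With 64-byte frames on a 40 gbit link, this works out to roughly 59.5
MPPS, so about six streams on six workers at 10 MPPS each.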

Data definition
~~~~~~~~~~~~~~~

Packet generator data definitions make use of a layered implementation
strategy. Networking layers are specified in order, and the notation can
seem a bit counter-intuitive. In the example above, the data definition
stanza constructs a set of L2-L4 header layers, and uses an
incrementing fill pattern to round out the requested 300-byte packets.

- **IP4: 1.2.3 -> 4.5.6** - Construct an L2 (MAC) header with the ip4
  ethertype (0x800), src MAC address of 00:01:00:02:00:03 and dst MAC
  address of 00:04:00:05:00:06. MAC addresses may be specified in
  either *xxxx.xxxx.xxxx* format or *xx:xx:xx:xx:xx:xx* format.

- **UDP: 192.168.1.10 - 192.168.1.254 -> 192.168.2.10** - Construct an
  incrementing set of L3 (IPv4) headers for successive packets with
  source addresses ranging from .10 to .254. All packets in the stream
  have a constant dest address of 192.168.2.10. Set the protocol field
  to 17, UDP.

- **UDP: 1234 -> 2345** - Set the UDP source and destination ports to
  1234 and 2345, respectively

- **incrementing 286** - Insert up to 286 incrementing data bytes.
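
The dotted MAC notation in the first bullet can look odd at first:
“1.2.3” is three 16-bit hexadecimal groups, which is why it expands to
00:01:00:02:00:03. The sketch below illustrates that expansion using the
C standard library. It is not the parser the packet generator uses, and
``parse_dotted_mac`` is a hypothetical name.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Parse the "xxxx.xxxx.xxxx" hexadecimal form into 6 bytes.
 * Returns 0 on success, -1 on malformed input.
 */
int
parse_dotted_mac (const char *s, uint8_t mac[6])
{
  unsigned int a, b, c;
  char trailing;

  /* Exactly three groups must convert, with nothing left over. */
  if (sscanf (s, "%x.%x.%x%c", &a, &b, &c, &trailing) != 3)
    return -1;
  if (a > 0xffff || b > 0xffff || c > 0xffff)
    return -1;

  mac[0] = (uint8_t) (a >> 8);
  mac[1] = (uint8_t) (a & 0xff);
  mac[2] = (uint8_t) (b >> 8);
  mac[3] = (uint8_t) (b & 0xff);
  mac[4] = (uint8_t) (c >> 8);
  mac[5] = (uint8_t) (c & 0xff);
  return 0;
}
```

Each group fills two bytes, most significant byte first, so short groups
such as “1” are padded with leading zeros.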

Obvious variations involve “s/IP4/IP6/” in the above, along with
changing from IPv4 to IPv6 address notation.

The vpp pg can set any / all IPv4 header fields, including tos, packet
length, mf / df / fragment id and offset, ttl, protocol, checksum, and
src/dst addresses. Take a look at ../src/vnet/ip/ip[46]_pg.c for
details.

If all else fails, specify the entire packet data in hex:

- **hex 0xabcd…** - copy hex data verbatim into the packet

When replaying pcap files (“**pcap <filename>**”), do not specify a data
stanza.

Diagnosing “packet-generator new” parse failures
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you want to inject packets into a brand-new graph node, remember to
tell the packet generator debug CLI how to parse the packet data stanza.

If the node expects L2 Ethernet MAC headers, specify “.unformat_buffer =
unformat_ethernet_header”:

.. code:: c

    VLIB_REGISTER_NODE (ethernet_input_node) =
    {
      <snip>
      .unformat_buffer = unformat_ethernet_header,
      <snip>
    };

Beyond that, it may be necessary to set breakpoints in
…/src/vnet/pg/cli.c. A debug image is suggested.

When debugging new nodes, it may be far simpler to directly inject
ethernet frames - and add a corresponding vlib_buffer_advance in the new
node - than to modify the packet generator.

Debug CLI
---------

The descriptions above describe the “packet-generator new” debug CLI in
detail.

Additional debug CLI commands include:

::

    vpp# packet-generator enable [<stream-name>]

Enables the named stream, or all streams.

::

    vpp# packet-generator disable [<stream-name>]

Disables the named stream, or all streams.

::

    vpp# packet-generator delete <stream-name>

Deletes the named stream.

::

    vpp# packet-generator configure <stream-name> [limit <nnn>]
         [rate <f64-pps>] [size <nn>-<nn>]

Changes stream parameters without having to recreate the entire stream
definition. Note that re-issuing a “packet-generator new” command will
correctly recreate the named stream.
807correctly recreate the named stream.