.. Blame: Nathan Skrzypczak | 9ad39c0 | 2021-08-19 11:38:06 +0200

VNET (VPP Network Stack)
========================

The files associated with the VPP network stack layer are located in the
*./src/vnet* folder. The Network Stack Layer is basically an
instantiation of the code in the other layers. This layer has a vnet
library that provides vectorized layer-2 and 3 networking graph nodes, a
packet generator, and a packet tracer.

In terms of building a packet processing application, vnet provides a
platform-independent subgraph to which one connects a couple of
device-driver nodes.

Typical RX connections include “ethernet-input” [full software
classification, feeds ipv4-input, ipv6-input, arp-input etc.] and
“ipv4-input-no-checksum” [if hardware can classify, perform ipv4 header
checksum].

Effective graph dispatch function coding
----------------------------------------

Over the past 15 years, multiple coding styles have emerged: a
single/dual/quad loop coding model (with variations) and a
fully-pipelined coding model.

Single/dual loops
-----------------

The single/dual/quad loop model variations conveniently solve problems
where the number of items to process is not known in advance: typical
hardware RX-ring processing. This coding style is also very effective
when a given node will not need to cover a complex set of dependent
reads.

Here is a quad/single loop which can leverage up-to-avx512 SIMD vector
units to convert buffer indices to buffer pointers:

.. code:: c

   static uword
   simulated_ethernet_interface_tx (vlib_main_t * vm,
                                    vlib_node_runtime_t *
                                    node, vlib_frame_t * frame)
   {
     u32 n_left_from, *from;
     u32 next_index = 0;
     u32 n_bytes;
     u32 thread_index = vm->thread_index;
     vnet_main_t *vnm = vnet_get_main ();
     vnet_interface_main_t *im = &vnm->interface_main;
     vlib_buffer_t *bufs[VLIB_FRAME_SIZE], **b;
     u16 nexts[VLIB_FRAME_SIZE], *next;

     n_left_from = frame->n_vectors;
     from = vlib_frame_vector_args (frame);

     /*
      * Convert up to VLIB_FRAME_SIZE indices in "from" to
      * buffer pointers in bufs[]
      */
     vlib_get_buffers (vm, from, bufs, n_left_from);
     b = bufs;
     next = nexts;

     /*
      * While we have at least 4 vector elements (pkts) to process..
      */
     while (n_left_from >= 4)
       {
         /* Prefetch next quad-loop iteration. */
         if (PREDICT_TRUE (n_left_from >= 8))
           {
             vlib_prefetch_buffer_header (b[4], STORE);
             vlib_prefetch_buffer_header (b[5], STORE);
             vlib_prefetch_buffer_header (b[6], STORE);
             vlib_prefetch_buffer_header (b[7], STORE);
           }

         /*
          * $$$ Process 4x packets right here...
          * set next[0..3] to send the packets where they need to go
          */

         do_something_to (b[0]);
         do_something_to (b[1]);
         do_something_to (b[2]);
         do_something_to (b[3]);

         /* Process the next 0..4 packets */
         b += 4;
         next += 4;
         n_left_from -= 4;
       }
     /*
      * Clean up 0...3 remaining packets at the end of the incoming frame
      */
     while (n_left_from > 0)
       {
         /*
          * $$$ Process one packet right here...
          * set next[0] to send the packet where it needs to go
          */
         do_something_to (b[0]);

         /* Process the next packet */
         b += 1;
         next += 1;
         n_left_from -= 1;
       }

     /*
      * Send the packets along their respective next-node graph arcs
      * Considerable locality of reference is expected, most if not all
      * packets in the inbound vector will traverse the same next-node
      * arc
      */
     vlib_buffer_enqueue_to_next (vm, node, from, nexts, frame->n_vectors);

     return frame->n_vectors;
   }

Given a packet processing task to implement, it pays to scout around
looking for similar tasks, and think about using the same coding
pattern. It is not uncommon to recode a given graph node dispatch
function several times during performance optimization.
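
The quad/single loop shape can also be tried outside VPP. The following is a minimal standalone sketch, not VPP code: ``pkt_t``, ``classify`` and ``process_frame`` are hypothetical names invented for illustration, and ``__builtin_prefetch`` assumes a GCC- or Clang-compatible compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-packet record; stands in for vlib_buffer_t. */
typedef struct
{
  uint32_t length;
} pkt_t;

/* Hypothetical per-packet work: short packets take arc 1, others arc 0. */
static inline uint16_t
classify (const pkt_t *p)
{
  return p->length < 64 ? 1 : 0;
}

/* Fill next[0..n-1] using a quad loop plus a single-packet cleanup loop. */
static size_t
process_frame (pkt_t **b, uint16_t *next, size_t n)
{
  size_t n_left = n;

  assert (b != NULL && next != NULL);

  while (n_left >= 4)
    {
      /* Prefetch the following quad only if it exists. */
      if (n_left >= 8)
        {
          __builtin_prefetch (b[4]);
          __builtin_prefetch (b[5]);
          __builtin_prefetch (b[6]);
          __builtin_prefetch (b[7]);
        }

      next[0] = classify (b[0]);
      next[1] = classify (b[1]);
      next[2] = classify (b[2]);
      next[3] = classify (b[3]);

      b += 4;
      next += 4;
      n_left -= 4;
    }

  /* Clean up the 0..3 packets left over. */
  while (n_left > 0)
    {
      next[0] = classify (b[0]);
      b += 1;
      next += 1;
      n_left -= 1;
    }

  return n;
}
```

With six packets, the quad loop runs once and the cleanup loop handles the remaining two, so both paths are exercised.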

Creating Packets from Scratch
-----------------------------

At times, it’s necessary to create packets from scratch and send them.
Tasks like sending keepalives or actively opening connections come to
mind. It’s not difficult, but accurate buffer metadata setup is required.

Allocating Buffers
~~~~~~~~~~~~~~~~~~

Use vlib_buffer_alloc, which allocates a set of buffer indices. For
low-performance applications, it’s OK to allocate one buffer at a time.
Note that vlib_buffer_alloc(…) does NOT initialize buffer metadata. See
below.

In high-performance cases, allocate a vector of buffer indices, and hand
them out from the end of the vector; decrement \_vec_len(..) as buffer
indices are allocated. See tcp_alloc_tx_buffers(…) and
tcp_get_free_buffer_index(…) for an example.
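
The hand-out-from-the-end idea can be sketched without VPP headers. In this illustrative example, ``bi_cache_t``, ``fake_bulk_alloc`` and ``cache_get`` are hypothetical stand-ins (the stub allocator simply returns increasing numbers); it is not the code found in tcp_alloc_tx_buffers(…).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SIZE 32

/* Hypothetical per-thread cache of pre-allocated buffer indices. */
typedef struct
{
  uint32_t indices[CACHE_SIZE];
  size_t len; /* number of valid entries, plays the role of _vec_len */
} bi_cache_t;

/* Stub bulk allocator standing in for a call such as vlib_buffer_alloc. */
static size_t
fake_bulk_alloc (uint32_t *out, size_t n)
{
  static uint32_t next_bi = 100;
  for (size_t i = 0; i < n; i++)
    out[i] = next_bi++;
  return n;
}

/* Store one buffer index in *bi. Return 0 on success, -1 on failure. */
static int
cache_get (bi_cache_t *c, uint32_t *bi)
{
  assert (c != NULL && bi != NULL);

  /* Refill in bulk only when the cache has run dry. */
  if (c->len == 0)
    {
      c->len = fake_bulk_alloc (c->indices, CACHE_SIZE);
      if (c->len == 0)
        return -1;
    }

  /* Hand out from the end: shrink the length, then read that slot. */
  *bi = c->indices[--c->len];
  return 0;
}
```

Taking entries from the end means a hand-out is a single decrement, and the allocator is called once per ``CACHE_SIZE`` packets instead of once per packet.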

Buffer Initialization Example
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The following example shows the **main points**, but is not to be
blindly cut-’n-pasted.

.. code:: c

   u32 bi0;
   vlib_buffer_t *b0;
   ip4_header_t *ip;
   udp_header_t *udp;
   u8 *data_dst;

   /* Allocate a buffer */
   if (vlib_buffer_alloc (vm, &bi0, 1) != 1)
     return -1;

   b0 = vlib_get_buffer (vm, bi0);

   /* At this point b0->current_data = 0, b0->current_length = 0 */

   /*
    * Copy data into the buffer. This example ASSUMES that data will fit
    * in a single buffer, and is e.g. an ip4 packet.
    */
   if (have_packet_rewrite)
     {
       clib_memcpy (b0->data, data, vec_len (data));
       b0->current_length = vec_len (data);
     }
   else
     {
       /* OR, build a udp-ip packet (for example) */
       ip = vlib_buffer_get_current (b0);
       udp = (udp_header_t *) (ip + 1);
       data_dst = (u8 *) (udp + 1);

       ip->ip_version_and_header_length = 0x45;
       ip->ttl = 254;
       ip->protocol = IP_PROTOCOL_UDP;
       ip->length = clib_host_to_net_u16 (sizeof (*ip) + sizeof (*udp) +
                                          vec_len (udp_data));
       ip->src_address.as_u32 = src_address->as_u32;
       ip->dst_address.as_u32 = dst_address->as_u32;
       udp->src_port = clib_host_to_net_u16 (src_port);
       udp->dst_port = clib_host_to_net_u16 (dst_port);
       /* The UDP length field covers the UDP header plus the payload */
       udp->length = clib_host_to_net_u16 (sizeof (*udp) + vec_len (udp_data));
       clib_memcpy (data_dst, udp_data, vec_len (udp_data));

       /* Set the buffer length before any code walks the buffer */
       b0->current_length = sizeof (*ip) + sizeof (*udp) + vec_len (udp_data);

       if (compute_udp_checksum)
         {
           /* RFC 7011 section 10.3.2. */
           udp->checksum = ip4_tcp_udp_compute_checksum (vm, b0, ip);
           if (udp->checksum == 0)
             udp->checksum = 0xffff;
         }
     }
   b0->flags |= VLIB_BUFFER_TOTAL_LENGTH_VALID;

   /* sw_if_index 0 is the "local" interface, which always exists */
   vnet_buffer (b0)->sw_if_index[VLIB_RX] = 0;

   /* Use the default FIB index for tx lookup. Set non-zero to use another fib */
   vnet_buffer (b0)->sw_if_index[VLIB_TX] = 0;

If your use-case calls for large packet transmission, use
vlib_buffer_chain_append_data_with_alloc(…) to create the requisite
buffer chain.
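
The length fields in a hand-built UDP/IPv4 packet are easy to get wrong, so here is a standalone sketch of the arithmetic on a plain byte array. It does not use VPP types: ``build_udp4``, ``inet_checksum`` and ``put_u16`` are hypothetical helpers, the IPv4 header checksum step is an addition not shown in the VPP example, and the UDP checksum is left at zero (meaning "not computed", which IPv4 permits).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { IP4_HDR_LEN = 20, UDP_HDR_LEN = 8 };

/* Store a 16-bit value in network (big-endian) byte order. */
static void
put_u16 (uint8_t *p, uint16_t v)
{
  p[0] = (uint8_t) (v >> 8);
  p[1] = (uint8_t) (v & 0xff);
}

/* Internet checksum: inverted one's-complement sum of 16-bit words. */
static uint16_t
inet_checksum (const uint8_t *p, size_t n)
{
  uint32_t sum = 0;
  for (size_t i = 0; i + 1 < n; i += 2)
    sum += (uint32_t) (p[i] << 8 | p[i + 1]);
  if (n & 1)
    sum += (uint32_t) p[n - 1] << 8;
  while (sum >> 16)
    sum = (sum & 0xffff) + (sum >> 16);
  return (uint16_t) ~sum;
}

/* Build IPv4 + UDP + payload in pkt. Return total length, 0 if too big. */
static size_t
build_udp4 (uint8_t *pkt, size_t cap, const uint8_t src[4],
            const uint8_t dst[4], uint16_t sport, uint16_t dport,
            const uint8_t *payload, size_t n_payload)
{
  size_t total = IP4_HDR_LEN + UDP_HDR_LEN + n_payload;
  uint8_t *udp = pkt + IP4_HDR_LEN;

  assert (pkt != NULL);
  if (total > cap || total > 0xffff)
    return 0;

  memset (pkt, 0, IP4_HDR_LEN + UDP_HDR_LEN);
  pkt[0] = 0x45;                        /* version 4, 5 x 32-bit words */
  put_u16 (pkt + 2, (uint16_t) total);  /* IP length: headers + payload */
  pkt[8] = 254;                         /* ttl */
  pkt[9] = 17;                          /* protocol: UDP */
  memcpy (pkt + 12, src, 4);
  memcpy (pkt + 16, dst, 4);
  put_u16 (pkt + 10, inet_checksum (pkt, IP4_HDR_LEN));

  put_u16 (udp, sport);
  put_u16 (udp + 2, dport);
  /* UDP length: UDP header + payload, NOT just the payload */
  put_u16 (udp + 4, (uint16_t) (UDP_HDR_LEN + n_payload));
  if (n_payload)
    memcpy (udp + UDP_HDR_LEN, payload, n_payload);
  return total;
}
```

A quick self-check is that running ``inet_checksum`` over the finished 20-byte IPv4 header yields zero.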

Enqueueing packets for lookup and transmission
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The simplest way to send a set of packets is to use
vlib_get_frame_to_node(…) to allocate fresh frame(s) to ip4_lookup_node
or ip6_lookup_node, add the constructed buffer indices, and dispatch the
frame using vlib_put_frame_to_node(…).

.. code:: c

   vlib_frame_t *f;
   u32 *to_next;
   int i;

   f = vlib_get_frame_to_node (vm, ip4_lookup_node.index);
   f->n_vectors = vec_len (buffer_indices_to_send);
   to_next = vlib_frame_vector_args (f);

   for (i = 0; i < vec_len (buffer_indices_to_send); i++)
     to_next[i] = buffer_indices_to_send[i];

   vlib_put_frame_to_node (vm, ip4_lookup_node.index, f);

It is inefficient to allocate and schedule single packet frames. That’s
fine if you only need to send one packet per second, but it should
**not** occur in a for-loop!

Packet tracer
-------------

Vlib includes a frame element [packet] trace facility, with a simple
debug CLI interface. The CLI is straightforward: “trace add
input-node-name count” to start capturing packet traces.

To trace 100 packets on a typical x86_64 system running the dpdk plugin:
“trace add dpdk-input 100”. When using the packet generator: “trace add
pg-input 100”

To display the packet trace: “show trace”

Each graph node has the opportunity to capture its own trace data. It is
almost always a good idea to do so. The trace capture APIs are simple.

The packet capture APIs snapshot binary data, to minimize processing at
capture time. Each participating graph node initialization provides a
vppinfra format-style user function to pretty-print data when required
by the VLIB “show trace” command.

Set the VLIB node registration “.format_trace” member to the name of the
per-graph node format function.

Here’s a simple example:

.. code:: c

   u8 * my_node_format_trace (u8 * s, va_list * args)
   {
       vlib_main_t * vm = va_arg (*args, vlib_main_t *);
       vlib_node_t * node = va_arg (*args, vlib_node_t *);
       my_node_trace_t * t = va_arg (*args, my_node_trace_t *);

       s = format (s, "My trace data was: %d", t-><whatever>);

       return s;
   }

The trace framework hands the per-node format function the data it
captured as the packet whizzed by. The format function pretty-prints the
data as desired.

Graph Dispatcher Pcap Tracing
-----------------------------

The vpp graph dispatcher knows how to capture vectors of packets in pcap
format as they’re dispatched. The pcap captures are as follows:

::

   VPP graph dispatch trace record description:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Major Version | Minor Version | NStrings      | ProtoHint     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Buffer index (big endian)                                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   + VPP graph node name ...     ...               | NULL octet    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Buffer Metadata ...         ...               | NULL octet    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Buffer Opaque ...           ...               | NULL octet    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Buffer Opaque 2 ...         ...               | NULL octet    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | VPP ASCII packet trace (if NStrings > 4)      | NULL octet    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Packet data (up to 16K)                                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Graph dispatch records comprise a version stamp, an indication of how
many NULL-terminated strings will follow the record header and precede
packet data, and a protocol hint.

The buffer index is an opaque 32-bit cookie which allows consumers of
these data to easily filter/track single packets as they traverse the
forwarding graph.

Multiple records per packet are normal, and to be expected. Packets will
appear multiple times as they traverse the vpp forwarding graph. In this
way, vpp graph dispatch traces are significantly different from regular
network packet captures from an end-station. This property complicates
stateful packet analysis.

Restricting stateful analysis to records from a single vpp graph node
such as “ethernet-input” seems likely to improve the situation.

As of this writing: major version = 1, minor version = 0. NStrings
SHOULD be 4 or 5. Consumers SHOULD be wary of values less than 4 or
greater than 5. They MAY attempt to display the claimed number of
strings, or they MAY treat the condition as an error.

Here is the current set of protocol hints:

.. code:: c

   typedef enum
     {
       VLIB_NODE_PROTO_HINT_NONE = 0,
       VLIB_NODE_PROTO_HINT_ETHERNET,
       VLIB_NODE_PROTO_HINT_IP4,
       VLIB_NODE_PROTO_HINT_IP6,
       VLIB_NODE_PROTO_HINT_TCP,
       VLIB_NODE_PROTO_HINT_UDP,
       VLIB_NODE_N_PROTO_HINTS,
     } vlib_node_proto_hint_t;

Example: VLIB_NODE_PROTO_HINT_IP6 means that the first octet of packet
data SHOULD be 0x60, and should begin an ipv6 packet header.

Downstream consumers of these data SHOULD pay attention to the protocol
hint. They MUST tolerate inaccurate hints, which MAY occur from time to
time.
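
To make the record layout concrete, here is a small consumer-side sketch in plain C. It assumes, as the diagram suggests, that the NULL-terminated strings start immediately after the 8-octet header and that packet data follows the last string. ``dispatch_record_t``, ``parse_dispatch_record`` and the ``MAX_STRINGS`` bound are hypothetical names chosen for this example; this is not the Wireshark dissector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_STRINGS 8 /* illustrative upper bound, not from the format */

typedef struct
{
  uint8_t major, minor, n_strings, proto_hint;
  uint32_t buffer_index;
  const char *strings[MAX_STRINGS]; /* point into the record itself */
  const uint8_t *packet;            /* first octet of packet data */
  size_t packet_len;
} dispatch_record_t;

/* Return 0 on success, -1 if the record is truncated or implausible. */
static int
parse_dispatch_record (const uint8_t *rec, size_t len,
                       dispatch_record_t *out)
{
  size_t off = 8;

  assert (rec != NULL && out != NULL);
  if (len < 8)
    return -1;

  out->major = rec[0];
  out->minor = rec[1];
  out->n_strings = rec[2];
  out->proto_hint = rec[3];
  /* The buffer index is stored big-endian. */
  out->buffer_index = (uint32_t) rec[4] << 24 | (uint32_t) rec[5] << 16
                      | (uint32_t) rec[6] << 8 | (uint32_t) rec[7];

  if (out->n_strings > MAX_STRINGS)
    return -1;

  /* Walk the claimed number of NULL-terminated strings. */
  for (unsigned i = 0; i < out->n_strings; i++)
    {
      const uint8_t *nul = memchr (rec + off, 0, len - off);
      if (nul == NULL)
        return -1; /* string runs off the end of the record */
      out->strings[i] = (const char *) (rec + off);
      off = (size_t) (nul - rec) + 1;
    }

  out->packet = rec + off;
  out->packet_len = len - off;
  return 0;
}
```

This sketch treats an unterminated string as an error, which is one of the two behaviours the text allows; a more lenient consumer could instead display whatever strings it managed to read.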

Dispatch Pcap Trace Debug CLI
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To start a dispatch trace capture of up to 10,000 trace records:

::

   pcap dispatch trace on max 10000 file dispatch.pcap

To start a dispatch trace which will also include standard vpp packet
tracing for packets which originate in dpdk-input:

::

   pcap dispatch trace on max 10000 file dispatch.pcap buffer-trace dpdk-input 1000

To save the pcap trace, e.g. in /tmp/dispatch.pcap:

::

   pcap dispatch trace off

Wireshark dissection of dispatch pcap traces
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It almost goes without saying that we built a companion Wireshark
dissector to display these traces. As of this writing, we have
upstreamed the Wireshark dissector.

Since it will be a while before wireshark/master/latest makes it into
all of the popular Linux distros, please see the “How to build a vpp
dispatch trace aware Wireshark” page for build info.

Here is a sample packet dissection, with some fields omitted for
clarity. The point is that the Wireshark dissector accurately displays
**all** of the vpp buffer metadata, and the name of the graph node in
question.

::

   Frame 1: 2216 bytes on wire (17728 bits), 2216 bytes captured (17728 bits)
       Encapsulation type: USER 13 (58)
       [Protocols in frame: vpp:vpp-metadata:vpp-opaque:vpp-opaque2:eth:ethertype:ip:tcp:data]
   VPP Dispatch Trace
       BufferIndex: 0x00036663
       NodeName: ethernet-input
   VPP Buffer Metadata
       Metadata: flags:
       Metadata: current_data: 0, current_length: 102
       Metadata: current_config_index: 0, flow_id: 0, next_buffer: 0
       Metadata: error: 0, n_add_refs: 0, buffer_pool_index: 0
       Metadata: trace_index: 0, recycle_count: 0, len_not_first_buf: 0
       Metadata: free_list_index: 0
       Metadata:
   VPP Buffer Opaque
       Opaque: raw: 00000007 ffffffff 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
       Opaque: sw_if_index[VLIB_RX]: 7, sw_if_index[VLIB_TX]: -1
       Opaque: L2 offset 0, L3 offset 0, L4 offset 0, feature arc index 0
       Opaque: ip.adj_index[VLIB_RX]: 0, ip.adj_index[VLIB_TX]: 0
       Opaque: ip.flow_hash: 0x0, ip.save_protocol: 0x0, ip.fib_index: 0
       Opaque: ip.save_rewrite_length: 0, ip.rpf_id: 0
       Opaque: ip.icmp.type: 0 ip.icmp.code: 0, ip.icmp.data: 0x0
       Opaque: ip.reass.next_index: 0, ip.reass.estimated_mtu: 0
       Opaque: ip.reass.fragment_first: 0 ip.reass.fragment_last: 0
       Opaque: ip.reass.range_first: 0 ip.reass.range_last: 0
       Opaque: ip.reass.next_range_bi: 0x0, ip.reass.ip6_frag_hdr_offset: 0
       Opaque: mpls.ttl: 0, mpls.exp: 0, mpls.first: 0, mpls.save_rewrite_length: 0, mpls.bier.n_bytes: 0
       Opaque: l2.feature_bitmap: 00000000, l2.bd_index: 0, l2.l2_len: 0, l2.shg: 0, l2.l2fib_sn: 0, l2.bd_age: 0
       Opaque: l2.feature_bitmap_input: none configured, L2.feature_bitmap_output: none configured
       Opaque: l2t.next_index: 0, l2t.session_index: 0
       Opaque: l2_classify.table_index: 0, l2_classify.opaque_index: 0, l2_classify.hash: 0x0
       Opaque: policer.index: 0
       Opaque: ipsec.flags: 0x0, ipsec.sad_index: 0
       Opaque: map.mtu: 0
       Opaque: map_t.v6.saddr: 0x0, map_t.v6.daddr: 0x0, map_t.v6.frag_offset: 0, map_t.v6.l4_offset: 0
       Opaque: map_t.v6.l4_protocol: 0, map_t.checksum_offset: 0, map_t.mtu: 0
       Opaque: ip_frag.mtu: 0, ip_frag.next_index: 0, ip_frag.flags: 0x0
       Opaque: cop.current_config_index: 0
       Opaque: lisp.overlay_afi: 0
       Opaque: tcp.connection_index: 0, tcp.seq_number: 0, tcp.seq_end: 0, tcp.ack_number: 0, tcp.hdr_offset: 0, tcp.data_offset: 0
       Opaque: tcp.data_len: 0, tcp.flags: 0x0
       Opaque: sctp.connection_index: 0, sctp.sid: 0, sctp.ssn: 0, sctp.tsn: 0, sctp.hdr_offset: 0
       Opaque: sctp.data_offset: 0, sctp.data_len: 0, sctp.subconn_idx: 0, sctp.flags: 0x0
       Opaque: snat.flags: 0x0
       Opaque:
   VPP Buffer Opaque2
       Opaque2: raw: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
       Opaque2: qos.bits: 0, qos.source: 0
       Opaque2: loop_counter: 0
       Opaque2: gbp.flags: 0, gbp.src_epg: 0
       Opaque2: pg_replay_timestamp: 0
       Opaque2:
   Ethernet II, Src: 06:d6:01:41:3b:92 (06:d6:01:41:3b:92), Dst: IntelCor_3d:f6
   Transmission Control Protocol, Src Port: 22432, Dst Port: 54084, Seq: 1, Ack: 1, Len: 36
       Source Port: 22432
       Destination Port: 54084
       TCP payload (36 bytes)
   Data (36 bytes)

   0000  cf aa 8b f5 53 14 d4 c7 29 75 3e 56 63 93 9d 11   ....S...)u>Vc...
   0010  e5 f2 92 27 86 56 4c 21 ce c5 23 46 d7 eb ec 0d   ...'.VL!..#F....
   0020  a8 98 36 5a                                       ..6Z
       Data: cfaa8bf55314d4c729753e5663939d11e5f2922786564c21…
       [Length: 36]

It’s a matter of a couple of mouse-clicks in Wireshark to filter the
trace to a specific buffer index. With that specific kind of filtration,
one can watch a packet walk through the forwarding graph; noting any/all
metadata changes, header checksum changes, and so forth.

This should be of significant value when developing new vpp graph nodes.
If new code mispositions b->current_data, it will be completely obvious
from looking at the dispatch trace in wireshark.

pcap rx, tx, and drop tracing
-----------------------------

vpp also supports rx, tx, and drop packet capture in pcap format,
through the “pcap trace” debug CLI command.

This command is used to start or stop a packet capture, or show the
status of packet capture. Each of “pcap trace rx”, “pcap trace tx”, and
“pcap trace drop” is implemented. Supply one or more of “rx”, “tx”, and
“drop” to enable multiple simultaneous capture types.

These commands have the following optional parameters:

-  rx - trace received packets.

-  tx - trace transmitted packets.

-  drop - trace dropped packets.

-  max *nnnn*\ - file size, number of packet captures. Once that many
   packets have been received, the trace buffer is flushed to the
   indicated file. Defaults to 1000. Can only be updated if packet
   capture is off.

-  max-bytes-per-pkt *nnnn*\ - maximum number of bytes to trace on a
   per-packet basis. Must be >32 and less than 9000. Default value:
   512.

-  filter - Use the pcap rx / tx / drop trace filter, which must be
   configured. Use classify filter pcap… to configure the filter. The
   filter will only be executed if the per-interface or any-interface
   tests fail.

-  intfc *interface* \| *any*\ - Used to specify a given interface, or
   use ‘any’ to run packet capture on all interfaces. ‘any’ is the
   default if not provided. Settings from a previous packet capture are
   preserved, so ‘any’ can be used to reset the interface setting.

-  file *filename*\ - Used to specify the output filename. The file
   will be placed in the ‘/tmp’ directory. If *filename* already exists,
   the file will be overwritten. If no filename is provided,
   ‘/tmp/rx.pcap or tx.pcap’ will be used, depending on capture
   direction. Can only be updated when pcap capture is off.

-  status - Displays the current status and configured attributes
   associated with a packet capture. If packet capture is in progress,
   ‘status’ also will return the number of packets currently in the
   buffer. Any additional attributes entered on command line with a
   ‘status’ request will be ignored.

-  filter - Capture packets which match the current packet trace filter
   set. See next section. Configure the capture filter first.

packet trace capture filtering
------------------------------

The “classify filter pcap \| <interface> \| trace” debug CLI command
constructs an arbitrary set of packet classifier tables for use with
“pcap trace rx \| tx \| drop,” and with the vpp packet tracer on a
per-interface or system-wide basis.

Packets which match a rule in the classifier table chain will be traced.
The tables are automatically ordered so that matches in the most
specific table are tried first.

It’s reasonably likely that folks will configure a single table with one
or two matches. As a result, we configure 8 hash buckets and 128K of
match rule space by default. One can override the defaults by specifying
“buckets” and “memory-size” values as desired.

To build up complex filter chains, repeatedly issue the classify filter
debug CLI command. Each command must specify the desired mask and match
values. If a classifier table with a suitable mask already exists, the
CLI command adds a match rule to the existing table. If not, the CLI
command adds a new table and the indicated mask rule.
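
The mask/match idea can be illustrated with a few lines of plain C. This is only a sketch of the comparison semantics, under the assumption that a rule matches when the packet bytes ANDed with the table mask equal the stored match bytes; ``masked_match`` is a hypothetical helper, and the real classifier is hash-table based instead of comparing rule by rule like this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return 1 if (pkt & mask) == match over n bytes, else 0.
 * The match bytes are expected to be zero wherever the mask is zero.
 */
static int
masked_match (const uint8_t *pkt, const uint8_t *mask,
              const uint8_t *match, size_t n)
{
  assert (pkt != NULL && mask != NULL && match != NULL);

  for (size_t i = 0; i < n; i++)
    if ((pkt[i] & mask[i]) != match[i])
      return 0;
  return 1;
}
```

For a rule such as “mask l3 ip4 src match l3 ip4 src 192.168.1.11”, the mask would be all-ones over the four source address octets of the IPv4 header (offsets 12 to 15) and zero elsewhere, and the match bytes would hold 192.168.1.11 at the same offsets.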

Configure a simple pcap classify filter
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

   classify filter pcap mask l3 ip4 src match l3 ip4 src 192.168.1.11
   pcap trace rx max 100 filter

Configure a simple per-interface capture filter
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

   classify filter GigabitEthernet3/0/0 mask l3 ip4 src match l3 ip4 src 192.168.1.11
   pcap trace rx max 100 intfc GigabitEthernet3/0/0

Note that per-interface capture filters are *always* applied.

Clear per-interface capture filters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

   classify filter GigabitEthernet3/0/0 del

Configure another fairly simple pcap classify filter
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

   classify filter pcap mask l3 ip4 src dst match l3 ip4 src 192.168.1.10 dst 192.168.2.10
   pcap trace tx max 100 filter

Configure a vpp packet tracer filter
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

   classify filter trace mask l3 ip4 src dst match l3 ip4 src 192.168.1.10 dst 192.168.2.10
   trace add dpdk-input 100 filter

Clear all current classifier filters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

   classify filter [pcap | <interface> | trace] del

To inspect the classifier tables
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

   show classify table [verbose]

The verbose form displays all of the match rules, with hit-counters.

Terse description of the “mask” syntax:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

   l2 src dst proto tag1 tag2 ignore-tag1 ignore-tag2 cos1 cos2 dot1q dot1ad
   l3 ip4 <ip4-mask> ip6 <ip6-mask>
   <ip4-mask> version hdr_length src[/width] dst[/width]
              tos length fragment_id ttl protocol checksum
   <ip6-mask> version traffic-class flow-label src dst proto
              payload_length hop_limit protocol
   l4 tcp <tcp-mask> udp <udp-mask> src_port dst_port
   <tcp-mask> src dst  # ports
   <udp-mask> src_port dst_port

To construct **matches**, add the values to match after the indicated
keywords in the mask syntax. For example: “… mask l3 ip4 src” -> “…
match l3 ip4 src 192.168.1.11”

VPP Packet Generator
--------------------

We use the VPP packet generator to inject packets into the forwarding
graph. The packet generator can replay pcap traces, and generate packets
out of whole cloth at respectably high performance.

The VPP pg enables quite a variety of use-cases, ranging from functional
testing of new data-plane nodes to regression testing to performance
tuning.

PG setup scripts
----------------

PG setup scripts describe traffic in detail, and leverage vpp debug CLI
mechanisms. It’s reasonably unusual to construct a pg setup script which
doesn’t include a certain amount of interface and FIB configuration.

For example:

::

   loop create
   set int ip address loop0 192.168.1.1/24
   set int state loop0 up

   packet-generator new {
       name pg0
       limit 100
       rate 1e6
       size 300-300
       interface loop0
       node ethernet-input
       data { IP4: 1.2.3 -> 4.5.6
              UDP: 192.168.1.10 - 192.168.1.254 -> 192.168.2.10
              UDP: 1234 -> 2345
              incrementing 286
       }
   }

A packet generator stream definition includes two major sections:

-  Stream Parameter Setup
-  Packet Data

Stream Parameter Setup
~~~~~~~~~~~~~~~~~~~~~~

Given the example above, let’s look at how to set up stream parameters:

-  **name pg0** - Name of the stream, in this case “pg0”

-  **limit 100** - Number of packets to send when the stream is
   enabled. “limit 0” means send packets continuously.

-  **maxframe <nnn>** - Maximum frame size. Handy for injecting multiple
   frames no larger than <nnn>. Useful for checking dual / quad loop
   code

-  **rate 1e6** - Packet injection rate, in this case 1 MPPS. When not
   specified, the packet generator injects packets as fast as possible

-  **size 300-300** - Packet size range, in this case send 300-byte
   packets

-  **interface loop0** - Packets appear as if they were received on the
   specified interface. This datum is used in multiple ways: to select
   graph arc feature configuration, to select IP FIBs. Configure
   features e.g. on loop0 to exercise those features.

-  **tx-interface <name>** - Packets will be transmitted on the
   indicated interface. Typically required only when injecting packets
   into post-IP-rewrite graph nodes.

-  **pcap <filename>** - Replay packets from the indicated pcap capture
   file. “make test” makes extensive use of this feature: generate
   packets using scapy, save them in a .pcap file, then inject them into
   the vpp graph via a vpp pg “pcap <filename>” stream definition

-  **worker <nn>** - Generate packets for the stream using the indicated
   vpp worker thread. The vpp pg generates and injects O(10 MPPS /
   core). Use multiple stream definitions and worker threads to generate
   and inject enough traffic to easily fill a 40 Gbit pipe with small
   packets.

Data definition
~~~~~~~~~~~~~~~

Packet generator data definitions make use of a layered implementation
strategy. Networking layers are specified in order, and the notation can
seem a bit counter-intuitive. In the example above, the data definition
stanza constructs a set of L2-L4 header layers, and uses an
incrementing fill pattern to round out the requested 300-byte packets.

-  **IP4: 1.2.3 -> 4.5.6** - Construct an L2 (MAC) header with the ip4
   ethertype (0x800), src MAC address of 00:01:00:02:00:03 and dst MAC
   address of 00:04:00:05:00:06. Mac addresses may be specified in
   either *xxxx.xxxx.xxxx* format or *xx:xx:xx:xx:xx:xx* format.

-  **UDP: 192.168.1.10 - 192.168.1.254 -> 192.168.2.10** - Construct an
   incrementing set of L3 (IPv4) headers for successive packets with
   source addresses ranging from .10 to .254. All packets in the stream
   have a constant dest address of 192.168.2.10. Set the protocol field
   to 17, UDP.

-  **UDP: 1234 -> 2345** - Set the UDP source and destination ports to
   1234 and 2345, respectively

-  **incrementing 286** - Insert up to 286 incrementing data bytes.
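
To see why “1.2.3” turns into the MAC address 00:01:00:02:00:03, it helps to read the *xxxx.xxxx.xxxx* form as three 16-bit hex groups with optional leading zeros. The following standalone sketch shows that interpretation; ``parse_mac_dotted`` is a hypothetical helper written for this example and is not the packet generator's own parser.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Parse "xxxx.xxxx.xxxx" (three hex groups, leading zeros optional)
 * into 6 MAC address bytes. Return 0 on success, -1 on malformed input.
 */
static int
parse_mac_dotted (const char *s, uint8_t mac[6])
{
  unsigned g[3];
  char extra;

  assert (s != NULL && mac != NULL);

  /* Exactly three groups must convert; trailing characters are an error. */
  if (sscanf (s, "%4x.%4x.%4x%c", &g[0], &g[1], &g[2], &extra) != 3)
    return -1;

  /* Each group supplies two bytes, most significant first. */
  for (int i = 0; i < 3; i++)
    {
      mac[2 * i] = (uint8_t) (g[i] >> 8);
      mac[2 * i + 1] = (uint8_t) (g[i] & 0xff);
    }
  return 0;
}
```

Under this reading, “1.2.3” and “0001.0002.0003” describe the same address.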

Obvious variations involve “s/IP4/IP6/” in the above, along with
changing from IPv4 to IPv6 address notation.

The vpp pg can set any / all IPv4 header fields, including tos, packet
length, mf / df / fragment id and offset, ttl, protocol, checksum, and
src/dst addresses. Take a look at ../src/vnet/ip/ip[46]_pg.c for
details.

If all else fails, specify the entire packet data in hex:

-  **hex 0xabcd…** - copy hex data verbatim into the packet

When replaying pcap files (“**pcap <filename>**”), do not specify a data
stanza.

Diagnosing “packet-generator new” parse failures
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you want to inject packets into a brand-new graph node, remember to
tell the packet generator debug CLI how to parse the packet data stanza.

If the node expects L2 Ethernet MAC headers, specify “.unformat_buffer =
unformat_ethernet_header”:

.. code:: c

   VLIB_REGISTER_NODE (ethernet_input_node) =
   {
     <snip>
     .unformat_buffer = unformat_ethernet_header,
     <snip>
   };

Beyond that, it may be necessary to set breakpoints in
…/src/vnet/pg/cli.c. Debug image suggested.

When debugging new nodes, it may be far simpler to directly inject
ethernet frames - and add a corresponding vlib_buffer_advance in the new
node - than to modify the packet generator.

Debug CLI
---------

The sections above describe the “packet-generator new” debug CLI in
detail.

Additional debug CLI commands include:

::

   vpp# packet-generator enable [<stream-name>]

Enables the named stream, or all streams.

::

   vpp# packet-generator disable [<stream-name>]

Disables the named stream, or all streams.

::

   vpp# packet-generator delete <stream-name>

Deletes the named stream.

::

   vpp# packet-generator configure <stream-name> [limit <nnn>]
        [rate <f64-pps>] [size <nn>-<nn>]

Changes stream parameters without having to recreate the entire stream
definition. Note that re-issuing a “packet-generator new” command will
correctly recreate the named stream.