VLIB (Vector Processing Library)
================================

The files associated with vlib are located in the ./src/{vlib, vlibapi,
vlibmemory} folders. These libraries provide vector processing support
including graph-node scheduling, reliable multicast support,
ultra-lightweight cooperative multi-tasking threads, a CLI, plug-in .DLL
support, physical memory and Linux epoll support. Parts of this library
embody US Patent 7,961,636.

Init function discovery
-----------------------

vlib applications register for various [initialization] events by
placing structures and \__attribute__((constructor)) functions into the
image. At appropriate times, the vlib framework walks
constructor-generated singly-linked structure lists, performs a
topological sort based on specified constraints, and calls the indicated
functions. vlib applications use this mechanism to create graph nodes,
add CLI functions, start cooperative multi-tasking threads, and so on.

vlib applications invariably include a number of VLIB_INIT_FUNCTION
(my_init_function) macros.

Each init / configure / etc. function has the return type clib_error_t
\*. Make sure that the function returns 0 if all is well; otherwise the
framework will announce an error and exit.

vlib applications must link against vppinfra, and often link against
other libraries such as VNET. In the latter case, it may be necessary to
explicitly reference symbol(s); otherwise large portions of the library
may be missing at runtime.

Init function construction and constraint specification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It’s easy to add an init function:

.. code:: c

   static clib_error_t *my_init_function (vlib_main_t *vm)
   {
      /* ... initialize things ... */

      return 0; // or return clib_error_return (0, "BROKEN!");
   }
   VLIB_INIT_FUNCTION(my_init_function);

As given, my_init_function will be executed “at some point,” but with no
ordering guarantees.

Specifying ordering constraints is easy:

.. code:: c

   VLIB_INIT_FUNCTION(my_init_function) =
   {
      .runs_before = VLIB_INITS("we_run_before_function_1",
                                "we_run_before_function_2"),
      .runs_after = VLIB_INITS("we_run_after_function_1",
                               "we_run_after_function_2"),
   };

It’s also easy to specify bulk ordering constraints of the form “a then
b then c then d”:

.. code:: c

   VLIB_INIT_FUNCTION(my_init_function) =
   {
      .init_order = VLIB_INITS("a", "b", "c", "d"),
   };

It’s OK to specify all three sorts of ordering constraints for a single
init function, although it’s hard to imagine why it would be necessary.

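To make the constraint handling more concrete, here is a minimal
stand-alone sketch of how “runs before” constraints can be turned into a
call order with a topological sort (Kahn’s algorithm). This is an
illustration only, not the vlib implementation: every toy\_ name is
hypothetical, and a “runs after” constraint is assumed to be recorded as
the same edge with its two ends swapped.

.. code:: c

   #include <assert.h>
   #include <string.h>

   #define TOY_MAX_INITS 16

   /* Hypothetical model: edge[a][b] != 0 means "a must run before b". */
   typedef struct
   {
     const char *names[TOY_MAX_INITS];
     unsigned char edge[TOY_MAX_INITS][TOY_MAX_INITS];
     int n;
   } toy_init_graph_t;

   /* Return the index of NAME, adding it if it is not yet known. */
   static int
   toy_lookup (toy_init_graph_t *g, const char *name)
   {
     for (int i = 0; i < g->n; i++)
       if (strcmp (g->names[i], name) == 0)
         return i;
     assert (g->n < TOY_MAX_INITS);
     g->names[g->n] = name;
     return g->n++;
   }

   /* Record the constraint "first runs before second". */
   static void
   toy_runs_before (toy_init_graph_t *g, const char *first,
                    const char *second)
   {
     int a = toy_lookup (g, first);
     int b = toy_lookup (g, second);
     g->edge[a][b] = 1;
   }

   /* Kahn's algorithm: repeatedly emit a function with no unmet
      predecessor. Returns the number of entries written to order[];
      a result smaller than g->n means the constraints form a cycle. */
   static int
   toy_sort (const toy_init_graph_t *g, int *order)
   {
     int indegree[TOY_MAX_INITS] = { 0 };
     unsigned char done[TOY_MAX_INITS] = { 0 };
     int n_out = 0;

     for (int a = 0; a < g->n; a++)
       for (int b = 0; b < g->n; b++)
         indegree[b] += g->edge[a][b];

     for (int pass = 0; pass < g->n; pass++)
       {
         int pick = -1;
         for (int i = 0; i < g->n && pick < 0; i++)
           if (!done[i] && indegree[i] == 0)
             pick = i;
         if (pick < 0)
           break;               /* only cyclic constraints remain */
         done[pick] = 1;
         order[n_out++] = pick;
         for (int b = 0; b < g->n; b++)
           indegree[b] -= g->edge[pick][b];
       }
     return n_out;
   }

In this model an init_order list such as “a then b then c” is just two
toy_runs_before calls, (a, b) and (b, c).
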
Node Graph Initialization
-------------------------

vlib packet-processing applications invariably define a set of graph
nodes to process packets.

One constructs a vlib_node_registration_t, most often via the
VLIB_REGISTER_NODE macro. At runtime, the framework processes the set of
such registrations into a directed graph. It is easy enough to add nodes
to the graph at runtime. The framework does not support removing nodes.

vlib provides several types of vector-processing graph nodes, primarily
to control framework dispatch behaviors. The type member of the
vlib_node_registration_t functions as follows:

- VLIB_NODE_TYPE_PRE_INPUT - run before all other node types
- VLIB_NODE_TYPE_INPUT - run as often as possible, after pre_input
  nodes
- VLIB_NODE_TYPE_INTERNAL - only when explicitly made runnable by
  adding pending frames for processing
- VLIB_NODE_TYPE_PROCESS - only when explicitly made runnable.
  “Process” nodes are actually cooperative multi-tasking threads. They
  **must** explicitly suspend after a reasonably short period of time.

For a precise understanding of the graph node dispatcher, please read
./src/vlib/main.c:vlib_main_loop.

Graph node dispatcher
---------------------

vlib_main_loop() dispatches graph nodes. The basic vector processing
algorithm is diabolically simple, but may not be obvious from even a
long stare at the code. Here’s how it works: some input node, or set of
input nodes, produces a vector of work to process. The graph node
dispatcher pushes the work vector through the directed graph,
subdividing it as needed, until the original work vector has been
completely processed. At that point, the process recurs.

This scheme yields a stable equilibrium in frame size, by construction.
Here’s why: as the frame size increases, the per-frame-element
processing time decreases. There are several related forces at work; the
simplest to describe is the effect of vector processing on the CPU L1
I-cache. The first frame element [packet] processed by a given node
warms up the node dispatch function in the L1 I-cache. All subsequent
frame elements profit. As we increase the number of frame elements, the
cost per element goes down.

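The amortization argument can be written down as a one-line cost model.
The sketch below is purely illustrative: the function name and the idea
of a single fixed warm-up cost plus a constant per-element cost are
assumptions made for the sake of the arithmetic, not measurements.

.. code:: c

   #include <assert.h>

   /* Hypothetical cost model: one fixed warm-up cost per frame (e.g.
      I-cache misses on the first element) plus a constant cost for
      every element in the frame. */
   static double
   toy_clocks_per_element (double warmup_clocks, double per_element_clocks,
                           unsigned n_elements)
   {
     assert (n_elements > 0);
     return (warmup_clocks + per_element_clocks * n_elements) / n_elements;
   }

With made-up inputs of 1000 warm-up clocks and 50 clocks per element, a
1-element frame costs 1050 clocks per element while a 250-element frame
costs 54. As load rises, frames grow, the per-element cost falls and the
dispatcher catches up, which is the equilibrium described above.
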
Under light load, it is a crazy waste of CPU cycles to run the graph
node dispatcher flat-out. So, the graph node dispatcher arranges to wait
for work by sitting in a timed epoll wait if the prevailing frame size
is low. The scheme has a certain amount of hysteresis to avoid
constantly toggling back and forth between interrupt and polling mode.
Although the graph dispatcher supports interrupt and polling modes, our
current default device drivers do not.

The graph node scheduler uses a hierarchical timer wheel to reschedule
process nodes upon timer expiration.

Graph dispatcher internals
--------------------------

This section may be safely skipped. It’s not necessary to understand
graph dispatcher internals to create graph nodes.

Vector Data Structure
---------------------

In vpp / vlib, we represent vectors as instances of the vlib_frame_t
type:

.. code:: c

   typedef struct vlib_frame_t
   {
     /* Frame flags. */
     u16 flags;

     /* Number of scalar bytes in arguments. */
     u8 scalar_size;

     /* Number of bytes per vector argument. */
     u8 vector_size;

     /* Number of vector elements currently in frame. */
     u16 n_vectors;

     /* Scalar and vector arguments to next node. */
     u8 arguments[0];
   } vlib_frame_t;

Note that one *could* construct all kinds of vectors - including vectors
with some associated scalar data - using this structure. In the vpp
application, vectors typically use a 4-byte vector element size, and
zero bytes’ worth of associated per-frame scalar data.

Frames are always allocated on CLIB_CACHE_LINE_BYTES boundaries. Frames
have u32 indices which make use of the alignment property, so the
maximum feasible main heap offset of a frame is CLIB_CACHE_LINE_BYTES \*
0xFFFFFFFF. With 64-byte cache lines, that is roughly 64 \* 4G = 256
Gbytes.

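The index arithmetic can be sketched in a few lines. This is a
stand-alone model, assuming 64-byte cache lines; the toy\_ names are
hypothetical and do not correspond to vlib functions.

.. code:: c

   #include <assert.h>
   #include <stdint.h>

   /* Assumption for illustration: 64-byte cache lines. */
   #define TOY_CACHE_LINE_BYTES 64ULL

   /* Convert a cache-line-aligned heap offset into a compact u32 index. */
   static uint32_t
   toy_offset_to_index (uint64_t heap_offset)
   {
     assert ((heap_offset % TOY_CACHE_LINE_BYTES) == 0);
     assert (heap_offset / TOY_CACHE_LINE_BYTES <= UINT32_MAX);
     return (uint32_t) (heap_offset / TOY_CACHE_LINE_BYTES);
   }

   /* Recover the heap offset from the index. */
   static uint64_t
   toy_index_to_offset (uint32_t frame_index)
   {
     return (uint64_t) frame_index * TOY_CACHE_LINE_BYTES;
   }

Because the low six bits of an aligned offset are always zero, dropping
them lets a 32-bit index address just under 2^38 bytes, i.e. the 256
Gbytes quoted above.
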
Scheduling Vectors
------------------

As you can see, vectors are not directly associated with graph nodes. We
represent that association in a couple of ways. The simplest is the
vlib_pending_frame_t:

.. code:: c

   /* A frame pending dispatch by main loop. */
   typedef struct
   {
     /* Node and runtime for this frame. */
     u32 node_runtime_index;

     /* Frame index (in the heap). */
     u32 frame_index;

     /* Start of next frames for this node. */
     u32 next_frame_index;

     /* Special value for next_frame_index when there is no next frame. */
   #define VLIB_PENDING_FRAME_NO_NEXT_FRAME ((u32) ~0)
   } vlib_pending_frame_t;

Here is the code in …/src/vlib/main.c:vlib_main_or_worker_loop() which
processes frames:

.. code:: c

   /*
    * Input nodes may have added work to the pending vector.
    * Process pending vector until there is nothing left.
    * All pending vectors will be processed from input -> output.
    */
   for (i = 0; i < _vec_len (nm->pending_frames); i++)
     cpu_time_now = dispatch_pending_node (vm, i, cpu_time_now);
   /* Reset pending vector for next iteration. */

The pending frame node_runtime_index associates the frame with the node
which will process it.

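One detail of that loop is easy to miss: the loop bound is re-read on
every iteration, so frames appended while an earlier frame is being
dispatched are picked up in the same pass. The stand-alone model below
shows the idea with a fixed-size array. All toy\_ names are hypothetical,
and the “each node hands its frame to the next node” rule is an
assumption chosen to keep the example short.

.. code:: c

   #include <assert.h>
   #include <stdint.h>

   #define TOY_MAX_PENDING 64
   #define TOY_N_NODES 3

   typedef struct
   {
     uint32_t node_index;        /* node which will process the frame */
     uint32_t n_vectors;         /* number of elements in the frame */
   } toy_pending_frame_t;

   typedef struct
   {
     toy_pending_frame_t pending[TOY_MAX_PENDING];
     uint32_t n_pending;
     uint32_t vectors_seen[TOY_N_NODES];   /* per-node element counters */
   } toy_main_t;

   /* Append a frame to the pending vector. */
   static void
   toy_add_pending (toy_main_t *tm, uint32_t node_index, uint32_t n_vectors)
   {
     assert (tm->n_pending < TOY_MAX_PENDING);
     assert (node_index < TOY_N_NODES);
     tm->pending[tm->n_pending].node_index = node_index;
     tm->pending[tm->n_pending].n_vectors = n_vectors;
     tm->n_pending++;
   }

   /* Toy dispatch: node k passes its whole frame to node k + 1; the
      last node consumes it. */
   static void
   toy_dispatch_pending (toy_main_t *tm, uint32_t i)
   {
     toy_pending_frame_t p = tm->pending[i];   /* copy before appending */
     tm->vectors_seen[p.node_index] += p.n_vectors;
     if (p.node_index + 1 < TOY_N_NODES)
       toy_add_pending (tm, p.node_index + 1, p.n_vectors);
   }

   /* One pass of the main loop: drain the pending vector, including
      entries added during the pass, then reset it. */
   static void
   toy_main_loop_once (toy_main_t *tm)
   {
     for (uint32_t i = 0; i < tm->n_pending; i++)
       toy_dispatch_pending (tm, i);
     tm->n_pending = 0;
   }
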
Complications
-------------

Fasten your seatbelt. Here’s where the story - and the data structures -
become quite complicated…

At 100,000 feet: vpp uses a directed graph, not a directed *acyclic*
graph. It’s really quite normal for a packet to visit ip[46]-lookup
multiple times. The worst-case: a graph node which enqueues packets to
itself.

To deal with this issue, the graph dispatcher must force allocation of a
new frame if the current graph node’s dispatch function happens to
enqueue a packet back to itself.

There are no guarantees that a pending frame will be processed
immediately, which means that more packets may be added to the
underlying vlib_frame_t after it has been attached to a
vlib_pending_frame_t. Care must be taken to allocate new frames and
pending frames if a (pending_frame, frame) pair fills.

Next frames, next frame ownership
---------------------------------

The vlib_next_frame_t is the last key graph dispatcher data structure:

.. code:: c

   typedef struct
   {
     /* Frame index. */
     u32 frame_index;

     /* Node runtime for this next. */
     u32 node_runtime_index;

     /* Next frame flags. */
     u32 flags;

     /* Reflects node frame-used flag for this next. */
   #define VLIB_FRAME_NO_FREE_AFTER_DISPATCH \
     VLIB_NODE_FLAG_FRAME_NO_FREE_AFTER_DISPATCH

     /* This next frame owns enqueue to node
        corresponding to node_runtime_index. */
   #define VLIB_FRAME_OWNER (1 << 15)

     /* Set when frame has been allocated for this next. */
   #define VLIB_FRAME_IS_ALLOCATED VLIB_NODE_FLAG_IS_OUTPUT

     /* Set when frame has been added to pending vector. */
   #define VLIB_FRAME_PENDING VLIB_NODE_FLAG_IS_DROP

     /* Set when frame is to be freed after dispatch. */
   #define VLIB_FRAME_FREE_AFTER_DISPATCH VLIB_NODE_FLAG_IS_PUNT

     /* Set when frame has traced packets. */
   #define VLIB_FRAME_TRACE VLIB_NODE_FLAG_TRACE

     /* Number of vectors enqueue to this next since last overflow. */
     u32 vectors_since_last_overflow;
   } vlib_next_frame_t;

Graph node dispatch functions call vlib_get_next_frame (…) to set “(u32
\*)to_next” to the right place in the vlib_frame_t corresponding to the
ith arc (aka next0) from the current node to the indicated next node.

After some scuffling around - two levels of macros - processing reaches
vlib_get_next_frame_internal (…), which digs up the vlib_next_frame_t
corresponding to the desired graph arc.

The next frame data structure amounts to a graph-arc-centric frame
cache. Once a node finishes adding elements to a frame, the frame will
acquire a vlib_pending_frame_t and end up on the graph dispatcher’s
run-queue. But there’s no guarantee that more vector elements won’t be
added to the underlying frame from the same (source_node, next_index)
arc or from a different (source_node, next_index) arc.

Maintaining consistency of the arc-to-frame cache is necessary. The
first step in maintaining consistency is to make sure that only one
graph node at a time thinks it “owns” the target vlib_frame_t.

Back to the graph node dispatch function. In the usual case, a certain
number of packets will be added to the vlib_frame_t acquired by calling
vlib_get_next_frame (…).

Before a dispatch function returns, it’s required to call
vlib_put_next_frame (…) for all of the graph arcs it actually used. This
action adds a vlib_pending_frame_t to the graph dispatcher’s pending
frame vector.

vlib_put_next_frame makes a note in the pending frame of the frame
index, and also of the vlib_next_frame_t index.

dispatch_pending_node actions
-----------------------------

The main graph dispatch loop calls dispatch_pending_node as shown above.

dispatch_pending_node recovers the pending frame, and the graph node
runtime / dispatch function. Further, it recovers the next_frame
currently associated with the vlib_frame_t, and detaches the
vlib_frame_t from the next_frame.

In …/src/vlib/main.c:dispatch_pending_node(…), note this stanza:

.. code:: c

   /* Force allocation of new frame while current frame is being
      dispatched. */
   restore_frame_index = ~0;
   if (nf->frame_index == p->frame_index)
     {
       nf->frame_index = ~0;
       nf->flags &= ~VLIB_FRAME_IS_ALLOCATED;
       if (!(n->flags & VLIB_NODE_FLAG_FRAME_NO_FREE_AFTER_DISPATCH))
         restore_frame_index = p->frame_index;
     }

dispatch_pending_node is worth a hard stare due to the several
second-order optimizations it implements. Almost as an afterthought, it
calls dispatch_node which actually calls the graph node dispatch
function.

Process / thread model
----------------------

vlib provides an ultra-lightweight cooperative multi-tasking thread
model. The graph node scheduler invokes these processes in much the same
way as traditional vector-processing run-to-completion graph nodes;
plus-or-minus a setjmp/longjmp pair required to switch stacks. Simply
set the vlib_node_registration_t type field to VLIB_NODE_TYPE_PROCESS.
Yes, process is a misnomer. These are cooperative multi-tasking threads.

As of this writing, the default stack size is 1<<15 = 32 KB. Initialize
the node registration’s process_log2_n_stack_bytes member as needed. The
graph node dispatcher makes some effort to detect stack overrun, e.g. by
mapping a no-access page below each thread stack.

Process node dispatch functions are expected to be “while(1) { }” loops
which suspend when not otherwise occupied, and which must not run for
unreasonably long periods of time.

“Unreasonably long” is an application-dependent concept. Over the years,
we have constructed frame-size sensitive control-plane nodes which will
use a much higher fraction of the available CPU bandwidth when the frame
size is low. The classic example: modifying forwarding tables. So long
as the table-builder leaves the forwarding tables in a valid state, one
can suspend the table builder to avoid dropping packets as a result of
control-plane activity.

Process nodes can suspend for fixed amounts of time, or until another
entity signals an event, or both. See the next section for a description
of the vlib process event mechanism.

When running in vlib process context, one must pay strict attention to
loop invariant issues. If one walks a data structure and calls a
function which may suspend, one had best know by construction that it
cannot change. Often, it’s best to simply make a snapshot copy of a data
structure, walk the copy at leisure, then free the copy.

Process events
--------------

The vlib process event mechanism API is extremely lightweight and easy
to use. Here is a typical example:

.. code:: c

   vlib_main_t *vm = &vlib_global_main;
   uword event_type, * event_data = 0;

   while (1)
     {
       vlib_process_wait_for_event_or_clock (vm, 5.0 /* seconds */);

       event_type = vlib_process_get_events (vm, &event_data);

       switch (event_type) {
       case EVENT1:
           handle_event1s (event_data);
           break;

       case EVENT2:
           handle_event2s (event_data);
           break;

       case ~0: /* 5-second idle/periodic */
           handle_idle ();
           break;

       default: /* bug! */
           ASSERT (0);
       }

       vec_reset_length(event_data);
     }

In this example, the VLIB process node waits for an event to occur, or
for 5 seconds to elapse. The code demuxes on the event type, calling the
appropriate handler function. Each call to vlib_process_get_events
returns a vector of per-event-type data passed to successive
vlib_process_signal_event calls; it is a serious error to process only
event_data[0].

Resetting the event_data vector-length to 0 [instead of calling
vec_free] means that the event scheme doesn’t burn cycles continuously
allocating and freeing the event data vector. This is a common vppinfra
/ vlib coding pattern, well worth using when appropriate.

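The reset-instead-of-free pattern is easy to show outside vppinfra. The
sketch below is a plain C model with hypothetical toy\_ names; it is not
the vppinfra vector implementation, only an illustration of why
resetting the length avoids repeated allocation.

.. code:: c

   #include <assert.h>
   #include <stdint.h>
   #include <stdlib.h>

   typedef struct
   {
     uintptr_t *data;
     size_t len;                 /* elements currently in use */
     size_t cap;                 /* elements allocated */
   } toy_vec_t;

   /* Append one element, growing the allocation only when it is full.
      Returns 0 on success, -1 if the allocation fails. */
   static int
   toy_vec_add1 (toy_vec_t *v, uintptr_t x)
   {
     assert (v != NULL);
     if (v->len == v->cap)
       {
         size_t new_cap = v->cap ? 2 * v->cap : 4;
         uintptr_t *p = realloc (v->data, new_cap * sizeof (*p));
         if (p == NULL)
           return -1;
         v->data = p;
         v->cap = new_cap;
       }
     v->data[v->len++] = x;
     return 0;
   }

   /* Forget the contents but keep the memory for the next round. */
   static void
   toy_vec_reset_length (toy_vec_t *v)
   {
     v->len = 0;
   }

   /* Release the memory; the next add must allocate again. */
   static void
   toy_vec_free (toy_vec_t *v)
   {
     free (v->data);
     v->data = NULL;
     v->len = v->cap = 0;
   }

In an event loop that runs indefinitely, only the first few iterations
ever reach realloc; after that every round reuses the same storage.
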
Signaling an event is easy, for example:

.. code:: c

   vlib_process_signal_event (vm, process_node_index, EVENT1,
       (uword)arbitrary_event1_data); /* and so forth */

One can either know the process node index by construction - dig it out
of the appropriate vlib_node_registration_t - or find the vlib_node_t
with vlib_get_node_by_name(…).

Buffers
-------

vlib buffering solves the usual set of packet-processing problems,
albeit at high performance. Key in terms of performance: one ordinarily
allocates / frees N buffers at a time rather than one at a time. Except
when operating directly on a specific buffer, one deals with buffers by
index, not by pointer.

Packet-processing frames are u32[] arrays, not vlib_buffer_t[] arrays.

Packets comprise one or more vlib buffers, chained together as required.
Multiple particle sizes are supported; hardware input nodes simply ask
for the required size(s). Coalescing support is available. For obvious
reasons one is discouraged from writing one’s own wild and wacky buffer
chain traversal code.

vlib buffer headers are allocated immediately prior to the buffer data
area. In typical packet processing this saves a dependent read wait:
given a buffer’s address, one can prefetch the buffer header [metadata]
at the same time as the first cache line of buffer data.

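The two layout ideas above, buffers named by index and a header sitting
directly in front of the data, can be modelled in a few lines of plain
C. This is a simplified sketch: the toy_buffer_t layout, its field
sizes and the array-based pool are assumptions for illustration and do
not reproduce vlib_buffer_t.

.. code:: c

   #include <assert.h>
   #include <stddef.h>
   #include <stdint.h>

   #define TOY_CACHE_LINE_BYTES 64
   #define TOY_DATA_BYTES 2048

   /* Hypothetical buffer: one cache line of metadata, then the data. */
   typedef struct
   {
     int16_t current_data;       /* offset of current header in data[] */
     uint16_t current_length;    /* bytes of packet data from there on */
     uint32_t flags;
     uint8_t pad[TOY_CACHE_LINE_BYTES - 8];
     uint8_t data[TOY_DATA_BYTES];
   } toy_buffer_t;

   /* Frames carry u32 indices; translate one into a pointer. */
   static toy_buffer_t *
   toy_get_buffer (toy_buffer_t *pool, uint32_t buffer_index)
   {
     return pool + buffer_index;
   }

   /* Pointer to the packet data currently being processed. */
   static uint8_t *
   toy_buffer_get_current (toy_buffer_t *b)
   {
     return b->data + b->current_data;
   }

   /* The header sits at a fixed distance in front of the data area. */
   static toy_buffer_t *
   toy_buffer_from_data (uint8_t *data)
   {
     assert (data != NULL);
     return (toy_buffer_t *) (data - offsetof (toy_buffer_t, data));
   }

Since the header and the first data cache line are adjacent, both
addresses are known as soon as the buffer pointer is known, so both can
be prefetched together.
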
Buffer header metadata (vlib_buffer_t) includes the usual rewrite
expansion space, a current_data offset, RX and TX interface indices,
packet trace information, and opaque areas.

The opaque data is intended to control packet processing in arbitrary
subgraph-dependent ways. The programmer shoulders responsibility for
data lifetime analysis, type-checking, etc.

Buffers have reference-counts in support of e.g. multicast replication.

Shared-memory message API
-------------------------

Local control-plane and application processes interact with the vpp
dataplane via asynchronous message-passing in shared memory over
unidirectional queues. The same application APIs are available via
sockets.

Capturing API traces and replaying them in a simulation environment
requires a disciplined approach to the problem. This seems like a
make-work task, but it is not. When something goes wrong in the
control-plane after 300,000 or 3,000,000 operations, high-speed replay
of the events leading up to the accident is a huge win.

The shared-memory message API message allocator vl_msg_api_alloc uses a
particularly cute trick. Since messages are processed in order, we try
to allocate message buffering from a set of fixed-size, preallocated
rings. Each ring item has a “busy” bit. Freeing one of the preallocated
message buffers merely requires the message consumer to clear the busy
bit. No locking required.

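The busy-bit trick can be sketched as a tiny stand-alone ring. This is a
model under stated assumptions, not the vpp allocator: the toy\_ names,
the ring and message sizes, and the single-producer restriction are all
made up for the example.

.. code:: c

   #include <assert.h>
   #include <stdatomic.h>
   #include <stddef.h>

   #define TOY_RING_ITEMS 4
   #define TOY_MSG_BYTES 64

   typedef struct
   {
     atomic_int busy;            /* set on alloc, cleared by consumer */
     unsigned char data[TOY_MSG_BYTES];
   } toy_ring_item_t;

   typedef struct
   {
     toy_ring_item_t items[TOY_RING_ITEMS];
     unsigned next;              /* slot the single producer tries next */
   } toy_ring_t;

   /* Producer side: hand out slots strictly in order. Returns NULL if
      the next slot is still busy; a real allocator would then fall
      back to a general-purpose heap. */
   static void *
   toy_ring_alloc (toy_ring_t *r)
   {
     toy_ring_item_t *it = &r->items[r->next];
     if (atomic_load_explicit (&it->busy, memory_order_acquire))
       return NULL;
     atomic_store_explicit (&it->busy, 1, memory_order_relaxed);
     r->next = (r->next + 1) % TOY_RING_ITEMS;
     return it->data;
   }

   /* Consumer side: freeing is just clearing the busy flag. */
   static void
   toy_ring_free (void *msg)
   {
     assert (msg != NULL);
     toy_ring_item_t *it = (toy_ring_item_t *)
       ((unsigned char *) msg - offsetof (toy_ring_item_t, data));
     atomic_store_explicit (&it->busy, 0, memory_order_release);
   }

Because messages are consumed in the order they were produced, the slot
the producer wants next is normally the one the consumer released
longest ago, so the ring rarely reports “busy” unless the consumer has
fallen a full ring behind.
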
Debug CLI
---------

Adding debug CLI commands to VLIB applications is very simple.

Here is a complete example:

.. code:: c

   static clib_error_t *
   show_ip_tuple_match (vlib_main_t * vm,
                        unformat_input_t * input,
                        vlib_cli_command_t * cmd)
   {
       vlib_cli_output (vm, "%U\n", format_ip_tuple_match_tables, &routing_main);
       return 0;
   }

   static VLIB_CLI_COMMAND (show_ip_tuple_command) =
   {
       .path = "show ip tuple match",
       .short_help = "Show ip 5-tuple match-and-broadcast tables",
       .function = show_ip_tuple_match,
   };

This example implements the “show ip tuple match” debug cli command. In
ordinary usage, the vlib cli is available via the “vppctl” application,
which sends traffic to a named pipe. One can configure debug CLI telnet
access on a configurable port.

The cli implementation has an output redirection facility which makes it
simple to deliver cli output via shared-memory API messaging.

Particularly for debug or “show tech support” type commands, it would be
wasteful to write vlib application code to pack binary data, write more
code elsewhere to unpack the data and finally print the answer. If a
certain cli command has the potential to hurt packet processing
performance by running for too long, do the work incrementally in a
process node. The client can wait.

Macro expansion
~~~~~~~~~~~~~~~

The vpp debug CLI engine includes a recursive macro expander. This is
quite useful for factoring out address and/or interface name specifics:

::

   define ip1 192.168.1.1/24
   define ip2 192.168.2.1/24
   define iface1 GigabitEthernet3/0/0
   define iface2 loop1

   set int ip address $iface1 $ip1
   set int ip address $iface2 $(ip2)

   undefine ip1
   undefine ip2
   undefine iface1
   undefine iface2

Each socket (or telnet) debug CLI session has its own macro tables. All
debug CLI sessions which use CLI_INBAND binary API messages share a
single table.

The macro expander recognizes circular definitions:

::

   define foo \$(bar)
   define bar \$(mumble)
   define mumble \$(foo)

At 8 levels of recursion, the macro expander throws up its hands and
replies “CIRCULAR.”

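A depth-limited recursive expander is short enough to sketch. The code
below is a stand-alone illustration, not the vpp macro engine: the
toy\_ names are hypothetical, only the $(name) form is handled,
undefined names are assumed to expand to nothing, and the exact level at
which the real expander gives up may be counted differently.

.. code:: c

   #include <assert.h>
   #include <string.h>

   #define TOY_MAX_DEPTH 8       /* mirrors the 8 levels mentioned above */

   typedef struct
   {
     const char *name;
     const char *value;
   } toy_macro_t;

   /* Look up a macro by name; returns its value, or NULL if undefined. */
   static const char *
   toy_find (const toy_macro_t *tab, int n, const char *name, size_t len)
   {
     for (int i = 0; i < n; i++)
       if (strlen (tab[i].name) == len
           && strncmp (tab[i].name, name, len) == 0)
         return tab[i].value;
     return NULL;
   }

   /* Append the expansion of IN to the NUL-terminated string OUT.
      Returns 0 on success, -1 once nesting reaches TOY_MAX_DEPTH. */
   static int
   toy_expand (const toy_macro_t *tab, int n, const char *in,
               char *out, size_t out_size, int depth)
   {
     assert (out_size > 0);
     if (depth >= TOY_MAX_DEPTH)
       return -1;
     while (*in)
       {
         const char *close = NULL;
         if (in[0] == '$' && in[1] == '(')
           close = strchr (in + 2, ')');
         if (close)
           {
             const char *value =
               toy_find (tab, n, in + 2, (size_t) (close - (in + 2)));
             /* Expand the macro body recursively, one level deeper. */
             if (value
                 && toy_expand (tab, n, value, out, out_size, depth + 1) < 0)
               return -1;
             in = close + 1;
           }
         else
           {
             /* Ordinary character: copy it if there is room. */
             size_t len = strlen (out);
             if (len + 1 < out_size)
               {
                 out[len] = *in;
                 out[len + 1] = '\0';
               }
             in++;
           }
       }
     return 0;
   }

A caller would start with an empty output string and depth 0, and print
“CIRCULAR” when the function returns -1.
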
Macro-related debug CLI commands
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In addition to the “define” and “undefine” debug CLI commands, use “show
macro [noevaluate]” to dump the macro table. The “echo” debug CLI
command will evaluate and print its argument:

::

   vpp# define foo This\ Is\ Foo
   vpp# echo $foo
   This Is Foo

Handing off buffers between threads
-----------------------------------

Vlib includes an easy-to-use mechanism for handing off buffers between
worker threads. A typical use-case: software ingress flow hashing. At a
high level, one creates a per-worker-thread queue which sends packets to
a specific graph node in the indicated worker thread. With the queue in
hand, enqueue packets to the worker thread of your choice.

Initialize a handoff queue
~~~~~~~~~~~~~~~~~~~~~~~~~~

Simply call vlib_frame_queue_main_init:

.. code:: c

   main_ptr->frame_queue_index
       = vlib_frame_queue_main_init (dest_node.index, frame_queue_size);

frame_queue_size means what it says: the number of frames which may be
queued. Since frames contain 1…256 packets, frame_queue_size should be a
reasonably small number (32…64). If the frame queue producer(s) are
faster than the frame queue consumer(s), congestion will occur. We
suggest letting the enqueue operator deal with queue congestion, as
shown in the enqueue example below.

Under the floorboards, vlib_frame_queue_main_init creates an input queue
for each worker thread.

Please do NOT create frame queues until it’s clear that they will be
used. Although the main dispatch loop is reasonably smart about how
often it polls the (entire set of) frame queues, polling unused frame
queues is a waste of clock cycles.

Hand off packets
~~~~~~~~~~~~~~~~

The actual handoff mechanics are simple, and integrate nicely with a
typical graph-node dispatch function:

.. code:: c

   always_inline uword
   do_handoff_inline (vlib_main_t * vm,
                      vlib_node_runtime_t * node, vlib_frame_t * frame,
                      int is_ip4, int is_trace)
   {
     u32 n_left_from, *from;
     vlib_buffer_t *bufs[VLIB_FRAME_SIZE], **b;
     u16 thread_indices [VLIB_FRAME_SIZE];
     u16 nexts[VLIB_FRAME_SIZE], *next;
     u32 n_enq;
     htest_main_t *hmp = &htest_main;
     int i;

     from = vlib_frame_vector_args (frame);
     n_left_from = frame->n_vectors;

     vlib_get_buffers (vm, from, bufs, n_left_from);
     next = nexts;
     b = bufs;

     /*
      * Typical frame traversal loop, details vary with
      * use case. Make sure to set thread_indices[i] with
      * the desired destination thread index. You may
      * or may not bother to set next[i].
      */

     for (i = 0; i < frame->n_vectors; i++)
       {
         <snip>
         /* Pick a thread to handle this packet */
         thread_indices[i] = f (packet_data_or_whatever);
         <snip>

         b += 1;
         next += 1;
         n_left_from -= 1;
       }

     /* Enqueue buffers to threads */
     n_enq =
       vlib_buffer_enqueue_to_thread (vm, node, hmp->frame_queue_index,
                                      from, thread_indices, frame->n_vectors,
                                      1 /* drop on congestion */);
     /* Typical counters */
     if (n_enq < frame->n_vectors)
       vlib_node_increment_counter (vm, node->node_index,
                                    XXX_ERROR_CONGESTION_DROP,
                                    frame->n_vectors - n_enq);
     vlib_node_increment_counter (vm, node->node_index,
                                  XXX_ERROR_HANDED_OFF, n_enq);
     return frame->n_vectors;
   }

Notes about calling vlib_buffer_enqueue_to_thread(…):

- If you pass “drop on congestion” non-zero, all packets in the inbound
  frame will be consumed one way or the other. This is the recommended
  setting.

- In the drop-on-congestion case, please don’t try to “help” in the
  enqueue node by freeing dropped packets, or by pushing them to
  “error-drop.” Either of those actions would be a severe error.

- It’s perfectly OK to enqueue packets to the current thread.

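The placeholder f (packet_data_or_whatever) in the dispatch function is
left to the application. For the flow-hashing use case mentioned
earlier, one plausible choice is to hash the 5-tuple and take the result
modulo the number of workers, so that every packet of a flow lands on
the same thread. The sketch below is stand-alone and hypothetical: the
toy\_ names, the key layout and the choice of FNV-1a are assumptions, and
worker thread indices are assumed to be contiguous starting at
first_worker.

.. code:: c

   #include <assert.h>
   #include <stdint.h>

   /* Hypothetical flow key: the classic 5-tuple. */
   typedef struct
   {
     uint32_t src_addr, dst_addr;
     uint16_t src_port, dst_port;
     uint8_t protocol;
   } toy_flow_key_t;

   /* Fold the low n_bytes of value into an FNV-1a hash, byte by byte. */
   static uint32_t
   toy_fnv1a_step (uint32_t h, uint32_t value, int n_bytes)
   {
     for (int i = 0; i < n_bytes; i++)
       {
         h ^= (value >> (8 * i)) & 0xff;
         h *= 16777619u;
       }
     return h;
   }

   /* Hash the key field by field so struct padding never matters. */
   static uint32_t
   toy_flow_hash (const toy_flow_key_t *k)
   {
     uint32_t h = 2166136261u;
     h = toy_fnv1a_step (h, k->src_addr, 4);
     h = toy_fnv1a_step (h, k->dst_addr, 4);
     h = toy_fnv1a_step (h, k->src_port, 2);
     h = toy_fnv1a_step (h, k->dst_port, 2);
     h = toy_fnv1a_step (h, k->protocol, 1);
     return h;
   }

   /* Map a flow onto one of n_workers threads, first_worker onwards. */
   static uint16_t
   toy_pick_thread (const toy_flow_key_t *k, uint16_t first_worker,
                    uint16_t n_workers)
   {
     assert (n_workers > 0);
     return (uint16_t) (first_worker + toy_flow_hash (k) % n_workers);
   }

With two workers and first_worker set to 1, the result is always thread
1 or thread 2, and it is the same for every packet carrying the same
5-tuple.
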
Handoff Demo Plugin
-------------------

Check out the sample (plugin) example in …/src/examples/handoffdemo. If
you want to build the handoff demo plugin:

::

   $ cd .../src/plugins
   $ ln -s ../examples/handoffdemo

This plugin provides a simple example of how to hand off packets between
threads. We used it to debug packet-tracer handoff tracing support.

Packet generator input script
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

   packet-generator new {
     name x
     limit 5
     size 128-128
     interface local0
     node handoffdemo-1
     data {
       incrementing 30
     }
   }

Start vpp with 2 worker threads
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The demo plugin hands packets from worker 1 to worker 2.

Enable tracing, and start the packet generator
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

   trace add pg-input 100
   packet-generator enable

Sample Run
~~~~~~~~~~

::

   DBGvpp# ex /tmp/pg_input_script
   DBGvpp# pa en
   DBGvpp# sh err
      Count                    Node                  Reason
          5               handoffdemo-1              packets handed off processed
          5               handoffdemo-2              completed packets
   DBGvpp# show run
   Thread 1 vpp_wk_0 (lcore 0)
   Time 133.9, average vectors/node 5.00, last 128 main loops 0.00 per node 0.00
     vector rates in 3.7331e-2, out 0.0000e0, drop 0.0000e0, punt 0.0000e0
                Name                 State         Calls          Vectors        Suspends         Clocks       Vectors/Call
   handoffdemo-1                    active              1               5               0          4.76e3            5.00
   pg-input                        disabled             2               5               0          5.58e4            2.50
   unix-epoll-input                 polling         22760               0               0          2.14e7            0.00
   ---------------
   Thread 2 vpp_wk_1 (lcore 2)
   Time 133.9, average vectors/node 5.00, last 128 main loops 0.00 per node 0.00
     vector rates in 0.0000e0, out 0.0000e0, drop 3.7331e-2, punt 0.0000e0
                Name                 State         Calls          Vectors        Suspends         Clocks       Vectors/Call
   drop                             active              1               5               0          1.35e4            5.00
   error-drop                       active              1               5               0          2.52e4            5.00
   handoffdemo-2                    active              1               5               0          2.56e4            5.00
   unix-epoll-input                 polling         22406               0               0          2.18e7            0.00

Enable the packet tracer and run it again…

::

   DBGvpp# trace add pg-input 100
   DBGvpp# pa en
   DBGvpp# sh trace
   sh trace
   ------------------- Start of thread 0 vpp_main -------------------
   No packets in trace buffer
   ------------------- Start of thread 1 vpp_wk_0 -------------------
   Packet 1

   00:06:50:520688: pg-input
     stream x, 128 bytes, 0 sw_if_index
     current data 0, length 128, buffer-pool 0, ref-count 1, trace handle 0x1000000
     00000000: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d0000
     00000020: 0000000000000000000000000000000000000000000000000000000000000000
     00000040: 0000000000000000000000000000000000000000000000000000000000000000
     00000060: 0000000000000000000000000000000000000000000000000000000000000000
   00:06:50:520762: handoffdemo-1
     HANDOFFDEMO: current thread 1

   Packet 2

   00:06:50:520688: pg-input
     stream x, 128 bytes, 0 sw_if_index
     current data 0, length 128, buffer-pool 0, ref-count 1, trace handle 0x1000001
     00000000: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d0000
     00000020: 0000000000000000000000000000000000000000000000000000000000000000
     00000040: 0000000000000000000000000000000000000000000000000000000000000000
     00000060: 0000000000000000000000000000000000000000000000000000000000000000
   00:06:50:520762: handoffdemo-1
     HANDOFFDEMO: current thread 1

   Packet 3

   00:06:50:520688: pg-input
     stream x, 128 bytes, 0 sw_if_index
     current data 0, length 128, buffer-pool 0, ref-count 1, trace handle 0x1000002
     00000000: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d0000
     00000020: 0000000000000000000000000000000000000000000000000000000000000000
     00000040: 0000000000000000000000000000000000000000000000000000000000000000
     00000060: 0000000000000000000000000000000000000000000000000000000000000000
   00:06:50:520762: handoffdemo-1
     HANDOFFDEMO: current thread 1

   Packet 4

   00:06:50:520688: pg-input
     stream x, 128 bytes, 0 sw_if_index
     current data 0, length 128, buffer-pool 0, ref-count 1, trace handle 0x1000003
     00000000: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d0000
     00000020: 0000000000000000000000000000000000000000000000000000000000000000
     00000040: 0000000000000000000000000000000000000000000000000000000000000000
     00000060: 0000000000000000000000000000000000000000000000000000000000000000
   00:06:50:520762: handoffdemo-1
     HANDOFFDEMO: current thread 1

   Packet 5

   00:06:50:520688: pg-input
     stream x, 128 bytes, 0 sw_if_index
     current data 0, length 128, buffer-pool 0, ref-count 1, trace handle 0x1000004
     00000000: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d0000
     00000020: 0000000000000000000000000000000000000000000000000000000000000000
     00000040: 0000000000000000000000000000000000000000000000000000000000000000
     00000060: 0000000000000000000000000000000000000000000000000000000000000000
   00:06:50:520762: handoffdemo-1
     HANDOFFDEMO: current thread 1

   ------------------- Start of thread 2 vpp_wk_1 -------------------
   Packet 1

   00:06:50:520796: handoff_trace
     HANDED-OFF: from thread 1 trace index 0
   00:06:50:520796: handoffdemo-2
     HANDOFFDEMO: current thread 2
   00:06:50:520867: error-drop
     rx:local0
   00:06:50:520914: drop
     handoffdemo-2: completed packets

   Packet 2

   00:06:50:520796: handoff_trace
     HANDED-OFF: from thread 1 trace index 1
   00:06:50:520796: handoffdemo-2
     HANDOFFDEMO: current thread 2
   00:06:50:520867: error-drop
     rx:local0
   00:06:50:520914: drop
     handoffdemo-2: completed packets

   Packet 3

   00:06:50:520796: handoff_trace
     HANDED-OFF: from thread 1 trace index 2
   00:06:50:520796: handoffdemo-2
     HANDOFFDEMO: current thread 2
   00:06:50:520867: error-drop
     rx:local0
   00:06:50:520914: drop
     handoffdemo-2: completed packets

   Packet 4

   00:06:50:520796: handoff_trace
     HANDED-OFF: from thread 1 trace index 3
   00:06:50:520796: handoffdemo-2
     HANDOFFDEMO: current thread 2
   00:06:50:520867: error-drop
     rx:local0
   00:06:50:520914: drop
     handoffdemo-2: completed packets

   Packet 5

   00:06:50:520796: handoff_trace
     HANDED-OFF: from thread 1 trace index 4
   00:06:50:520796: handoffdemo-2
     HANDOFFDEMO: current thread 2
   00:06:50:520867: error-drop
     rx:local0
   00:06:50:520914: drop
     handoffdemo-2: completed packets
   DBGvpp#