VLIB (Vector Processing Library)
================================

The files associated with vlib are located in the ./src/{vlib,
vlibapi, vlibmemory} folders. These libraries provide vector
processing support including graph-node scheduling, reliable multicast
support, ultra-lightweight cooperative multi-tasking threads, a CLI,
plug-in .DLL support, physical memory, and Linux epoll support. Parts
of this library embody US Patent 7,961,636.

Init function discovery
-----------------------

vlib applications register for various [initialization] events by
placing structures and __attribute__((constructor)) functions into the
image. At appropriate times, the vlib framework walks
constructor-generated singly-linked structure lists, performs a
topological sort based on specified constraints, and calls the
indicated functions. Vlib applications create graph nodes, add CLI
functions, start cooperative multi-tasking threads, etc. using this
mechanism.

vlib applications invariably include a number of
VLIB_INIT_FUNCTION (my_init_function) macros.

Each init / configure / etc. function has the return type
clib_error_t *. Make sure that the function returns 0 if all is well;
otherwise the framework will announce an error and exit.

vlib applications must link against vppinfra, and often link against
other libraries such as VNET. In the latter case, it may be necessary
to explicitly reference symbol(s); otherwise, large portions of the
library may be AWOL at runtime.
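
One common trick, shown here as a hedged sketch, is simply to take the
address of a symbol exported by the library; vnet_main is a real vnet
global, while the keepalive variable name below is our own invention:

.. code:: c

   #include <vnet/vnet.h>

   /* Sketch: referencing a vnet symbol forces the linker to keep the
    * library (and its constructor-registered content) in the image.
    * The variable name is hypothetical. */
   static void *vnet_linker_keepalive __attribute__ ((unused)) =
     &vnet_main;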

Init function construction and constraint specification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It’s easy to add an init function:

.. code:: c

   static clib_error_t *my_init_function (vlib_main_t *vm)
   {
     /* ... initialize things ... */

     return 0; // or return clib_error_return (0, "BROKEN!");
   }
   VLIB_INIT_FUNCTION(my_init_function);

As given, my_init_function will be executed “at some point,” but with
no ordering guarantees.

Specifying ordering constraints is easy:

.. code:: c

   VLIB_INIT_FUNCTION(my_init_function) =
   {
     .runs_before = VLIB_INITS("we_run_before_function_1",
                               "we_run_before_function_2"),
     .runs_after = VLIB_INITS("we_run_after_function_1",
                              "we_run_after_function_2"),
   };

It’s also easy to specify bulk ordering constraints of the form “a
then b then c then d”:

.. code:: c

   VLIB_INIT_FUNCTION(my_init_function) =
   {
     .init_order = VLIB_INITS("a", "b", "c", "d"),
   };

It’s OK to specify all three sorts of ordering constraints for a
single init function, although it’s hard to imagine why it would be
necessary.

Node Graph Initialization
-------------------------

vlib packet-processing applications invariably define a set of graph
nodes to process packets.

One constructs a vlib_node_registration_t, most often via the
VLIB_REGISTER_NODE macro. At runtime, the framework processes the set
of such registrations into a directed graph. It is easy enough to add
nodes to the graph at runtime. The framework does not support removing
nodes.
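
For orientation, here is a minimal sketch of such a registration; the
node name, dispatch function, and choice of next node are
hypothetical:

.. code:: c

   /* Sketch: a minimal internal-node registration. */
   static uword
   my_node_fn (vlib_main_t *vm, vlib_node_runtime_t *node,
               vlib_frame_t *frame)
   {
     /* ... process frame->n_vectors buffer indices here ... */
     return frame->n_vectors;
   }

   VLIB_REGISTER_NODE (my_node) = {
     .function = my_node_fn,
     .name = "my-node",
     .vector_size = sizeof (u32),
     .type = VLIB_NODE_TYPE_INTERNAL,
     .n_next_nodes = 1,
     .next_nodes = {
       [0] = "error-drop",
     },
   };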

vlib provides several types of vector-processing graph nodes,
primarily to control framework dispatch behaviors. The type member of
the vlib_node_registration_t functions as follows:

- VLIB_NODE_TYPE_PRE_INPUT - run before all other node types
- VLIB_NODE_TYPE_INPUT - run as often as possible, after pre_input
  nodes
- VLIB_NODE_TYPE_INTERNAL - only when explicitly made runnable by
  adding pending frames for processing
- VLIB_NODE_TYPE_PROCESS - only when explicitly made runnable.
  “Process” nodes are actually cooperative multi-tasking threads. They
  **must** explicitly suspend after a reasonably short period of time.

For a precise understanding of the graph node dispatcher, please read
./src/vlib/main.c:vlib_main_loop.

Graph node dispatcher
---------------------

vlib_main_loop() dispatches graph nodes. The basic vector processing
algorithm is diabolically simple, but may not be obvious from even a
long stare at the code. Here’s how it works: some input node, or set
of input nodes, produce a vector of work to process. The graph node
dispatcher pushes the work vector through the directed graph,
subdividing it as needed, until the original work vector has been
completely processed. At that point, the process recurs.

This scheme yields a stable equilibrium in frame size, by
construction. Here’s why: as the frame size increases, the
per-frame-element processing time decreases. There are several related
forces at work; the simplest to describe is the effect of vector
processing on the CPU L1 I-cache. The first frame element [packet]
processed by a given node warms up the node dispatch function in the
L1 I-cache. All subsequent frame elements profit. As we increase the
number of frame elements, the cost per element goes down.

Under light load, it is a crazy waste of CPU cycles to run the graph
node dispatcher flat-out. So, the graph node dispatcher arranges to
wait for work by sitting in a timed epoll wait if the prevailing frame
size is low. The scheme has a certain amount of hysteresis to avoid
constantly toggling back and forth between interrupt and polling mode.
Although the graph dispatcher supports interrupt and polling modes,
our current default device drivers do not.

The graph node scheduler uses a hierarchical timer wheel to reschedule
process nodes upon timer expiration.
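
For example, a process node which suspends for a fixed interval lands
on that timer wheel; a minimal sketch:

.. code:: c

   /* Sketch: called from process-node context, this suspends the
    * calling process; the timer wheel makes it runnable again
    * roughly 1.5 seconds later. */
   vlib_process_suspend (vm, 1.5 /* seconds */);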

Graph dispatcher internals
--------------------------

This section may be safely skipped. It’s not necessary to understand
graph dispatcher internals to create graph nodes.

Vector Data Structure
---------------------

In vpp / vlib, we represent vectors as instances of the vlib_frame_t
type:

.. code:: c

   typedef struct vlib_frame_t
   {
     /* Frame flags. */
     u16 flags;

     /* Number of scalar bytes in arguments. */
     u8 scalar_size;

     /* Number of bytes per vector argument. */
     u8 vector_size;

     /* Number of vector elements currently in frame. */
     u16 n_vectors;

     /* Scalar and vector arguments to next node. */
     u8 arguments[0];
   } vlib_frame_t;

Note that one *could* construct all kinds of vectors - including
vectors with some associated scalar data - using this structure. In
the vpp application, vectors typically use a 4-byte vector element
size, and zero bytes’ worth of associated per-frame scalar data.

Frames are always allocated on CLIB_CACHE_LINE_BYTES boundaries.
Frames have u32 indices which make use of the alignment property, so
the maximum feasible main heap offset of a frame is
CLIB_CACHE_LINE_BYTES * 0xFFFFFFFF: 64 bytes * 4 G = 256 GBytes.
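
A minimal sketch of the index-to-address arithmetic implied above;
heap_base is a hypothetical stand-in for the main heap origin:

.. code:: c

   /* Sketch: because frames are cache-line aligned, a u32 frame index
    * scaled by CLIB_CACHE_LINE_BYTES (64) spans up to 256 GBytes. */
   u8 *frame_address =
     heap_base + ((u64) frame_index * CLIB_CACHE_LINE_BYTES);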

Scheduling Vectors
------------------

As you can see, vectors are not directly associated with graph nodes.
We represent that association in a couple of ways. The simplest is the
vlib_pending_frame_t:

.. code:: c

   /* A frame pending dispatch by main loop. */
   typedef struct
   {
     /* Node and runtime for this frame. */
     u32 node_runtime_index;

     /* Frame index (in the heap). */
     u32 frame_index;

     /* Start of next frames for this node. */
     u32 next_frame_index;

     /* Special value for next_frame_index when there is no next frame. */
   #define VLIB_PENDING_FRAME_NO_NEXT_FRAME ((u32) ~0)
   } vlib_pending_frame_t;

Here is the code in …/src/vlib/main.c:vlib_main_or_worker_loop() which
processes frames:

.. code:: c

   /*
    * Input nodes may have added work to the pending vector.
    * Process pending vector until there is nothing left.
    * All pending vectors will be processed from input -> output.
    */
   for (i = 0; i < _vec_len (nm->pending_frames); i++)
     cpu_time_now = dispatch_pending_node (vm, i, cpu_time_now);
   /* Reset pending vector for next iteration. */

The pending frame node_runtime_index associates the frame with the
node which will process it.

Complications
-------------

Fasten your seatbelt. Here’s where the story - and the data
structures - become quite complicated…

At 100,000 feet: vpp uses a directed graph, not a directed *acyclic*
graph. It’s really quite normal for a packet to visit ip[46]-lookup
multiple times. The worst-case: a graph node which enqueues packets to
itself.

To deal with this issue, the graph dispatcher must force allocation of
a new frame if the current graph node’s dispatch function happens to
enqueue a packet back to itself.

There are no guarantees that a pending frame will be processed
immediately, which means that more packets may be added to the
underlying vlib_frame_t after it has been attached to a
vlib_pending_frame_t. Care must be taken to allocate new frames and
pending frames if a (pending_frame, frame) pair fills.

Next frames, next frame ownership
---------------------------------

The vlib_next_frame_t is the last key graph dispatcher data structure:

.. code:: c

   typedef struct
   {
     /* Frame index. */
     u32 frame_index;

     /* Node runtime for this next. */
     u32 node_runtime_index;

     /* Next frame flags. */
     u32 flags;

     /* Reflects node frame-used flag for this next. */
   #define VLIB_FRAME_NO_FREE_AFTER_DISPATCH \
     VLIB_NODE_FLAG_FRAME_NO_FREE_AFTER_DISPATCH

     /* This next frame owns enqueue to node
        corresponding to node_runtime_index. */
   #define VLIB_FRAME_OWNER (1 << 15)

     /* Set when frame has been allocated for this next. */
   #define VLIB_FRAME_IS_ALLOCATED VLIB_NODE_FLAG_IS_OUTPUT

     /* Set when frame has been added to pending vector. */
   #define VLIB_FRAME_PENDING VLIB_NODE_FLAG_IS_DROP

     /* Set when frame is to be freed after dispatch. */
   #define VLIB_FRAME_FREE_AFTER_DISPATCH VLIB_NODE_FLAG_IS_PUNT

     /* Set when frame has traced packets. */
   #define VLIB_FRAME_TRACE VLIB_NODE_FLAG_TRACE

     /* Number of vectors enqueued to this next since last overflow. */
     u32 vectors_since_last_overflow;
   } vlib_next_frame_t;

Graph node dispatch functions call vlib_get_next_frame (…) to set
“(u32 *)to_next” to the right place in the vlib_frame_t corresponding
to the ith arc (aka next0) from the current node to the indicated next
node.

After some scuffling around - two levels of macros - processing
reaches vlib_get_next_frame_internal (…). Get-next-frame-internal digs
up the vlib_next_frame_t corresponding to the desired graph arc.

The next frame data structure amounts to a graph-arc-centric frame
cache. Once a node finishes adding elements to a frame, it will
acquire a vlib_pending_frame_t and end up on the graph dispatcher’s
run-queue. But there’s no guarantee that more vector elements won’t be
added to the underlying frame from the same (source_node, next_index)
arc or from a different (source_node, next_index) arc.

Maintaining consistency of the arc-to-frame cache is necessary. The
first step in maintaining consistency is to make sure that only one
graph node at a time thinks it “owns” the target vlib_frame_t.

Back to the graph node dispatch function. In the usual case, a certain
number of packets will be added to the vlib_frame_t acquired by
calling vlib_get_next_frame (…).

Before a dispatch function returns, it’s required to call
vlib_put_next_frame (…) for all of the graph arcs it actually used.
This action adds a vlib_pending_frame_t to the graph dispatcher’s
pending frame vector.

vlib_put_next_frame makes a note in the pending frame of the frame
index, and also of the vlib_next_frame_t index.
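
Putting the pieces together, here is a minimal sketch of the canonical
get/put pattern inside a dispatch function; the buffer inspection and
the choice of next0 are application-specific:

.. code:: c

   u32 *from = vlib_frame_vector_args (frame);
   u32 n_left_from = frame->n_vectors;
   u32 next_index = node->cached_next_index;

   while (n_left_from > 0)
     {
       u32 *to_next, n_left_to_next;

       vlib_get_next_frame (vm, node, next_index, to_next, n_left_to_next);

       while (n_left_from > 0 && n_left_to_next > 0)
         {
           u32 bi0 = from[0];
           u32 next0 = next_index;

           to_next[0] = bi0;
           from += 1;
           to_next += 1;
           n_left_from -= 1;
           n_left_to_next -= 1;

           /* ... examine vlib_get_buffer (vm, bi0), pick next0 ... */

           /* Standard fixup if next0 differs from next_index */
           vlib_validate_buffer_enqueue_x1 (vm, node, next_index,
                                            to_next, n_left_to_next,
                                            bi0, next0);
         }
       vlib_put_next_frame (vm, node, next_index, n_left_to_next);
     }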

dispatch_pending_node actions
-----------------------------

The main graph dispatch loop calls dispatch_pending_node as shown
above.

dispatch_pending_node recovers the pending frame, and the graph node
runtime / dispatch function. Further, it recovers the next_frame
currently associated with the vlib_frame_t, and detaches the
vlib_frame_t from the next_frame.

In …/src/vlib/main.c:dispatch_pending_node(…), note this stanza:

.. code:: c

   /* Force allocation of new frame while current frame is being
      dispatched. */
   restore_frame_index = ~0;
   if (nf->frame_index == p->frame_index)
     {
       nf->frame_index = ~0;
       nf->flags &= ~VLIB_FRAME_IS_ALLOCATED;
       if (!(n->flags & VLIB_NODE_FLAG_FRAME_NO_FREE_AFTER_DISPATCH))
         restore_frame_index = p->frame_index;
     }

dispatch_pending_node is worth a hard stare due to the several
second-order optimizations it implements. Almost as an afterthought,
it calls dispatch_node which actually calls the graph node dispatch
function.

Process / thread model
----------------------

vlib provides an ultra-lightweight cooperative multi-tasking thread
model. The graph node scheduler invokes these processes in much the
same way as traditional vector-processing run-to-completion graph
nodes; plus-or-minus a setjmp/longjmp pair required to switch stacks.
Simply set the vlib_node_registration_t type field to
VLIB_NODE_TYPE_PROCESS. Yes, process is a misnomer. These are
cooperative multi-tasking threads.

As of this writing, the default stack size is 2^15 = 32kb. Initialize
the node registration’s process_log2_n_stack_bytes member as needed.
The graph node dispatcher makes some effort to detect stack overrun,
e.g. by mapping a no-access page below each thread stack.
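
A minimal sketch of a process-node registration with an enlarged
stack; the node and function names are hypothetical:

.. code:: c

   /* Sketch: a process node with a 2^16 = 64kb stack. */
   static uword
   my_process_fn (vlib_main_t *vm, vlib_node_runtime_t *rt,
                  vlib_frame_t *f)
   {
     while (1)
       {
         /* ... do a bounded amount of work ... */
         vlib_process_suspend (vm, 1.0 /* seconds */);
       }
     return 0; /* not reached */
   }

   VLIB_REGISTER_NODE (my_process_node) = {
     .function = my_process_fn,
     .type = VLIB_NODE_TYPE_PROCESS,
     .name = "my-process",
     .process_log2_n_stack_bytes = 16,
   };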

Process node dispatch functions are expected to be “while(1) { }”
loops which suspend when not otherwise occupied, and which must not
run for unreasonably long periods of time.

“Unreasonably long” is an application-dependent concept. Over the
years, we have constructed frame-size sensitive control-plane nodes
which will use a much higher fraction of the available CPU bandwidth
when the frame size is low. The classic example: modifying forwarding
tables. So long as the table-builder leaves the forwarding tables in a
valid state, one can suspend the table builder to avoid dropping
packets as a result of control-plane activity.

Process nodes can suspend for fixed amounts of time, or until another
entity signals an event, or both. See the next section for a
description of the vlib process event mechanism.

When running in vlib process context, one must pay strict attention to
loop invariant issues. If one walks a data structure and calls a
function which may suspend, one had best know by construction that it
cannot change. Often, it’s best to simply make a snapshot copy of a
data structure, walk the copy at leisure, then free the copy.
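
A minimal sketch of the snapshot pattern, using the vppinfra vector
primitives; my_entry_t, my_table, and process_entry are hypothetical:

.. code:: c

   /* Sketch: snapshot a vector before walking it from process
    * context; the copy cannot change underneath us. */
   my_entry_t *copy = vec_dup (my_table);
   int i;

   for (i = 0; i < vec_len (copy); i++)
     process_entry (&copy[i]); /* may suspend */

   vec_free (copy);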

Process events
--------------

The vlib process event mechanism API is extremely lightweight and easy
to use. Here is a typical example:

.. code:: c

   vlib_main_t *vm = &vlib_global_main;
   uword event_type, *event_data = 0;

   while (1)
     {
       vlib_process_wait_for_event_or_clock (vm, 5.0 /* seconds */);

       event_type = vlib_process_get_events (vm, &event_data);

       switch (event_type)
         {
         case EVENT1:
           handle_event1s (event_data);
           break;

         case EVENT2:
           handle_event2s (event_data);
           break;

         case ~0: /* 5-second idle/periodic */
           handle_idle ();
           break;

         default: /* bug! */
           ASSERT (0);
         }

       vec_reset_length (event_data);
     }

In this example, the VLIB process node waits for an event to occur, or
for 5 seconds to elapse. The code demuxes on the event type, calling
the appropriate handler function. Each call to vlib_process_get_events
returns a vector of per-event-type data passed to successive
vlib_process_signal_event calls; it is a serious error to process only
event_data[0].
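
In other words, each handler should walk the entire vector. A minimal
sketch, in which handle_one_event1 is hypothetical:

.. code:: c

   /* Sketch: a handler must process every element of event_data. */
   static void
   handle_event1s (uword *event_data)
   {
     int i;

     for (i = 0; i < vec_len (event_data); i++)
       handle_one_event1 (event_data[i]);
   }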

Resetting the event_data vector-length to 0 [instead of calling
vec_free] means that the event scheme doesn’t burn cycles continuously
allocating and freeing the event data vector. This is a common
vppinfra / vlib coding pattern, well worth using when appropriate.

Signaling an event is easy, for example:

.. code:: c

   vlib_process_signal_event (vm, process_node_index, EVENT1,
                              (uword) arbitrary_event1_data); /* and so forth */

One can either know the process node index by construction - dig it
out of the appropriate vlib_node_registration_t - or by finding the
vlib_node_t with vlib_get_node_by_name(…).
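
A minimal sketch of the name-based lookup; the node name “my-process”
is hypothetical:

.. code:: c

   /* Sketch: find a process node by name, then signal it. */
   vlib_node_t *n = vlib_get_node_by_name (vm, (u8 *) "my-process");

   if (n)
     vlib_process_signal_event (vm, n->index, EVENT1, 0 /* data */);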

Buffers
-------

vlib buffering solves the usual set of packet-processing problems,
albeit at high performance. Key in terms of performance: one
ordinarily allocates / frees N buffers at a time rather than one at a
time. Except when operating directly on a specific buffer, one deals
with buffers by index, not by pointer.

Packet-processing frames are u32[] arrays, not vlib_buffer_t[] arrays.
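
A minimal sketch of the index-to-pointer step inside a dispatch
function:

.. code:: c

   /* Sketch: frames carry buffer indices; convert an index to a
    * pointer only when the buffer must actually be touched. */
   u32 *from = vlib_frame_vector_args (frame);
   vlib_buffer_t *b0 = vlib_get_buffer (vm, from[0]);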

Packets comprise one or more vlib buffers, chained together as
required. Multiple particle sizes are supported; hardware input nodes
simply ask for the required size(s). Coalescing support is available.
For obvious reasons one is discouraged from writing one’s own wild and
wacky buffer chain traversal code.

vlib buffer headers are allocated immediately prior to the buffer data
area. In typical packet processing this saves a dependent read wait:
given a buffer’s address, one can prefetch the buffer header
[metadata] at the same time as the first cache line of buffer data.
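
A minimal sketch of the resulting prefetch idiom, typically applied a
few packets ahead of the one being processed:

.. code:: c

   /* Sketch: prefetch the buffer header and the first cache line of
    * buffer data for a packet two iterations ahead. */
   vlib_buffer_t *b2 = vlib_get_buffer (vm, from[2]);

   vlib_prefetch_buffer_header (b2, LOAD);
   CLIB_PREFETCH (b2->data, CLIB_CACHE_LINE_BYTES, LOAD);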

Buffer header metadata (vlib_buffer_t) includes the usual rewrite
expansion space, a current_data offset, RX and TX interface indices,
packet trace information, and opaque areas.

The opaque data is intended to control packet processing in arbitrary
subgraph-dependent ways. The programmer shoulders responsibility for
data lifetime analysis, type-checking, etc.
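
For instance, vnet overlays a typed view on the first opaque area via
its vnet_buffer macro:

.. code:: c

   /* Sketch: vnet's typed view of the vlib_buffer_t opaque area. */
   u32 rx_sw_if_index = vnet_buffer (b0)->sw_if_index[VLIB_RX];
   u32 tx_sw_if_index = vnet_buffer (b0)->sw_if_index[VLIB_TX];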

Buffers have reference-counts in support of e.g. multicast
replication.

Shared-memory message API
-------------------------

Local control-plane and application processes interact with the vpp
dataplane via asynchronous message-passing in shared memory over
unidirectional queues. The same application APIs are available via
sockets.

Capturing API traces and replaying them in a simulation environment
requires a disciplined approach to the problem. This seems like a
make-work task, but it is not. When something goes wrong in the
control-plane after 300,000 or 3,000,000 operations, high-speed replay
of the events leading up to the accident is a huge win.

The shared-memory message API message allocator vl_msg_api_alloc uses
a particularly cute trick. Since messages are processed in order, we
try to allocate message buffering from a set of fixed-size,
preallocated rings. Each ring item has a “busy” bit. Freeing one of
the preallocated message buffers merely requires the message consumer
to clear the busy bit. No locking required.

Debug CLI
---------

Adding debug CLI commands to VLIB applications is very simple.

Here is a complete example:

.. code:: c

   static clib_error_t *
   show_ip_tuple_match (vlib_main_t * vm,
                        unformat_input_t * input,
                        vlib_cli_command_t * cmd)
   {
     vlib_cli_output (vm, "%U\n", format_ip_tuple_match_tables,
                      &routing_main);
     return 0;
   }

   static VLIB_CLI_COMMAND (show_ip_tuple_command) =
   {
     .path = "show ip tuple match",
     .short_help = "Show ip 5-tuple match-and-broadcast tables",
     .function = show_ip_tuple_match,
   };

This example implements the “show ip tuple match” debug cli command.
In ordinary usage, the vlib cli is available via the “vppctl”
application, which sends traffic to a named pipe. One can configure
debug CLI telnet access on a configurable port.

The cli implementation has an output redirection facility which makes
it simple to deliver cli output via shared-memory API messaging.

Particularly for debug or “show tech support” type commands, it would
be wasteful to write vlib application code to pack binary data, write
more code elsewhere to unpack the data and finally print the answer.
If a certain cli command has the potential to hurt packet processing
performance by running for too long, do the work incrementally in a
process node. The client can wait.

Macro expansion
~~~~~~~~~~~~~~~

The vpp debug CLI engine includes a recursive macro expander. This is
quite useful for factoring out address and/or interface name
specifics:

::

   define ip1 192.168.1.1/24
   define ip2 192.168.2.1/24
   define iface1 GigabitEthernet3/0/0
   define iface2 loop1

   set int ip address $iface1 $ip1
   set int ip address $iface2 $(ip2)

   undefine ip1
   undefine ip2
   undefine iface1
   undefine iface2

Each socket (or telnet) debug CLI session has its own macro tables.
All debug CLI sessions which use CLI_INBAND binary API messages share
a single table.

The macro expander recognizes circular definitions:

::

   define foo $(bar)
   define bar $(mumble)
   define mumble $(foo)

At 8 levels of recursion, the macro expander throws up its hands and
replies “CIRCULAR.”

Macro-related debug CLI commands
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In addition to the “define” and “undefine” debug CLI commands, use
“show macro [noevaluate]” to dump the macro table. The “echo” debug
CLI command will evaluate and print its argument:

::

   vpp# define foo This\ Is\ Foo
   vpp# echo $foo
   This Is Foo
| Handing off buffers between threads |
| ----------------------------------- |
| |
| Vlib includes an easy-to-use mechanism for handing off buffers between |
| worker threads. A typical use-case: software ingress flow hashing. At a |
| high level, one creates a per-worker-thread queue which sends packets to |
| a specific graph node in the indicated worker thread. With the queue in |
| hand, enqueue packets to the worker thread of your choice. |
| |
| Initialize a handoff queue |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| Simple enough, call vlib_frame_queue_main_init: |
| |
| .. code:: c |
| |
| main_ptr->frame_queue_index |
| = vlib_frame_queue_main_init (dest_node.index, frame_queue_size); |

frame_queue_size means what it says: the number of frames which may be
queued. Since frames contain 1…256 packets, frame_queue_size should be
a reasonably small number (32…64). If the frame queue producer(s) are
faster than the frame queue consumer(s), congestion will occur. We
suggest letting the enqueue operator deal with queue congestion, as
shown in the enqueue example below.

Under the floorboards, vlib_frame_queue_main_init creates an input
queue for each worker thread.

Please do NOT create frame queues until it’s clear that they will be
used. Although the main dispatch loop is reasonably smart about how
often it polls the (entire set of) frame queues, polling unused frame
queues is a waste of clock cycles.

Hand off packets
~~~~~~~~~~~~~~~~

The actual handoff mechanics are simple, and integrate nicely with a
typical graph-node dispatch function:

.. code:: c

   always_inline uword
   do_handoff_inline (vlib_main_t * vm,
                      vlib_node_runtime_t * node, vlib_frame_t * frame,
                      int is_ip4, int is_trace)
   {
     u32 n_left_from, *from;
     vlib_buffer_t *bufs[VLIB_FRAME_SIZE], **b;
     u16 thread_indices[VLIB_FRAME_SIZE];
     u16 nexts[VLIB_FRAME_SIZE], *next;
     u32 n_enq;
     htest_main_t *hmp = &htest_main;
     int i;

     from = vlib_frame_vector_args (frame);
     n_left_from = frame->n_vectors;

     vlib_get_buffers (vm, from, bufs, n_left_from);
     next = nexts;
     b = bufs;

     /*
      * Typical frame traversal loop, details vary with
      * use case. Make sure to set thread_indices[i] with
      * the desired destination thread index. You may
      * or may not bother to set next[i].
      */

     for (i = 0; i < frame->n_vectors; i++)
       {
         /* <snip> */
         /* Pick a thread to handle this packet */
         thread_indices[i] = f (packet_data_or_whatever);
         /* <snip> */

         b += 1;
         next += 1;
         n_left_from -= 1;
       }

     /* Enqueue buffers to threads */
     n_enq =
       vlib_buffer_enqueue_to_thread (vm, node, hmp->frame_queue_index,
                                      from, thread_indices, frame->n_vectors,
                                      1 /* drop on congestion */);
     /* Typical counters: */
     if (n_enq < frame->n_vectors)
       vlib_node_increment_counter (vm, node->node_index,
                                    XXX_ERROR_CONGESTION_DROP,
                                    frame->n_vectors - n_enq);
     vlib_node_increment_counter (vm, node->node_index,
                                  XXX_ERROR_HANDED_OFF, n_enq);
     return frame->n_vectors;
   }

Notes about calling vlib_buffer_enqueue_to_thread(…):

- If you pass “drop on congestion” non-zero, all packets in the
  inbound frame will be consumed one way or the other. This is the
  recommended setting.

- In the drop-on-congestion case, please don’t try to “help” in the
  enqueue node by freeing dropped packets, or by pushing them to
  “error-drop.” Either of those actions would be a severe error.

- It’s perfectly OK to enqueue packets to the current thread.

Handoff Demo Plugin
-------------------

Check out the sample (plugin) example in …/src/examples/handoffdemo.
If you want to build the handoff demo plugin:

::

   $ cd .../src/plugins
   $ ln -s ../examples/handoffdemo

This plugin provides a simple example of how to hand off packets
between threads. We used it to debug packet-tracer handoff tracing
support.

Packet generator input script
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

   packet-generator new {
       name x
       limit 5
       size 128-128
       interface local0
       node handoffdemo-1
       data {
           incrementing 30
       }
   }

Start vpp with 2 worker threads
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The demo plugin hands packets from worker 1 to worker 2.

Enable tracing, and start the packet generator
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

   trace add pg-input 100
   packet-generator enable

Sample Run
~~~~~~~~~~

::

   DBGvpp# ex /tmp/pg_input_script
   DBGvpp# pa en
   DBGvpp# sh err
    Count                    Node                  Reason
         5             handoffdemo-1               packets handed off processed
         5             handoffdemo-2               completed packets
   DBGvpp# show run
   Thread 1 vpp_wk_0 (lcore 0)
   Time 133.9, average vectors/node 5.00, last 128 main loops 0.00 per node 0.00
     vector rates in 3.7331e-2, out 0.0000e0, drop 0.0000e0, punt 0.0000e0
              Name       State     Calls    Vectors   Suspends    Clocks  Vectors/Call
   handoffdemo-1        active         1          5          0    4.76e3          5.00
   pg-input           disabled         2          5          0    5.58e4          2.50
   unix-epoll-input    polling     22760          0          0    2.14e7          0.00
   ---------------
   Thread 2 vpp_wk_1 (lcore 2)
   Time 133.9, average vectors/node 5.00, last 128 main loops 0.00 per node 0.00
     vector rates in 0.0000e0, out 0.0000e0, drop 3.7331e-2, punt 0.0000e0
              Name       State     Calls    Vectors   Suspends    Clocks  Vectors/Call
   drop                 active         1          5          0    1.35e4          5.00
   error-drop           active         1          5          0    2.52e4          5.00
   handoffdemo-2        active         1          5          0    2.56e4          5.00
   unix-epoll-input    polling     22406          0          0    2.18e7          0.00

Enable the packet tracer and run it again…

::

   DBGvpp# trace add pg-input 100
   DBGvpp# pa en
   DBGvpp# sh trace
   sh trace
   ------------------- Start of thread 0 vpp_main -------------------
   No packets in trace buffer
   ------------------- Start of thread 1 vpp_wk_0 -------------------
   Packet 1

   00:06:50:520688: pg-input
     stream x, 128 bytes, 0 sw_if_index
     current data 0, length 128, buffer-pool 0, ref-count 1, trace handle 0x1000000
     00000000: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d0000
     00000020: 0000000000000000000000000000000000000000000000000000000000000000
     00000040: 0000000000000000000000000000000000000000000000000000000000000000
     00000060: 0000000000000000000000000000000000000000000000000000000000000000
   00:06:50:520762: handoffdemo-1
     HANDOFFDEMO: current thread 1

   Packet 2

   00:06:50:520688: pg-input
     stream x, 128 bytes, 0 sw_if_index
     current data 0, length 128, buffer-pool 0, ref-count 1, trace handle 0x1000001
     00000000: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d0000
     00000020: 0000000000000000000000000000000000000000000000000000000000000000
     00000040: 0000000000000000000000000000000000000000000000000000000000000000
     00000060: 0000000000000000000000000000000000000000000000000000000000000000
   00:06:50:520762: handoffdemo-1
     HANDOFFDEMO: current thread 1

   Packet 3

   00:06:50:520688: pg-input
     stream x, 128 bytes, 0 sw_if_index
     current data 0, length 128, buffer-pool 0, ref-count 1, trace handle 0x1000002
     00000000: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d0000
     00000020: 0000000000000000000000000000000000000000000000000000000000000000
     00000040: 0000000000000000000000000000000000000000000000000000000000000000
     00000060: 0000000000000000000000000000000000000000000000000000000000000000
   00:06:50:520762: handoffdemo-1
     HANDOFFDEMO: current thread 1

   Packet 4

   00:06:50:520688: pg-input
     stream x, 128 bytes, 0 sw_if_index
     current data 0, length 128, buffer-pool 0, ref-count 1, trace handle 0x1000003
     00000000: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d0000
     00000020: 0000000000000000000000000000000000000000000000000000000000000000
     00000040: 0000000000000000000000000000000000000000000000000000000000000000
     00000060: 0000000000000000000000000000000000000000000000000000000000000000
   00:06:50:520762: handoffdemo-1
     HANDOFFDEMO: current thread 1

   Packet 5

   00:06:50:520688: pg-input
     stream x, 128 bytes, 0 sw_if_index
     current data 0, length 128, buffer-pool 0, ref-count 1, trace handle 0x1000004
     00000000: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d0000
     00000020: 0000000000000000000000000000000000000000000000000000000000000000
     00000040: 0000000000000000000000000000000000000000000000000000000000000000
     00000060: 0000000000000000000000000000000000000000000000000000000000000000
   00:06:50:520762: handoffdemo-1
     HANDOFFDEMO: current thread 1

   ------------------- Start of thread 2 vpp_wk_1 -------------------
   Packet 1

   00:06:50:520796: handoff_trace
     HANDED-OFF: from thread 1 trace index 0
   00:06:50:520796: handoffdemo-2
     HANDOFFDEMO: current thread 2
   00:06:50:520867: error-drop
     rx:local0
   00:06:50:520914: drop
     handoffdemo-2: completed packets

   Packet 2

   00:06:50:520796: handoff_trace
     HANDED-OFF: from thread 1 trace index 1
   00:06:50:520796: handoffdemo-2
     HANDOFFDEMO: current thread 2
   00:06:50:520867: error-drop
     rx:local0
   00:06:50:520914: drop
     handoffdemo-2: completed packets

   Packet 3

   00:06:50:520796: handoff_trace
     HANDED-OFF: from thread 1 trace index 2
   00:06:50:520796: handoffdemo-2
     HANDOFFDEMO: current thread 2
   00:06:50:520867: error-drop
     rx:local0
   00:06:50:520914: drop
     handoffdemo-2: completed packets

   Packet 4

   00:06:50:520796: handoff_trace
     HANDED-OFF: from thread 1 trace index 3
   00:06:50:520796: handoffdemo-2
     HANDOFFDEMO: current thread 2
   00:06:50:520867: error-drop
     rx:local0
   00:06:50:520914: drop
     handoffdemo-2: completed packets

   Packet 5

   00:06:50:520796: handoff_trace
     HANDED-OFF: from thread 1 trace index 4
   00:06:50:520796: handoffdemo-2
     HANDOFFDEMO: current thread 2
   00:06:50:520867: error-drop
     rx:local0
   00:06:50:520914: drop
     handoffdemo-2: completed packets
   DBGvpp#