.. _dataplane:

The Data Plane
---------------

The data-plane data model is a directed, acyclic [#f16]_ graph of heterogeneous
objects. A packet forward walks the graph as it is switched. Each object
describes the actions to perform on the packet, and each object type has an
associated VLIB graph node. For a packet to forward walk the graph is therefore
to move from one VLIB node to the next, with each node performing the required
actions. This is the heart of the VPP model.

The data-plane graph is composed of generic data-path objects (DPOs). A parent
DPO is identified by the tuple {type, index, next_node}. The *next_node*
parameter is the index of the VLIB node to which the packets should be sent
next. It is present to maximise performance: it is important to ensure that the
parent does not need to be read [#f17]_ whilst processing the child.
Specialisations [#f18]_ of the DPO perform distinct actions. The most common
DPOs, and briefly what they represent, are:

- Load-balance: a choice in an ECMP set.
- Adjacency: apply a rewrite and forward through an interface.
- MPLS-label: impose an MPLS label.
- Lookup: perform another lookup in a different table.

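As an illustration of the {type, index, next_node} tuple, the following is a
minimal sketch only. All names (``dpo_sketch_type_t``, ``dpo_sketch_id_t`` and
the helper functions) and the field widths are assumptions made for this
example; they are not the definitions found in the VPP sources.

.. code-block:: c

   #include <stdint.h>

   /* Hypothetical DPO types, one per specialisation listed above. */
   typedef enum {
       DPO_SKETCH_LOAD_BALANCE,
       DPO_SKETCH_ADJACENCY,
       DPO_SKETCH_MPLS_LABEL,
       DPO_SKETCH_LOOKUP,
   } dpo_sketch_type_t;

   /* The {type, index, next_node} tuple a child stores to reach its parent. */
   typedef struct {
       dpo_sketch_type_t type; /* which specialisation the parent is */
       uint16_t next_node;     /* VLIB node to which the packet is sent next */
       uint32_t index;         /* index of the parent in its type's pool */
   } dpo_sketch_id_t;

   /* Build an identifier; every field is copied into the child. */
   static inline dpo_sketch_id_t
   dpo_sketch_id(dpo_sketch_type_t type, uint32_t index, uint16_t next_node)
   {
       dpo_sketch_id_t id = { .type = type, .next_node = next_node,
                              .index = index };
       return id;
   }

   /* The child picks the next node from its own copy of the tuple, so the
    * parent object itself is not read at this point. */
   static inline uint16_t
   dpo_sketch_next_node(const dpo_sketch_id_t *parent)
   {
       return parent->next_node;
   }

Because *next_node* is held in the child's copy of the identifier, the child can
enqueue the packet without touching the parent's memory.
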
The data-plane graph is derived from the control-plane graph by the objects
therein 'contributing' a DPO to the data-plane graph. Objects in the data-plane
contain only the information needed to switch a packet. They are therefore
simpler and, in memory terms, smaller, with the aim of fitting one DPO on a
single cache-line. The derivation from the control plane means that the
data-plane graph contains only objects whose current state can forward packets.
For example, the difference between a *fib_path_list_t* and a *load_balance_t*
is that the former expresses the control-plane's desired state and the latter
the data-plane's available state. If some paths in the path-list are unresolved
or down, then the load-balance will not include them in the forwarding choice.

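The desired-state versus available-state distinction can be sketched as a
filter over the paths. This is an illustrative simplification: the types
``sketch_path_t`` and ``sketch_load_balance_t``, the function
``sketch_contribute`` and the fixed ``SKETCH_MAX_PATHS`` limit are all
hypothetical and do not mirror *fib_path_list_t* or *load_balance_t*.

.. code-block:: c

   #include <stdbool.h>
   #include <stddef.h>
   #include <stdint.h>

   #define SKETCH_MAX_PATHS 8

   /* Control-plane view: every configured path, usable or not. */
   typedef struct {
       uint32_t adj_index; /* adjacency the path resolves through */
       bool resolved;
       bool up;
   } sketch_path_t;

   /* Data-plane view: only the choices that can forward right now. */
   typedef struct {
       uint32_t buckets[SKETCH_MAX_PATHS];
       size_t n_buckets;
   } sketch_load_balance_t;

   /* Contribute a load-balance from a list of paths, skipping any path
    * that is unresolved or down. */
   static void
   sketch_contribute(const sketch_path_t *paths, size_t n_paths,
                     sketch_load_balance_t *lb)
   {
       lb->n_buckets = 0;
       for (size_t i = 0; i < n_paths && lb->n_buckets < SKETCH_MAX_PATHS; i++) {
           if (paths[i].resolved && paths[i].up)
               lb->buckets[lb->n_buckets++] = paths[i].adj_index;
       }
   }

With three configured paths of which one is unresolved and one is down, the
contributed load-balance in this sketch holds a single bucket.
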
.. figure:: /_images/fib20fig8.png

Figure 8: DPO contributions for a non-recursive route

Figure 8 shows a simplified view of the control-plane graph indicating those
objects that contribute DPOs. Also shown are the VLIB node graphs at which the DPO is used.

Each *fib_entry_t* contributes its own *load_balance_t*, for three reasons:

- The result of a lookup in an IPv[46] table is a single 32-bit unsigned integer. This is an index into a memory pool. Consequently the object type must be the same for each result. Some routes will need a load-balance and some will not, but to insert another object in the graph to represent this choice is a waste of cycles, so the load-balance object is always the result. If the route does not have ECMP, then the load-balance has only one choice.
- In order to collect per-route counters, the lookup result must in some way uniquely identify the *fib_entry_t*. A shared load-balance (contributed by the path-list) would not allow this.
- In the case the *fib_entry_t* has MPLS out labels, and hence a *fib_path_ext_t*, then the load-balance must be per-prefix, since the MPLS labels that are its parents are themselves per-fib_entry_t.

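The first two reasons can be pictured with a short sketch in which the lookup
result is used directly as a pool index. The names ``sketch_lb_t``,
``sketch_lb_pool`` and ``sketch_lb_forward``, the fixed pool size and the use
of a modulo to pick a bucket are assumptions made for brevity, not VPP's
implementation.

.. code-block:: c

   #include <stdint.h>

   #define SKETCH_POOL_SIZE   4
   #define SKETCH_MAX_BUCKETS 4

   typedef struct {
       uint32_t n_buckets;                   /* 1 when the route has no ECMP */
       uint32_t buckets[SKETCH_MAX_BUCKETS]; /* indices of the parent DPOs */
       uint64_t packets;                     /* per-route counter */
   } sketch_lb_t;

   /* One load-balance per route, all of the same type, in a single pool. */
   static sketch_lb_t sketch_lb_pool[SKETCH_POOL_SIZE];

   /* lb_index stands in for the 32-bit result of the table lookup. The
    * caller must pass a valid index whose load-balance has n_buckets >= 1. */
   static uint32_t
   sketch_lb_forward(uint32_t lb_index, uint32_t flow_hash)
   {
       sketch_lb_t *lb = &sketch_lb_pool[lb_index];

       lb->packets++; /* counted per route, as each route owns its object */
       return lb->buckets[flow_hash % lb->n_buckets];
   }

A route without ECMP follows exactly the same code path; its single bucket is
chosen whatever the flow hash is.
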
.. figure:: /_images/fib20fig9.png

Figure 9: DPO contribution for a recursive route.

Figure 9 shows the load-balance objects contributed for a recursive route.

.. figure:: /_images/fib20fig10.png

Figure 10: DPO Contributions from labelled recursive routes.

Figure 10 shows the derived data-plane graph for a labelled recursive route.
There can be as many MPLS-label DPO instances as there are routes multiplied by
the number of paths per route. For this reason the MPLS-label DPO should be as
small as possible [#f19]_.

The data-plane graph is constructed by 'stacking' one
instance of a DPO on another to form the child-parent relationship. When this
stacking occurs, the necessary VLIB graph arcs are automatically constructed
from the respective DPO types' registered graph nodes.

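A rough sketch of stacking follows. Everything here is hypothetical: the node
numbers in ``sketch_type_node``, the ``sketch_arc`` table and the function
``sketch_dpo_stack`` are invented for illustration, and *next_node* is simply
set to the parent type's registered node number.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   #define SKETCH_N_TYPES 4

   /* Hypothetical registration data: the VLIB node serving each DPO type. */
   static const uint16_t sketch_type_node[SKETCH_N_TYPES] = { 10, 11, 12, 13 };

   /* Arcs created so far, indexed by [child type][parent type]. */
   static bool sketch_arc[SKETCH_N_TYPES][SKETCH_N_TYPES];

   typedef struct {
       uint8_t type;
       uint32_t index;
       uint16_t next_node;
   } sketch_dpo_id_t;

   /* Stack a child of child_type on a parent. The arc between the two types'
    * nodes is created as a side effect, and the identifier the child will
    * store is written to *stored. Both types must be < SKETCH_N_TYPES. */
   static void
   sketch_dpo_stack(uint8_t child_type, uint8_t parent_type,
                    uint32_t parent_index, sketch_dpo_id_t *stored)
   {
       sketch_arc[child_type][parent_type] = true; /* no-op if already set */

       stored->type = parent_type;
       stored->index = parent_index;
       stored->next_node = sketch_type_node[parent_type];
   }

The caller never creates an arc explicitly; in this sketch it exists after the
first stacking of one type on another and is reused thereafter.
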
The diagrams above show that for any given route the full data-plane graph is
known before any packet arrives. If that graph is composed of n objects, then the
packet will visit n nodes and thus incur a forwarding cost of approximately n
times the graph node cost. This could be reduced if the graph were *collapsed*
into a single DPO and associated node. However, collapsing a graph removes the
indirection objects that provide fast convergence (see section Fast Convergence). To
collapse is then a trade-off between faster forwarding and fast convergence; VPP
favours the latter.

This DPO model effectively exists today but is informally defined. Presently the
only object that is in the data-plane is the ip_adjacency_t; however, features
(like ILA, OAM hop-by-hop, SR, MAP, etc.) sub-type the adjacency. The member
lookup_next_index is equivalent to defining a new sub-type. Adding to the
existing union, or casting sub-type specific data into the opaque member, or
even over the rewrite string (e.g. the new port range checker), is equivalent
to defining a new C-struct type. Fortunately, at this time, all these sub-types
are smaller in memory than the ip_adjacency_t. It is now possible to dynamically
register new adjacency sub-types with ip_register_adjacency() and provide a
custom format function.

In my opinion a strongly defined object model will be easier for contributors to
understand, and more robust to implement.

.. rubric:: Footnotes:

.. [#f16] Directed implies it cannot be back-walked. It is acyclic even in the presence of a recursion loop.
.. [#f17] Loaded into cache, and hence potentially incurring a d-cache miss.
.. [#f18] The engaged reader is directed to vnet/vnet/dpo/*
.. [#f19] i.e. we should not re-use the adjacency structure.
