.. _routes:

Routes
^^^^^^

Basics
------

The anatomy of a route is crucial to understand:

.. code-block:: console

   1.1.1.0/24 via 10.0.0.1 eth0

A route is composed of two parts: **what** to match against and **how** to forward
the matched packets. In the above example we want to match packets
whose destination IP address is in the 1.1.1.0/24 subnet and then we
want to forward those packets to 10.0.0.1 on interface eth0. We
therefore want to match the **prefix** 1.1.1.0/24 and forward on the
**path** to 10.0.0.1, eth0.

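The split between prefix and path can be sketched in a few lines of Python. This is an illustration only, not VPP code: the ``Route`` and ``Path`` classes and the ``parse_route`` helper are hypothetical names introduced here.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    """HOW to forward: a next-hop reached through an interface."""
    next_hop: str
    interface: str


@dataclass(frozen=True)
class Route:
    """A route pairs WHAT to match (prefix) with HOW to forward (path)."""
    prefix: ipaddress.IPv4Network
    path: Path

    def matches(self, dst: str) -> bool:
        # A packet matches when its destination lies inside the prefix.
        return ipaddress.ip_address(dst) in self.prefix


def parse_route(text: str) -> Route:
    """Parse the '<prefix> via <next-hop> <interface>' form used above."""
    prefix, via, next_hop, interface = text.split()
    if via != "via":
        raise ValueError("expected '<prefix> via <next-hop> <interface>'")
    return Route(ipaddress.ip_network(prefix), Path(next_hop, interface))


route = parse_route("1.1.1.0/24 via 10.0.0.1 eth0")
```

Here ``route.matches("1.1.1.77")`` is true while ``route.matches("1.1.2.1")`` is not, and ``route.path`` carries the next-hop and interface independently of the match.
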
Matching on a prefix is the particular task of the IP FIB. Matching on
other packet attributes is done by other subsystems, e.g. matching on
MPLS labels in the MPLS-FIB, matching on a tuple in ACL based
forwarding (ABF), or 'matching' on all packets that arrive on an L3
interface (l3XC). Although these subsystems match on different
properties, they share the infrastructure on **how** to forward
matched packets, that is, they share the **paths**. The FIB paths (or
really the path-list) thus provide a service to clients: to
**contribute** forwarding. In terms that will be made clear in later
sections, this means providing the DPO to use.

The prime function of the FIB is to *resolve* the paths for a
route. To resolve a route is to construct an object graph that fully
describes how to forward matching packets. This means that the graph
must terminate with an object (the leaf node) that describes how
to send a packet on an interface [#f1]_, i.e. what encap to add to the
packet and what interface to send it to; this is the purpose of the IP
adjacency object. In Figure 3 the route is resolved as the graph is
complete from *fib_entry_t* to *ip_adjacency_t*.


Thread Model
^^^^^^^^^^^^

The FIB is not thread safe. All actions on the FIB are expected to
occur exclusively in the main thread. However, the data-structures
that the FIB updates to add routes are thread safe
w.r.t. addition/deletion and read; routes can therefore be added
without holding the worker thread barrier lock.


Tables
------

An IP FIB is a set of prefixes against which to match; it is
subsequent address family (SAFI) specific (i.e. there is one for each of IPv4 and IPv6,
unicast and multicast). An IP Table is address family (AFI) specific (i.e. the
'table' includes the unicast and multicast FIB).

Each FIB is identified by the SAFI and instance number (the [pool]
index); each table is identified by the AFI and ID. The table's ID is
assigned by the user when the table is constructed. Table ID 0 is
reserved for the global/default table.

In most routing models a VRF is composed of an IPv4 and an IPv6 table.
VPP, however, has no construct to model this association; it deals only
with tables and FIBs.

A unicast FIB comprises two route data-bases: forwarding and non-forwarding. The
forwarding data-base contains routes against which a packet will perform a longest
prefix match (LPM) in the data-plane. The non-forwarding DB contains all the routes
with which VPP has been programmed. Some of these routes may be
unresolved, preventing their insertion into the forwarding DB
(see section: Adjacency source FIB entries).

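A minimal Python sketch of the two data-bases follows. It assumes a hypothetical ``UnicastFib`` class and a caller-supplied ``resolved`` flag, and it uses a linear scan for the longest prefix match purely for readability; it makes no claim about the data structures VPP uses in the data-plane.

```python
import ipaddress


class UnicastFib:
    """Toy model: every route is remembered, only resolved ones forward."""

    def __init__(self):
        self.non_forwarding = {}  # prefix -> (path, resolved): all programmed routes
        self.forwarding = {}      # prefix -> path: only routes that can forward

    def add(self, prefix, path, resolved=True):
        net = ipaddress.ip_network(prefix)
        self.non_forwarding[net] = (path, resolved)
        if resolved:
            self.forwarding[net] = path
        else:
            # An unresolved route must not be visible to the data-plane.
            self.forwarding.pop(net, None)

    def lookup(self, dst):
        """Longest prefix match over the forwarding data-base only."""
        addr = ipaddress.ip_address(dst)
        candidates = [net for net in self.forwarding if addr in net]
        if not candidates:
            return None
        best = max(candidates, key=lambda net: net.prefixlen)
        return self.forwarding[best]


fib = UnicastFib()
fib.add("10.0.0.0/8", "path-A")
fib.add("10.1.0.0/16", "path-B")
fib.add("10.1.1.1/32", "path-C", resolved=False)
```

A lookup for 10.1.1.1 returns ``path-B``: the /32 is held in the non-forwarding data-base but, being unresolved, takes no part in the match.
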
Model
-----

The route data is decomposed into three parts: entry, path-list and paths.

* The *fib_entry_t*, which contains the route's prefix, is the representation of that prefix's entry in the FIB table.
* The *fib_path_t* is a description of where to send the packets destined to the route's prefix. There are several types of path, including:

  * Attached next-hop: the path is described with an interface and a next-hop. The next-hop is in the same sub-net as the router's own address on that interface, hence the peer is considered to be *attached*.

  * Attached: the path is described only by an interface. An
    attached path means that all addresses covered by the route's
    prefix are on the same L2 segment to which that router's
    interface is attached. This means it is possible to ARP for any
    address covered by the route's prefix. If this is not the case
    then another device in that L2 segment needs to run proxy
    ARP. An attached path is really only appropriate for a point-to-point
    (P2P) interface where ARP is not required, e.g. a GRE tunnel. On
    a P2P interface, attached and attached-nexthop paths will
    resolve via a special 'auto-adjacency'. This is an adjacency
    whose next-hop is the all zeros address and describes the only
    peer on the link.

  * Recursive: the path is described only via the next-hop and table-id.

  * De-aggregate: the path is described only via the special all
    zeros address and a table-id. This implies a subsequent lookup
    in the table should be performed.

  * There are other path types, please consult the code.

* The *fib_path_list_t* represents the list of paths from which to choose when forwarding. A path-list is a shared object, i.e. it is the parent to multiple fib_entry_t children. In order to share any object type it is necessary for a child to search for an existing object matching its requirements. For this there must be a database. The key to the path-list database is a combined description of all of the paths it contains [#f2]_. Searching the path-list database is required with each route addition, so it is populated only with path-lists for which sharing will bring convergence benefits (see Section: :ref:`fastconvergence`).

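The find-or-create pattern behind path-list sharing can be sketched as below. The ``PathListDb`` class is hypothetical, paths are reduced to (next-hop, interface) tuples, and the key is treated as independent of path order; that last point is an assumption of this sketch, not a statement about the real key.

```python
class PathListDb:
    """Toy path-list database keyed on the set of paths a list contains."""

    def __init__(self):
        self._by_key = {}

    def find_or_create(self, paths):
        # The key is a combined description of all the paths.
        key = frozenset(paths)
        path_list = self._by_key.get(key)
        if path_list is None:
            path_list = {"paths": key, "children": []}
            self._by_key[key] = path_list
        return path_list

    def __len__(self):
        return len(self._by_key)


db = PathListDb()
paths = [("10.0.0.1", "eth0"), ("10.0.1.1", "eth1")]

# Two routes with the same paths end up as children of one shared object.
pl_a = db.find_or_create(paths)
pl_a["children"].append("1.1.1.0/24")
pl_b = db.find_or_create(reversed(paths))
pl_b["children"].append("2.2.2.0/24")
```

Because both routes resolve to the same object, a change to the shared path-list needs to be computed once and can then be propagated to every child.
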
.. figure:: /_images/fib20fig2.png

Figure 2: Route data model class diagram

Figure 2 shows an example of a route with two attached-next-hop paths. Each of these
paths will *resolve* by finding the adjacency that matches the path's attributes, which
are the same as the key for the adjacency database [#f3]_. The *forwarding information (FI)*
is the set of adjacencies that are available for load-balancing the traffic in the
data-plane. A path *contributes* an adjacency to the route's forwarding information; the
path-list contributes the full forwarding information for IP packets.

.. figure:: /_images/fib20fig3.png

Figure 3: Route object diagram

Figure 3 shows the object instances, and their relationships, created in order to resolve
the routes it depicts. The graph nature of these relationships is evident; children
are displayed at the top of the diagram, their parents below them. Forward walks are
thus from top to bottom, back walks bottom to top. The diagram shows the objects
that are shared: the path-list and the adjacency. Sharing objects is critical to fast
convergence (see section :ref:`fastconvergence`).

FIB sources
"""""""""""
There are various entities in the system that can add routes to the FIB tables.
Each of these entities is termed a *source*. When the same prefix is added by different
sources the FIB must arbitrate between them to determine which source will contribute
the forwarding information. Since each source determines the forwarding information
using different best path and loop prevention algorithms, it is not correct for the
forwarding information of multiple sources to be combined. Instead the FIB must choose
to use the forwarding information from only one source. This choice is based on a static
priority assignment [#f4]_. The FIB must maintain the information each source has added
so it can be restored should that source become the best source. VPP has two
*control-plane* sources: the API and the CLI. The API has the higher priority.
Each source's data is represented by a *fib_entry_src_t* object, of which a
*fib_entry_t* maintains a sorted vector.

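Source arbitration can be sketched as a per-entry list kept sorted by priority. The ``PRIORITY`` values and the ``FibEntry`` class below are hypothetical; only the relative order of the sources named in this document is taken from the text, and the authoritative list is the one referenced in [#f4]_.

```python
import bisect

# Hypothetical priority values for illustration; lower is better.
PRIORITY = {"interface": 0, "api": 1, "cli": 2, "adjacency": 3, "rr": 4}


class FibEntry:
    """Toy entry that keeps what every source said, sorted best first."""

    def __init__(self, prefix):
        self.prefix = prefix
        self.sources = []  # sorted list of (priority, source, forwarding)

    def add_source(self, source, forwarding):
        self.remove_source(source)  # a source holds at most one slot
        bisect.insort(self.sources, (PRIORITY[source], source, forwarding))

    def remove_source(self, source):
        self.sources = [s for s in self.sources if s[1] != source]

    def best(self):
        """Only the best source contributes forwarding; nothing is merged."""
        return self.sources[0][1:] if self.sources else None


entry = FibEntry("192.168.1.2/32")
entry.add_source("adjacency", "via 192.168.1.2 GigabitEthernet0/8/0")
entry.add_source("api", "via 10.10.10.10 GigabitEthernet1/8/0")
best_with_api = entry.best()

# Withdrawing the better source restores the information kept for the other.
entry.remove_source("api")
best_after_withdraw = entry.best()
```

Keeping the losing source's data in the list is what allows the entry to fall back to it without the source having to re-add the route.
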
The following configuration:

.. code-block:: console

   $ set interface ip address GigabitEthernet0/8/0 192.168.1.1/24

results in the addition of two FIB entries: 192.168.1.0/24, which is connected and
attached, and 192.168.1.1/32, which is connected and local (a.k.a.
receive or for-us). A prefix is *connected* when it is applied to a router's interface.
Both prefixes are *interface* sourced. The interface source has a high priority, so
the accidental or nefarious addition of identical prefixes does not prevent the
router from correctly forwarding. Packets matching a connected prefix will
generate an ARP request for the packet's destination address; this process is known
as a *glean*.

An *attached* prefix also results in a glean, but the router does not have its own
address in that sub-net. The following configuration will result in an attached
route, which resolves via an attached path:

.. code-block:: console

   $ ip route add table X 10.10.10.0/24 via gre0

As mentioned before, these are only appropriate for point-to-point
links.

If table X is not the table to which gre0 is bound,
then this is the case of an attached export (see the section :ref:`attachedexport`).

Adjacency source FIB entries
""""""""""""""""""""""""""""

Whenever an ARP entry is created it will source a *fib_entry_t*. In this case the
route is of the form:

.. code-block:: console

   $ ip route add table X 10.0.0.1/32 via 10.0.0.1 GigabitEthernet0/8/0

This is a host prefix with a path whose next-hop address is the same host. This route
highlights the distinction between the route's prefix - a description of the traffic
to match - and the path - a description of where to send the matched traffic.
Table X is the same table to which the interface is bound. FIB entries that are
sourced by adjacencies are termed *adj-fibs*. The priority of the adjacency source
is lower than the API source, so the following configuration:

.. code-block:: console

   $ set interface ip address GigabitEthernet0/8/0 192.168.1.1/24
   $ ip arp 192.168.1.2 GigabitEthernet0/8/0 dead.dead.dead
   $ ip route add 192.168.1.2 via 10.10.10.10 GigabitEthernet1/8/0

will forward traffic for 192.168.1.2 via GigabitEthernet1/8/0. That is, the route added by the control
plane is favoured over the adjacency discovered by ARP. The control plane, with its
associated authentication, is considered the authoritative source. To counter the
nefarious addition of adj-fibs, through the nefarious injection of adjacencies, the
FIB is also required to ensure that only adj-fibs whose less specific covering prefix
is attached are installed in forwarding. This requires the use of *cover tracking*,
where a route maintains a dependency relationship with the route that is its less
specific cover. When this cover changes (i.e. there is a new covering route) or the
forwarding information of the cover is updated, then the covered route is notified.
Adj-fibs that fail this cover check are not installed in the fib_table_t's forwarding
table; they are only present in the non-forwarding table.

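The cover check can be sketched as follows. The table layout (a dictionary of prefix to an ``attached`` flag) and the helper names ``find_cover`` and ``adj_fib_installable`` are hypothetical, and the linear search stands in for whatever lookup the real table uses.

```python
import ipaddress


def find_cover(table, prefix):
    """Return the longest prefix in the table that is strictly less
    specific than `prefix` and contains it, or None."""
    net = ipaddress.ip_network(prefix)
    covers = [c for c in table
              if c.prefixlen < net.prefixlen and net.subnet_of(c)]
    return max(covers, key=lambda c: c.prefixlen, default=None)


def adj_fib_installable(table, prefix):
    """An adj-fib may forward only if its cover is an attached prefix."""
    cover = find_cover(table, prefix)
    return cover is not None and table[cover]["attached"]


table = {
    ipaddress.ip_network("0.0.0.0/0"): {"attached": False},
    ipaddress.ip_network("192.168.1.0/24"): {"attached": True},
}

ok = adj_fib_installable(table, "192.168.1.2/32")   # cover is the attached /24
bad = adj_fib_installable(table, "10.0.0.1/32")     # cover is the default route
```

In this toy table the adj-fib for 192.168.1.2 passes the check, while one injected for 10.0.0.1 would stay in the non-forwarding table because its cover is not attached.
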
Overlapping sub-nets are not supported, so no adj-fib has multiple paths. The control
plane is expected to remove a prefix configured for an interface before the interface
changes VRF.

Recursive Routes
""""""""""""""""

Figure 4 shows the data structures used to describe a recursive route. The
representation is almost identical to attached next-hop paths, the difference
being that the *fib_path_t* has a parent that is another *fib_entry_t*, termed the
*via-entry*.

.. figure:: /_images/fib20fig4.png

Figure 4: Recursive route class diagram.

In order to forward traffic to 64.10.128.0/20 the FIB must first determine how to forward
traffic to 1.1.1.1/32. This is recursive resolution. Recursive resolution, which is
essentially a cache of the data-plane result, emulates a longest prefix match for the
*via-address* 1.1.1.1 in the *via-table* table 0 [#f5]_.

Recursive resolution (RR) will source a host-prefix entry in the via-table for the
via-address. The RR source is a low priority source. In the unlikely [#f6]_ event that the
RR source is the best source, then it must derive forwarding information from its
covering prefix.

There are two cases to consider:

* The cover is connected [#f7]_. The via-address is then an attached host and the RR source can resolve directly via the adjacency with the key {via-address, interface-of-connected-cover}.
* The cover is not connected [#f8]_. The RR source can directly inherit the forwarding information from its cover.

This dependency on the covering prefix means the RR source will track its cover. The
covering prefix will *change* when:

* A more specific prefix is inserted. For this reason whenever an entry is inserted into a FIB table its cover must be found so that its covered dependents can be informed.
* The existing cover is removed. The covered prefixes must form a new relationship with the next less specific prefix.

The cover will be *updated* when the route for the covering prefix is modified. The
cover tracking mechanism will provide the RR sourced entry with a notification in the
event of a change or update of the cover, and the source can take the necessary action.

The RR sourced FIB entry becomes the parent of the *fib_path_t* and will contribute its
forwarding information to that path, so that the child's FIB entry can construct its own
forwarding information.

Figure 5 shows the object instances created to represent the recursive route and
its resolving route.

.. figure:: /_images/fib20fig5.png

Figure 5: Recursive Routes object diagram

If the source adding recursive routes does not itself perform recursive resolution [#f9]_
then it is possible that the source may inadvertently programme a recursion loop.

An example of a recursion loop is the following configuration:

.. code-block:: console

   $ ip route add 5.5.5.5/32 via 6.6.6.6
   $ ip route add 6.6.6.6/32 via 7.7.7.7
   $ ip route add 7.7.7.7/32 via 5.5.5.5

This shows a loop over three levels, but any number is possible. FIB will detect
recursion loops by forward walking the graph when a *fib_entry_t* forms a child-parent
relationship with a *fib_path_list_t*. The walk checks to see if the same object instances
are encountered. When a recursion loop is formed the control plane [#f10]_ graph becomes
cyclic, thus allowing the child-parent dependencies to form. This is necessary so that
when the loop breaks, the affected children can be updated.

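The forward walk can be illustrated with a much simplified model in which each route has a single recursive path and the graph is reduced to a mapping from a prefix to the host prefix of its via-address. The ``find_loop`` helper and the ``via`` mapping are hypothetical names for this sketch.

```python
def find_loop(via, start):
    """Walk forward from `start`, following each route to its via-entry.
    Return the list of entries forming a loop, or None if the walk ends."""
    seen = []
    node = start
    while node in via:
        if node in seen:
            # The same object was met twice: everything from its first
            # appearance onwards is part of the loop.
            return seen[seen.index(node):]
        seen.append(node)
        node = via[node]
    return None


# The three routes from the configuration above.
via = {
    "5.5.5.5/32": "6.6.6.6/32",
    "6.6.6.6/32": "7.7.7.7/32",
    "7.7.7.7/32": "5.5.5.5/32",
}
loop = find_loop(via, "5.5.5.5/32")

# Removing one route breaks the loop; the dependencies that were allowed
# to form are what let the remaining entries be re-evaluated.
del via["7.7.7.7/32"]
after_break = find_loop(via, "5.5.5.5/32")
```

With all three routes present the walk reports the three entries as a loop; once the third route is removed the same walk terminates without finding one.
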
Output labels
"""""""""""""

A route may have associated output MPLS labels [#f11]_. These are labels that are expected
to be imposed on a packet as it is forwarded. It is important to note that an MPLS
label is per-route and per-path; therefore, even though routes share paths, they do not
necessarily have the same label for that path [#f12]_. A label is therefore uniquely associated
with a *fib_entry_t* and with one of the *fib_path_t* to which it forwards.
MPLS labels are modelled via the generic concept of a *path-extension*. A *fib_entry_t*
therefore has a vector of zero to many *fib_path_ext_t* objects to represent the labels
with which it is configured.


Delegates
^^^^^^^^^

A common software development pattern, a delegate is a means to
extend the functionality of one object through composition with
another; these other objects are called delegates. Both
**fib_entry_t** and **ip_adjacency_t** support extension via delegates.

The FIB uses delegates to add functionality when those functions are
required by only a few object instances rather than all of them, to
save on memory. For example, building/contributing a load-balance
object used to forward non-EOS MPLS traffic is only required for a
fib_entry_t that corresponds to a BGP peer when that peer is
advertising labeled routes - there are only a few of
these. See **fib_entry_delegate.h** for a full list of delegate types.


Tracking
^^^^^^^^

A prime service FIB provides for other sub-systems is the ability to
'track' the forwarding for a given next-hop. For example, a tunnel
will want to know how to forward to its destination address. It can
therefore request of the FIB to track this host-prefix and inform it
when the forwarding for that prefix changes.

FIB tracking sources a host-prefix entry in the FIB using the 'recursive
resolution (RR)' source, in exactly the same way that a recursive path
does. If the entry did not previously exist, then the RR source will
inherit (and track) forwarding from its covering prefix, therefore all
packets that match this entry are forwarded in the same way as if the
entry did not exist. The tunnel that is tracking this FIB entry will
become a child dependent. The benefit of creating the entry is that
it now exists in the FIB node graph, so all actions that happen on its
parents are propagated to the host-prefix entry and consequently to
the tunnel.

FIB provides a wrapper to the sourcing of the host-prefix using a
delegate attached to the entry, and the entry is RR sourced only once.
The benefit of this approach is that the entry is not RR sourced again
each time a new client tracks it. When an entry is sourced all its
children are updated. Thus, if every new client sourced the entry itself,
n clients tracking an entry would cost O(n^2) child updates. With the
tracker as indirection, the entry is sourced only once.


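The cost argument can be made concrete with a toy model. The ``Entry`` class, its counters and the two ``track_*`` helpers are hypothetical and only count how often the entry is sourced and how many child updates that triggers; they do not reflect the real delegate API.

```python
class Entry:
    """Toy FIB entry that counts sourcing operations and child updates."""

    def __init__(self):
        self.children = []
        self.times_sourced = 0
        self.child_updates = 0
        self.tracker = None  # delegate, allocated on first use only

    def rr_source(self):
        # Sourcing an entry updates every existing child.
        self.times_sourced += 1
        self.child_updates += len(self.children)


def track_naive(entry, client):
    """Every client RR sources the entry itself."""
    entry.rr_source()
    entry.children.append(client)


def track_with_delegate(entry, client):
    """The first client creates the delegate, which sources the entry once."""
    if entry.tracker is None:
        entry.tracker = {"clients": []}
        entry.rr_source()
    entry.tracker["clients"].append(client)


naive, tracked = Entry(), Entry()
for client in ["tunnel-0", "tunnel-1", "tunnel-2", "tunnel-3"]:
    track_naive(naive, client)
    track_with_delegate(tracked, client)
```

With four clients the naive entry is sourced four times and triggers 0 + 1 + 2 + 3 = 6 child updates, which grows quadratically with the number of clients, whereas the entry with the delegate is sourced once.
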
.. rubric:: Footnotes:

.. [#f1] Or terminate in an object that transitions the packet out of
   the FIB domain, e.g. a drop.
.. [#f2] Optimisations
.. [#f3] Note it is valid for either interface to be bound to a different table than table 1
.. [#f4] The engaged reader can see the full priority list in vnet/vnet/fib/fib_entry.h
.. [#f5] Note it is only possible to add routes via an address (i.e. a /32 or /128) not via a shorter mask prefix. There is no use case for the latter
.. [#f6] For iBGP the via-address is the loopback address of the peer PE, for eBGP it is the adj-fib for the CE
.. [#f7] As is the case for eBGP
.. [#f8] As is the case for iBGP
.. [#f9] If that source is relying on FIB to perform recursive resolution, then there is no reason it should do so itself.
.. [#f10] The derived data-plane graph MUST never be cyclic
.. [#f11] Advertised, e.g. by LDP, SR or BGP
.. [#f12] The only case where the labels will be the same is BGP VPNv4 label allocation per-VRF