.. _routes:

Routes
^^^^^^

The control plane will install a route in a table for a prefix via a list of paths.
The prime function of the FIB is to *resolve* that route. To resolve a route is to
construct an object graph that fully describes all elements of the route. In Figure 3
the route is resolved, as the graph is complete from *fib_entry_t* to *ip_adjacency_t*.

In some routing models a VRF will consist of a set of tables for IPv4 and IPv6, and
unicast and multicast. In VPP there is no such grouping. Each table is distinct from
every other. A table is identified by its numerical ID. The ID range is separate for
each address family.

A table comprises two route data-bases: forwarding and non-forwarding. The
forwarding data-base contains routes against which a packet will perform a longest
prefix match (LPM) in the data-plane. The non-forwarding DB contains all the routes
with which VPP has been programmed; some of these routes may be unresolved for reasons
that prevent their insertion into the forwarding DB
(see section: Adjacency source FIB entries).
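
The split between the two data-bases can be pictured with a small sketch. This is
an illustrative Python model, not VPP's C implementation: the class ``FibTable``, its
methods and the ``resolved`` flag are hypothetical names chosen for the example.

.. code-block:: python

    import ipaddress

    class FibTable:
        """Toy table: every programmed route is kept, but only resolved
        routes are visible to the longest prefix match."""

        def __init__(self):
            self.non_forwarding = {}  # all programmed routes
            self.forwarding = {}      # resolved routes only

        def add_route(self, prefix, via, resolved=True):
            net = ipaddress.ip_network(prefix)
            self.non_forwarding[net] = via
            if resolved:
                self.forwarding[net] = via

        def lookup(self, address):
            """Longest prefix match against the forwarding data-base only."""
            addr = ipaddress.ip_address(address)
            matches = [n for n in self.forwarding if addr in n]
            if not matches:
                return None
            best = max(matches, key=lambda n: n.prefixlen)
            return self.forwarding[best]

    table = FibTable()
    table.add_route("10.0.0.0/8", "via-A")
    table.add_route("10.1.0.0/16", "via-B")
    table.add_route("10.1.1.0/24", "via-C", resolved=False)
    # The /24 is unresolved, so the lookup falls back to the /16.
    print(table.lookup("10.1.1.1"))

The linear scan is only for readability; a real data-plane would use a purpose-built
LPM structure.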

The route data is decomposed into three parts: entry, path-list and paths.

* The *fib_entry_t*, which contains the route's prefix, is a representation of that prefix's entry in the FIB table.
* The *fib_path_t* is a description of where to send the packets destined to the route's prefix. There are several types of path.

  * Attached next-hop: the path is described with an interface and a next-hop. The next-hop is in the same sub-net as the router's own address on that interface, hence the peer is considered to be *attached*.

  * Attached: the path is described only by an interface. All addresses covered by the prefix are on the same L2 segment to which the router's interface is attached. This means it is possible to ARP for any address covered by the prefix, which is usually not the case (hence the proxy ARP debacle in IOS). An attached path is only appropriate for a point-to-point (P2P) interface where ARP is not required, e.g. a GRE tunnel.

  * Recursive: the path is described only via the next-hop and table-id.

  * De-aggregate: the path is described only via the special all zeros address and a table-id. This implies a subsequent lookup in the table should be performed.

* The *fib_path_list_t* represents the list of paths from which to choose one when forwarding. The path-list is a shared object, i.e. it is the parent to multiple *fib_entry_t* children. In order to share any object type it is necessary for a child to search for an existing object matching its requirements. For this there must be a data-base. The key to the path-list data-base is a combined description of all of the paths it contains [#f2]_. Searching the path-list data-base is required with each route addition, so it is populated only with path-lists for which sharing will bring convergence benefits (see Section: :ref:`fastconvergence`).
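
The find-or-create pattern behind path-list sharing can be sketched as follows. This
is a minimal Python illustration, not VPP code: ``Path``, ``PathListDb`` and
``find_or_create`` are hypothetical names, and the sketch assumes the key is
independent of the order in which the paths are given.

.. code-block:: python

    from typing import NamedTuple, Optional

    class Path(NamedTuple):
        next_hop: Optional[str]
        interface: Optional[str]

    class PathListDb:
        """Data-base of shared path-lists, keyed on the paths they contain."""

        def __init__(self):
            self._db = {}

        def find_or_create(self, paths):
            key = frozenset(paths)  # combined description of all the paths
            path_list = self._db.get(key)
            if path_list is None:
                path_list = {"paths": key, "children": []}
                self._db[key] = path_list
            return path_list

    db = PathListDb()
    paths = [Path("10.0.0.1", "eth0"), Path("10.0.1.1", "eth1")]
    pl_a = db.find_or_create(paths)
    pl_b = db.find_or_create(list(reversed(paths)))
    # Two routes with the same paths become children of one shared object.
    pl_a["children"].append("1.1.1.0/24")
    pl_b["children"].append("2.2.2.0/24")
    print(pl_a is pl_b)

Because both routes hang off the same parent, a change to one of the paths needs to
be processed once rather than once per route.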

.. figure:: /_images/fib20fig2.png

Figure 2: Route data model - class diagram

Figure 2 shows an example of a route with two attached-next-hop paths. Each of these
paths will *resolve* by finding the adjacency that matches the path's attributes, which
are the same as the key for the adjacency data-base [#f3]_. The *forwarding information (FI)*
is the set of adjacencies that are available for load-balancing the traffic in the
data-plane. A path *contributes* an adjacency to the route's forwarding information; the
path-list contributes the full forwarding information for IP packets.

.. figure:: /_images/fib20fig3.png

Figure 3: Route object diagram

Figure 3 shows the object instances and their relationships created in order to resolve
the routes also shown. The graph nature of these relationships is evident; children
are displayed at the top of the diagram, their parents below them. Forward walks are
thus from top to bottom, back walks bottom to top. The diagram shows the objects
that are shared: the path-list and adjacency. Sharing objects is critical to fast
convergence (see section :ref:`fastconvergence`).

FIB sources
"""""""""""
There are various entities in the system that can add routes to the FIB tables.
Each of these entities is termed a *source*. When the same prefix is added by different
sources the FIB must arbitrate between them to determine which source will contribute
the forwarding information. Since each source determines the forwarding information
using different best path and loop prevention algorithms, it is not correct for the
forwarding information of multiple sources to be combined. Instead the FIB must choose
to use the forwarding information from only one source. This choice is based on a static
priority assignment [#f4]_. The FIB must maintain the information each source has added
so it can be restored should that source become the best source. VPP has two
*control-plane* sources, the API and the CLI; the API has the higher priority.
Each source's data is represented by a *fib_entry_src_t* object, of which a
*fib_entry_t* maintains a sorted vector. A prefix is *connected* when it is
applied to a router's interface.
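
The arbitration between sources can be sketched as a sorted list per entry. This is
a simplified Python illustration: the numeric values in ``PRIORITY`` are invented for
the example (only the relative order of the API and the CLI is taken from the text),
and ``FibEntry`` with its methods is a hypothetical stand-in for the real objects.

.. code-block:: python

    import bisect

    # Hypothetical priority values; a lower value wins.
    PRIORITY = {"api": 1, "cli": 2}

    class FibEntry:
        def __init__(self, prefix):
            self.prefix = prefix
            self.sources = []  # sorted list of (priority, source, fwd_info)

        def remove_source(self, source):
            self.sources = [s for s in self.sources if s[1] != source]

        def add_source(self, source, fwd_info):
            self.remove_source(source)  # a source holds at most one slot
            bisect.insort(self.sources, (PRIORITY[source], source, fwd_info))

        def best(self):
            """Only the best source contributes forwarding information."""
            return self.sources[0][1:] if self.sources else None

    entry = FibEntry("10.0.0.0/24")
    entry.add_source("cli", "via-A")
    entry.add_source("api", "via-B")
    print(entry.best())          # the API source wins
    entry.remove_source("api")
    print(entry.best())          # the CLI source's data is restored

Keeping the losing source's data in the list is what allows it to take over as soon
as the better source withdraws.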

The following configuration:

.. code-block:: console

 $ set interface address 192.168.1.1/24 GigabitEthernet0/8/0

results in the addition of two FIB entries: 192.168.1.0/24, which is connected and
attached, and 192.168.1.1/32, which is connected and local (a.k.a. receive or for-us).
Both prefixes are *interface* sourced. The interface source has a high priority, so
the accidental or nefarious addition of identical prefixes does not prevent the
router from correctly forwarding. Packets matching a connected prefix will
generate an ARP request for the packet's destination address; this process is known
as a *glean*.

An *attached* prefix also results in a glean, but the router does not have its own
address in that sub-net. The following configuration will result in an attached
route, which resolves via an attached path:

.. code-block:: console

 $ ip route add table X 10.10.10.0/24 via gre0

As mentioned before, these are only appropriate for point-to-point links. An
attached-host prefix is covered by an attached prefix (note that connected
prefixes are also attached). If table X is not the table to which gre0 is bound,
then this is the case of an attached export (see the section :ref:`attachedexport`).

Adjacency source FIB entries
""""""""""""""""""""""""""""

Whenever an ARP entry is created it will source a *fib_entry_t*. In this case the
route is of the form:

.. code-block:: console

 $ ip route add table X 10.0.0.1/32 via 10.0.0.1 GigabitEthernet0/8/0

It is a host prefix with a path whose next-hop address is the same. This route
highlights the distinction between the route's prefix - a description of the traffic
to match - and the path - a description of where to send the matched traffic.
Table X is the same table to which the interface is bound. FIB entries that are
sourced by adjacencies are termed *adj-fibs*. The priority of the adjacency source
is lower than the API source, so the following configuration:

.. code-block:: console

 $ set interface address 192.168.1.1/24 GigabitEthernet0/8/0
 $ ip arp 192.168.1.2 GigabitEthernet0/8/0 dead.dead.dead
 $ ip route add 192.168.1.2 via 10.10.10.10 GigabitEthernet1/8/0

will forward traffic for 192.168.1.2 via GigabitEthernet1/8/0. That is, the route added by the control
plane is favoured over the adjacency discovered by ARP. The control plane, with its
associated authentication, is considered the authoritative source. To counter the
nefarious addition of adj-fibs, through the nefarious injection of adjacencies, the
FIB is also required to ensure that only adj-fibs whose less specific covering prefix
is attached are installed in forwarding. This requires the use of *cover tracking*,
where a route maintains a dependency relationship with the route that is its less
specific cover. When this cover changes (i.e. there is a new covering route) or the
forwarding information of the cover is updated, then the covered route is notified.
Adj-fibs that fail this cover check are not installed in the fib_table_t's forwarding
table; they are only present in the non-forwarding table.
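
The cover check for adj-fibs can be sketched as below. This is an illustrative Python
model only: ``find_cover`` and ``adj_fib_is_installable`` are hypothetical helper
names, and routes are reduced to a prefix with a set of flags.

.. code-block:: python

    import ipaddress

    def find_cover(routes, prefix):
        """Return the longest route that is less specific than prefix
        and contains it, or None if there is no such route."""
        net = ipaddress.ip_network(prefix)
        covers = [r for r in routes
                  if r.prefixlen < net.prefixlen and net.subnet_of(r)]
        return max(covers, key=lambda r: r.prefixlen, default=None)

    def adj_fib_is_installable(routes, adj_fib):
        """An adj-fib enters the forwarding table only if its cover is attached."""
        cover = find_cover(routes, adj_fib)
        return cover is not None and "attached" in routes[cover]

    routes = {
        ipaddress.ip_network("0.0.0.0/0"): set(),
        ipaddress.ip_network("192.168.1.0/24"): {"connected", "attached"},
    }
    # Covered by the connected /24, so it may be installed in forwarding.
    print(adj_fib_is_installable(routes, "192.168.1.2/32"))
    # Covered only by the default route, so it stays in the non-forwarding table.
    print(adj_fib_is_installable(routes, "172.16.0.9/32"))

In the real FIB the result is re-evaluated whenever the cover changes or is updated;
the sketch shows only the check itself.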

Overlapping sub-nets are not supported, so no adj-fib has multiple paths. The control
plane is expected to remove a prefix configured for an interface before the interface
changes VRF.

So while the following configuration is accepted:

.. code-block:: console

 $ set interface address 192.168.1.1/32 GigabitEthernet0/8/0
 $ ip arp 192.168.1.2 GigabitEthernet0/8/0 dead.dead.dead
 $ set interface ip table GigabitEthernet0/8/0 2

it does not result in the desired behaviour, where the adj-fib and connecteds are
moved to table 2.

Recursive Routes
""""""""""""""""

Figure 4 shows the data structures used to describe a recursive route. The
representation is almost identical to attached next-hop paths. The difference
is that the *fib_path_t* has a parent that is another *fib_entry_t*, termed the
*via-entry*.

.. figure:: /_images/fib20fig4.png

Figure 4: Recursive route class diagram.

In order to forward traffic to 64.10.128.0/20 the FIB must first determine how to forward
traffic to 1.1.1.1/32. This is recursive resolution. Recursive resolution, which is
essentially a cache of the data-plane result, emulates a longest prefix match for the
*via-address* 1.1.1.1 in the *via-table* table 0 [#f5]_.

Recursive resolution (RR) will source a host-prefix entry in the via-table for the
via-address. The RR source is a low priority source. In the unlikely [#f6]_ event that the
RR source is the best source, then it must derive forwarding information from its
covering prefix.

There are two cases to consider:

* The cover is connected [#f7]_. The via-address is then an attached host and the RR source can resolve directly via the adjacency with the key {via-address, interface-of-connected-cover}.
* The cover is not connected [#f8]_. The RR source can directly inherit the forwarding information from its cover.
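
The two cases can be sketched as a single function. This is a hedged Python
illustration: ``rr_forwarding`` and the dictionary layout used for the cover are
assumptions made for the example, not VPP data structures.

.. code-block:: python

    def rr_forwarding(via_address, cover):
        """Derive forwarding for an RR-sourced host entry from its cover."""
        if cover["connected"]:
            # Attached host: resolve via the adjacency keyed on
            # {via-address, interface-of-connected-cover}.
            return [(via_address, cover["interface"])]
        # Otherwise inherit the cover's forwarding information unchanged.
        return list(cover["forwarding"])

    connected_cover = {
        "connected": True,
        "interface": "GigabitEthernet0/8/0",
        "forwarding": [],
    }
    remote_cover = {
        "connected": False,
        "interface": None,
        "forwarding": [("10.0.0.2", "GigabitEthernet1/8/0")],
    }
    via_attached_host = rr_forwarding("192.168.1.2", connected_cover)
    via_inherited = rr_forwarding("1.1.1.1", remote_cover)
    print(via_attached_host)
    print(via_inherited)
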

This dependency on the covering prefix means the RR source will track its cover. The
covering prefix will *change* when:

* A more specific prefix is inserted. For this reason whenever an entry is inserted into a FIB table its cover must be found so that its covered dependents can be informed.
* The existing cover is removed. The covered prefixes must form a new relationship with the next less specific.

The cover will be *updated* when the route for the covering prefix is modified. The
cover tracking mechanism will provide the RR sourced entry with a notification in the
event of a change or update of the cover, and the source can take the necessary action.

The RR sourced FIB entry becomes the parent of the *fib_path_t* and will contribute its
forwarding information to that path, so that the child's FIB entry can construct its own
forwarding information.

Figure 5 shows the object instances created to represent the recursive route and
its resolving route also shown.

.. figure:: /_images/fib20fig5.png

Figure 5: Recursive Routes object diagram

If the source adding recursive routes does not itself perform recursive resolution [#f9]_
then it is possible that the source may inadvertently programme a recursion loop.

An example of a recursion loop is the following configuration:

.. code-block:: console

 $ ip route add 5.5.5.5/32 via 6.6.6.6
 $ ip route add 6.6.6.6/32 via 7.7.7.7
 $ ip route add 7.7.7.7/32 via 5.5.5.5

This shows a loop over three levels, but any number is possible. FIB will detect
recursion loops by forward walking the graph when a *fib_entry_t* forms a child-parent
relationship with a *fib_path_list_t*. The walk checks to see if the same object instances
are encountered. When a recursion loop is formed the control plane [#f10]_ graph becomes
cyclic, thus allowing the child-parent dependencies to form. This is necessary so that
when the loop breaks, the affected children can be updated.
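
The forward walk can be sketched in a few lines. This is a simplified Python model
that reduces each route to a single via relationship; the function name
``forms_loop`` and the dictionary representation are hypothetical, chosen only to
mirror the configuration above.

.. code-block:: python

    def forms_loop(via, prefix):
        """Walk forward from prefix along the via relationships and report
        whether the walk meets an entry it has already visited."""
        seen = set()
        current = prefix
        while current in via:
            if current in seen:
                return True
            seen.add(current)
            current = via[current]
        return False

    via = {
        "5.5.5.5/32": "6.6.6.6/32",
        "6.6.6.6/32": "7.7.7.7/32",
    }
    loop_before = forms_loop(via, "5.5.5.5/32")
    via["7.7.7.7/32"] = "5.5.5.5/32"   # the third route closes the loop
    loop_after = forms_loop(via, "5.5.5.5/32")
    print(loop_before, loop_after)

Note that the sketch only detects the loop. As described above, the control plane
graph is still allowed to become cyclic so that the entries can be updated once the
loop is broken.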

Output labels
"""""""""""""

A route may have associated out MPLS labels [#f11]_. These are labels that are expected
to be imposed on a packet as it is forwarded. It is important to note that an MPLS
label is per-route and per-path; therefore, even though routes share paths, they do not
necessarily have the same label for that path [#f12]_. A label is therefore uniquely associated
to a *fib_entry_t* and associated with one of the *fib_path_t* to which it forwards.
MPLS labels are modelled via the generic concept of a *path-extension*. A *fib_entry_t*
therefore has a vector of zero to many *fib_path_ext_t* objects to represent the labels
with which it is configured.
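
The relationship between entries, shared paths and labels can be sketched as follows.
This is an illustrative Python model: the classes ``Path``, ``PathExt`` and ``Entry``
and the label values are invented for the example and do not reflect the layout of
the real C structures.

.. code-block:: python

    from dataclasses import dataclass, field

    @dataclass(frozen=True)
    class Path:
        next_hop: str
        interface: str

    @dataclass
    class PathExt:
        """Per-entry extension tying a label stack to one of the entry's paths."""
        path: Path
        labels: tuple

    @dataclass
    class Entry:
        prefix: str
        path_list: tuple                      # shared between entries
        path_exts: list = field(default_factory=list)

        def labels_for(self, path):
            for ext in self.path_exts:
                if ext.path == path:
                    return ext.labels
            return ()                         # no label configured for this path

    shared_path = Path("10.0.0.1", "GigabitEthernet0/8/0")
    shared_list = (shared_path,)
    e1 = Entry("1.1.1.0/24", shared_list, [PathExt(shared_path, (100,))])
    e2 = Entry("2.2.2.0/24", shared_list, [PathExt(shared_path, (200,))])
    # Same shared path-list, different label per route.
    print(e1.labels_for(shared_path), e2.labels_for(shared_path))

Holding the labels on the entry rather than on the path is what lets the path-list
remain shareable between routes with different labels.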

.. rubric:: Footnotes:

.. [#f2] Optimisations
.. [#f3] Note it is valid for either interface to be bound to a different table than table 1
.. [#f4] The engaged reader can see the full priority list in vnet/vnet/fib/fib_entry.h
.. [#f5] Note it is only possible to add routes via an address (i.e. a /32 or /128), not via a shorter mask prefix. There is no use case for the latter.
.. [#f6] For iBGP the via-address is the loopback address of the peer PE, for eBGP it is the adj-fib for the CE
.. [#f7] As is the case for eBGP
.. [#f8] As is the case for iBGP
.. [#f9] If that source is relying on FIB to perform recursive resolution, then there is no reason it should do so itself.
.. [#f10] The derived data-plane graph MUST never be cyclic
.. [#f11] Advertised, e.g. by LDP, SR or BGP
.. [#f12] The only case where the labels will be the same is BGP VPNv4 label allocation per-VRF