.. _fastconvergence:

Fast Convergence
------------------------------------

This is an excellent description of the topic:

`FIB <https://tools.ietf.org/html/draft-ietf-rtgwg-bgp-pic-12>`_

but if you're interested in my take keep reading...

First some definitions:

- Convergence: When a FIB is forwarding all packets correctly based
  on the network topology (i.e. doing what the routing control plane
  has instructed it to do), then it is said to be 'converged'.
  Not being in a converged state is [hopefully] a transient state,
  when either the topology change (e.g. a link failure) has not been
  observed or processed by the routing control plane, or the FIB
  is still processing routing updates. Convergence is the act of
  getting to the converged state.
- Fast: In the shortest time possible. There are no absolute limits
  placed on how short this must be, although there is one number often
  mentioned. Apparently the human ear can detect loss/delay/jitter in
  VOIP of 50ms, therefore network failures should last no longer than
  this, and some technologies (notably loop-free alternate fast
  reroute) are designed to converge within this time. However, it is
  generally accepted that it is not possible to converge a FIB with
  tens of millions of routes on this time scale; the industry
  'standard' is sub-second.

Converging the FIB quickly is thus a matter of:

- discovering something is down
- updating as few objects as possible
- determining which objects to update as efficiently as possible
- updating each object as quickly as possible

We'll discuss each in turn.
All output came from VPP version 21.01rc0. In what follows I use IPv4
prefixes, addresses and IPv4 host length masks, however, exactly the
same applies to IPv6.


Failure Detection
^^^^^^^^^^^^^^^^^

The two common forms (we'll see others later on) of failure detection
are:

- link down
- BFD

The FIB needs to hook into these notifications to trigger
convergence.

Whenever an interface goes down, VPP issues a callback to all
registered clients. The adjacency code is such a client. The adjacency
is a leaf node in the FIB control-plane graph (containing fib_path_t,
fib_entry_t etc). A back-walk from the adjacency will trigger a
re-resolution of the paths.

FIB is a client of BFD in order to receive BFD notifications. BFD
comes in two flavours: single and multi hop. Single hop is to protect
a specific peer on an interface; such peers are modelled by an
adjacency. Multi hop is to protect a peer on an unspecified interface
(i.e. a remote peer); this peer is represented by a host-prefix
**fib_entry_t**. In both cases FIB will add a delegate to the
**ip_adjacency_t** or **fib_entry_t** that represents the association
to the BFD session. If the BFD session signals up/down then a backwalk
can be triggered from the object to trigger re-resolution and hence
convergence.
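The back-walk pattern can be sketched as a toy model. This is purely
conceptual: the class, method and node names below are invented for
illustration, not VPP's actual C structures (ip_adjacency_t,
fib_path_t, fib_entry_t have their own walk machinery).

.. code-block:: python

   # Toy model of a FIB dependency-graph back-walk (names invented).
   # A failure at a parent (e.g. an adjacency going down) propagates
   # to every child that resolves through it.

   class Node:
       def __init__(self, name):
           self.name = name
           self.children = []   # objects that depend on (resolve via) us
           self.resolved = True

       def add_child(self, child):
           self.children.append(child)

       def back_walk(self, visited=None):
           """Propagate a state change from parent to children."""
           if visited is None:
               visited = set()
           for child in self.children:
               if id(child) in visited:
                   continue
               visited.add(id(child))
               child.re_resolve()
               child.back_walk(visited)

       def re_resolve(self):
           # a real implementation would re-evaluate forwarding here
           self.resolved = False


   adj = Node("adjacency 10.0.0.1 GigEthernet0/0/0")
   path = Node("fib_path via 10.0.0.1")
   entry = Node("fib_entry 1.1.1.1/32")
   adj.add_child(path)
   path.add_child(entry)

   adj.back_walk()   # link down: walk from the adjacency
   print([n.resolved for n in (path, entry)])   # [False, False]

The point is the direction: children register with their parents, so a
failure discovered at the leaf of the resolution chain can find, and
re-resolve, everything that depends on it.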


Few Updates
^^^^^^^^^^^

In order to talk about what 'a few' is we have to leave the realm of
the FIB as an abstract graph-based object DB and move into the
concrete representation of forwarding in a large network. Large
networks are built in layers; it's how you scale them. We'll take
here a hypothetical service provider (SP) network, but the concepts
apply equally to data center leaf-spines. This is a rudimentary
description, but it should serve our purpose.

An SP manages a BGP autonomous system (AS). The SP's goal is both to
attract traffic into its network to serve its customers and to
serve transit traffic passing through it; we'll consider the latter here.
The SP's network is all the devices in that AS. These
devices are split into those at the edge (provider edge (PE) routers),
which peer with routers in other SP networks,
and those in the core (termed provider (P) routers). Both the PE and P
routers run the IGP (usually OSPF or ISIS). Only the reachability of the devices
in the AS is advertised in the IGP - thus the scale (i.e. the number
of routes) in the IGP is 'small' - only the number of
devices that the SP has (typically not more than a few 10k).
PE routers run BGP; they have external BGP sessions to devices in
other ASs and internal BGP sessions to devices in the same AS. BGP is
used to advertise the routes to *all* networks on the internet - at
the time of writing this number is approaching 900k IPv4 routes; hopefully by
the time you are reading this the number of IPv6 routes has caught up ...
If we include the additional routes the SP carries to offer VPN services to its
customers, the number of BGP routes can grow to the tens of millions.

BGP scale thus exceeds IGP scale by two orders of magnitude... pause for
a moment and let that sink in...

A comparison of BGP and an IGP is way way beyond the scope of this
documentation (and frankly beyond me) so we'll note only the
difference in the form of the routes they present to FIB. A routing
protocol will produce routes that specify the prefixes that are
reachable through its peers. A good IGP
is link state based; it forms peerings to other devices over these
links, hence its routes specify links/interfaces. In
FIB nomenclature this means an IGP produces routes that are
attached-nexthop, e.g.:

.. code-block:: console

  ip route add 1.1.1.1/32 via 10.0.0.1 GigEthernet0/0/0

BGP on the other hand forms peerings only to neighbours; it does not
know, nor care, what interface is used to reach the peer. In FIB
nomenclature therefore BGP produces recursive routes, e.g.:

.. code-block:: console

  ip route add 8.0.0.0/16 via 1.1.1.1

where 1.1.1.1 is the BGP peer. It's no accident in this example that
1.1.1.1/32 happens to be the route the IGP advertised... BGP installs
routes for prefixes reachable via other BGP peers, and the IGP installs
the routes to those BGP peers.

This has been a very long-winded way of describing why the scale of
recursive routes is therefore 2 orders of magnitude greater than
non-recursive/attached-nexthop routes.

If we step back for a moment and recall why we've crawled down this
rabbit hole, we're trying to determine what 'a few' updates means.
Does it include all those recursive routes? Probably not ... let's
keep crawling.

We started this chapter with an abstract description of convergence;
let's now make that more real. In the event of a network failure an SP
is interested in moving to an alternate forwarding path as quickly as
possible. If there is no alternate path, and a converged FIB will drop
the packet, then who cares how fast it converges. In other words the
interesting convergence scenarios are the scenarios where the network has
alternate paths.

PIC Core
^^^^^^^^

First let's consider alternate paths in the IGP, e.g.:

.. code-block:: console

  ip route add 1.1.1.1/32 via 10.0.0.2 GigEthernet0/0/0
  ip route add 1.1.1.1/32 via 10.0.1.2 GigEthernet0/0/1

this gives us in the FIB:

.. code-block:: console

  DBGvpp# sh ip fib 1.1.1.1/32
  ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto ] epoch:0 flags:none locks:[adjacency:1, default-route:1, ]
  1.1.1.1/32 fib:0 index:15 locks:2
    API refs:1 src-flags:added,contributing,active,
      path-list:[23] locks:2 flags:shared, uPRF-list:22 len:2 itfs:[1, 2, ]
        path:[27] pl-index:23 ip4 weight=1 pref=0 attached-nexthop: oper-flags:resolved,
          10.0.0.2 GigEthernet0/0/0
          [@0]: ipv4 via 10.0.0.2 GigEthernet0/0/0: mtu:9000 next:3 001111111111dead000000000800
        path:[28] pl-index:23 ip4 weight=1 pref=0 attached-nexthop: oper-flags:resolved,
          10.0.1.2 GigEthernet0/0/1
          [@0]: ipv4 via 10.0.1.2 GigEthernet0/0/1: mtu:9000 next:4 001111111111dead000000010800

    forwarding:   unicast-ip4-chain
      [@0]: dpo-load-balance: [proto:ip4 index:17 buckets:2 uRPF:22 to:[0:0]]
        [0] [@5]: ipv4 via 10.0.0.2 GigEthernet0/0/0: mtu:9000 next:3 001111111111dead000000000800
        [1] [@5]: ipv4 via 10.0.1.2 GigEthernet0/0/1: mtu:9000 next:4 001111111111dead000000010800

There is ECMP across the two paths. Note that the instance/index of the
load-balance present in the forwarding graph is 17.

Let's add a BGP route via this peer:

.. code-block:: console

  ip route add 8.0.0.0/16 via 1.1.1.1

in the FIB we see:

.. code-block:: console

  DBGvpp# sh ip fib 8.0.0.0/16
  ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto ] epoch:0 flags:none locks:[adjacency:1, recursive-resolution:1, default-route:1, ]
  8.0.0.0/16 fib:0 index:18 locks:2
    API refs:1 src-flags:added,contributing,active,
      path-list:[24] locks:2 flags:shared, uPRF-list:21 len:2 itfs:[1, 2, ]
        path:[29] pl-index:24 ip4 weight=1 pref=0 recursive: oper-flags:resolved,
          via 1.1.1.1 in fib:0 via-fib:15 via-dpo:[dpo-load-balance:17]

    forwarding:   unicast-ip4-chain
      [@0]: dpo-load-balance: [proto:ip4 index:20 buckets:1 uRPF:21 to:[0:0]]
        [0] [@12]: dpo-load-balance: [proto:ip4 index:17 buckets:2 uRPF:22 to:[0:0]]
          [0] [@5]: ipv4 via 10.0.0.2 GigEthernet0/0/0: mtu:9000 next:3 001111111111dead000000000800
          [1] [@5]: ipv4 via 10.0.1.2 GigEthernet0/0/1: mtu:9000 next:4 001111111111dead000000010800

the load-balance object used by this route is index 20, but note that
the next load-balance in the chain is index 17, i.e. it is exactly
the same instance that appears in the forwarding chain for the IGP
route. So in the forwarding plane the packet first encounters
load-balance object 20 (which it will use in ip4-lookup) and then
number 17 (in ip4-load-balance).
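That two-stage lookup can be sketched as a toy model. The load-balance
indices below mirror the CLI output above; everything else (the dict,
the function, the adjacency strings) is invented for illustration and
is not how VPP represents these objects internally.

.. code-block:: python

   # Toy model of the two-stage lookup: the recursive route's
   # load-balance (index 20) has one bucket pointing at the IGP
   # route's load-balance (index 17), which picks an adjacency.

   load_balances = {
       17: ["adj via 10.0.0.2 Gig0/0/0", "adj via 10.0.1.2 Gig0/0/1"],
       20: [17],   # one bucket, pointing at LB 17
   }

   def forward(lb_index, flow_hash):
       buckets = load_balances[lb_index]
       choice = buckets[flow_hash % len(buckets)]
       if isinstance(choice, int):        # another load-balance: recurse
           return forward(choice, flow_hash)
       return choice                      # an adjacency: done

   print(forward(20, 6))   # 6 % 1 = 0 -> LB 17; 6 % 2 = 0 -> first adjacency
   print(forward(20, 7))   # 7 % 2 = 1 -> second adjacency

The key structural fact is that LB 17 is shared: it is the same object
whether reached through the IGP route's forwarding chain or through the
recursive route's.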

What's the significance? Let's shut down one of those IGP paths:

.. code-block:: console

  DBGvpp# set in state GigEthernet0/0/0 down

the resulting update to the IGP route is:

.. code-block:: console

  DBGvpp# sh ip fib 1.1.1.1/32
  ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto ] epoch:0 flags:none locks:[adjacency:1, recursive-resolution:1, default-route:1, ]
  1.1.1.1/32 fib:0 index:15 locks:4
    API refs:1 src-flags:added,contributing,active,
      path-list:[23] locks:2 flags:shared, uPRF-list:25 len:2 itfs:[1, 2, ]
        path:[27] pl-index:23 ip4 weight=1 pref=0 attached-nexthop:
          10.0.0.2 GigEthernet0/0/0
          [@0]: arp-ipv4: via 10.0.0.2 GigEthernet0/0/0
        path:[28] pl-index:23 ip4 weight=1 pref=0 attached-nexthop: oper-flags:resolved,
          10.0.1.2 GigEthernet0/0/1
          [@0]: ipv4 via 10.0.1.2 GigEthernet0/0/1: mtu:9000 next:4 001111111111dead000000010800
    recursive-resolution refs:1 src-flags:added, cover:-1

    forwarding:   unicast-ip4-chain
      [@0]: dpo-load-balance: [proto:ip4 index:17 buckets:1 uRPF:25 to:[0:0]]
        [0] [@5]: ipv4 via 10.0.1.2 GigEthernet0/0/1: mtu:9000 next:4 001111111111dead000000010800

notice that the path via 10.0.0.2 is no longer flagged as resolved,
and the forwarding chain does not contain this path as a
choice. However, the key thing to note is that the load-balance
instance is still index 17, i.e. it has been modified, not
exchanged. In the FIB vernacular we say it has been 'in-place
modified', a somewhat linguistically redundant expression, but one that serves
to emphasise that it was changed whilst still being part of the graph;
it was never at any point removed from the graph and re-added, and it was
modified without the worker barrier lock held.

Still don't see the significance? In order to converge around the
failure of the IGP link it was not necessary to update load-balance
object number 20! It was not necessary to update the recursive
route. I.e. convergence is achieved without updating any recursive
routes; it is only necessary to update the affected IGP routes. This is
the definition of 'a few'. We call this 'prefix independent
convergence' (PIC), which should really be called 'recursive prefix
independent convergence', but it isn't...

How was the trick done? As with all problems in computer science, it
was solved by a layer of misdirection, I mean indirection. The
indirection is the load-balance that belongs to the IGP route. By
keeping this object in the forwarding graph and updating it in place,
we get PIC. The alternative design would be to collapse the two layers of
load-balancing into one, which would improve forwarding performance
but would come at the cost of prefix dependent convergence. No doubt
there are situations where the VPP deployment would favour forwarding
performance over convergence; you know the drill, contributions welcome.
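The effect of the in-place modify can be sketched with a toy model
again. This is a conceptual illustration only (the route names,
threshold of five routes, and list representation are all invented):
because every recursive route shares the *same* inner load-balance
object, shrinking that one object converges all of them at once.

.. code-block:: python

   # Toy illustration of PIC core: many recursive routes point at one
   # shared inner load-balance; converging around a link failure means
   # editing only that shared object, never the routes themselves.

   igp_lb = ["via 10.0.0.2 Gig0/0/0", "via 10.0.1.2 Gig0/0/1"]

   # many recursive routes all reference the *same* list object
   recursive_routes = {f"8.{i}.0.0/16": igp_lb for i in range(5)}

   # link Gig0/0/0 goes down: in-place modify the shared load-balance
   igp_lb.remove("via 10.0.0.2 Gig0/0/0")

   # every recursive route is now converged; none of them was touched
   assert all(lb == ["via 10.0.1.2 Gig0/0/1"]
              for lb in recursive_routes.values())
   print("converged with 1 update, not", len(recursive_routes))

Scale the route count from five to tens of millions and the value of
updating one shared object rather than one object per route becomes
the whole story.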

This failure scenario is known as PIC core, since it's one of the IGP's
core links that has failed.

iBGP PIC Edge
^^^^^^^^^^^^^

Next, let's consider alternate paths in BGP, e.g.:

.. code-block:: console

  ip route add 8.0.0.0/16 via 1.1.1.1
  ip route add 8.0.0.0/16 via 1.1.1.2

the 8.0.0.0/16 prefix is reachable via two BGP next-hops (two PEs).

Our FIB now also contains:

.. code-block:: console

  DBGvpp# sh ip fib 8.0.0.0/16
  ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto ] epoch:0 flags:none locks:[adjacency:1, recursive-resolution:2, default-route:1, ]
  8.0.0.0/16 fib:0 index:18 locks:2
    API refs:1 src-flags:added,contributing,active,
      path-list:[15] locks:2 flags:shared, uPRF-list:11 len:2 itfs:[1, 2, ]
        path:[17] pl-index:15 ip4 weight=1 pref=0 recursive: oper-flags:resolved,
          via 1.1.1.1 in fib:0 via-fib:15 via-dpo:[dpo-load-balance:17]
        path:[15] pl-index:15 ip4 weight=1 pref=0 recursive: oper-flags:resolved,
          via 1.1.1.2 in fib:0 via-fib:10 via-dpo:[dpo-load-balance:12]

    forwarding:   unicast-ip4-chain
      [@0]: dpo-load-balance: [proto:ip4 index:20 buckets:2 uRPF:11 to:[0:0]]
        [0] [@12]: dpo-load-balance: [proto:ip4 index:17 buckets:1 uRPF:25 to:[0:0]]
          [0] [@5]: ipv4 via 10.0.0.2 GigEthernet0/0/0: mtu:9000 next:3 001122334455dead000000000800
          [1] [@5]: ipv4 via 10.0.1.2 GigEthernet0/0/1: mtu:9000 next:4 001111111111dead000000010800
        [1] [@12]: dpo-load-balance: [proto:ip4 index:12 buckets:1 uRPF:13 to:[0:0]]
          [0] [@5]: ipv4 via 10.0.1.2 GigEthernet0/0/1: mtu:9000 next:4 001111111111dead000000010800

The first load-balance (LB) in the forwarding graph is index 20 (the astute
reader will note this is the same index as in the previous
section; I am adding paths to the same route, so the load-balance is
in-place modified again). Each choice in LB 20 is another LB
contributed by the IGP route through which the route's paths recurse.

So what's the equivalent in BGP to a link down in the IGP? An IGP link
down means it loses its peering out of that link, so the equivalent in
BGP is the loss of the peering and thus the loss of reachability to
the peer. This is signalled by the IGP withdrawing the route to the
peer. But "Wait wait wait", I hear you say ... "just because the IGP
withdraws 1.1.1.1/32 doesn't mean I can't reach 1.1.1.1, perhaps there
is a less specific route that gives reachability to 1.1.1.1". Indeed
there may be. So a little more on BGP network design. I know it's like
a bad detective novel where the author drip feeds you the plot... When
describing iBGP peerings one 'always' describes the peer using one of
its loopback addresses. Why? A loopback interface
never goes down (unless you admin it down yourself), and some muppet can't
accidentally cut through the loopback cable whilst digging up the
street. And what subnet mask length does a prefix have on a loopback
interface? It's 'always' a /32. Why? Because there's no cable to connect
any other devices. This choice justifies there 'always' being a /32
route for the BGP peer. But what prevents there being a less
specific - nothing.
Now clearly if the BGP peer crashes then the /32 for its loopback is
going to be removed from the IGP, but what will withdraw the less
specific - nothing.

So in order to make use of this trick of relying on the withdrawal of
the /32 for the peer to signal that the peer is down, and thus as the
signal to converge the FIB, we need to force FIB to recurse only via
the /32 and not via a less specific. This is called a 'recursion
constraint'. In this case the constraint is 'recurse via host',
i.e. for IPv4 use a /32.
So we need to update our route additions from before:

.. code-block:: console

  ip route add 8.0.0.0/16 via 1.1.1.1 resolve-via-host
  ip route add 8.0.0.0/16 via 1.1.1.2 resolve-via-host

checking the FIB output is left as an exercise for the reader. I hope
you're doing these configs as you read. There's little change in the
output; you'll see some extra flags on the paths.

Now let's add the less specific, just for fun:

.. code-block:: console

  ip route add 1.1.1.0/28 via 10.0.0.2 GigEthernet0/0/0

nothing changes in the resolution of 8.0.0.0/16.

Now withdraw the route to 1.1.1.2/32:

.. code-block:: console

  ip route del 1.1.1.2/32 via 10.0.0.2 GigEthernet0/0/0

In the FIB we see:

.. code-block:: console

  DBGvpp# sh ip fib 8.0.0.0/32
  ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto ] epoch:0 flags:none locks:[adjacency:1, recursive-resolution:2, default-route:1, ]
  8.0.0.0/16 fib:0 index:18 locks:2
    API refs:1 src-flags:added,contributing,active,
      path-list:[15] locks:2 flags:shared, uPRF-list:13 len:2 itfs:[1, 2, ]
        path:[15] pl-index:15 ip4 weight=1 pref=0 recursive: oper-flags:resolved, cfg-flags:resolve-host,
          via 1.1.1.1 in fib:0 via-fib:15 via-dpo:[dpo-load-balance:17]
        path:[17] pl-index:15 ip4 weight=1 pref=0 recursive: cfg-flags:resolve-host,
          via 1.1.1.2 in fib:0 via-fib:10 via-dpo:[dpo-drop:0]

    forwarding:   unicast-ip4-chain
      [@0]: dpo-load-balance: [proto:ip4 index:20 buckets:1 uRPF:13 to:[0:0]]
        [0] [@12]: dpo-load-balance: [proto:ip4 index:17 buckets:2 uRPF:27 to:[0:0]]
          [0] [@5]: ipv4 via 10.0.0.2 GigEthernet0/0/0: mtu:9000 next:3 001122334455dead000000000800
          [1] [@5]: ipv4 via 10.0.1.2 GigEthernet0/0/1: mtu:9000 next:4 001111111111dead000000010800

the path via 1.1.1.2 is unresolved, because the recursion constraint
is preventing the path resolving via 1.1.1.0/28. The LB index 20
has been updated to remove the unresolved path.

Job done? Not quite! Why not?

Let's re-examine the goals of this chapter. We wanted to update 'a
few' objects, which we have defined as not all the millions of
recursive routes. Did we do that here? We sure did, when we
modified LB index 20. So WTF?? Where's the indirection object that can
be modified so that the LBs for the recursive routes are not
modified - it's not there.... WTF?
405
406OK so the great detective has assembled all the suspects in the
407drawing room and only now does he drop the bomb; the FIB knows the
408scale, we talked above about what the scale **can** be, worst case
409scenario, but that's not necessarily what it is in this hypothetical
410(your) deployment. It knows how many recursive routes there are that
411depend on a /32, it can thus make its own determination of the
412definition of 'a few'. In other words, if there are only 'a few'
413recursive prefixes that depend on a /32 then it will update them
414synchronously (and we'll discuss what synchronously means a bit more later).
415
416So what does FIB consider to be 'a few'. Let's add more routes and
417find out.

.. code-block:: console

  DBGvpp# ip route add 8.1.0.0/16 via 1.1.1.2 resolve-via-host via 1.1.1.1 resolve-via-host
  ...
  DBGvpp# ip route add 8.63.0.0/16 via 1.1.1.2 resolve-via-host via 1.1.1.1 resolve-via-host

and we see:

.. code-block:: console

  DBGvpp# sh ip fib 8.8.0.0
  ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto ] epoch:0 flags:none locks:[adjacency:1, recursive-resolution:4, default-route:1, ]
  8.8.0.0/16 fib:0 index:77 locks:2
    API refs:1 src-flags:added,contributing,active,
      path-list:[15] locks:128 flags:shared,popular, uPRF-list:28 len:2 itfs:[1, 2, ]
        path:[17] pl-index:15 ip4 weight=1 pref=0 recursive: oper-flags:resolved, cfg-flags:resolve-host,
          via 1.1.1.1 in fib:0 via-fib:15 via-dpo:[dpo-load-balance:17]
        path:[15] pl-index:15 ip4 weight=1 pref=0 recursive: oper-flags:resolved, cfg-flags:resolve-host,
          via 1.1.1.2 in fib:0 via-fib:10 via-dpo:[dpo-load-balance:12]

    forwarding:   unicast-ip4-chain
      [@0]: dpo-load-balance: [proto:ip4 index:79 buckets:2 uRPF:28 flags:[uses-map] to:[0:0]]
        load-balance-map: index:0 buckets:2
          index:    0    1
            map:    0    1
        [0] [@12]: dpo-load-balance: [proto:ip4 index:17 buckets:2 uRPF:27 to:[0:0]]
          [0] [@5]: ipv4 via 10.0.0.2 GigEthernet0/0/0: mtu:9000 next:3 001122334455dead000000000800
          [1] [@5]: ipv4 via 10.0.1.2 GigEthernet0/0/1: mtu:9000 next:4 001111111111dead000000010800
        [1] [@12]: dpo-load-balance: [proto:ip4 index:12 buckets:1 uRPF:18 to:[0:0]]
          [0] [@3]: arp-ipv4: via 10.0.1.2 GigEthernet0/0/0


Two elements to note here: the path-list has the 'popular' flag and
there is a load-balance map in the forwarding path.

'popular' in this case means that the path-list has passed the limit
of 'a few' in the number of children it has.

Here are the children:

.. code-block:: console

  DBGvpp# sh fib path-list 15
  path-list:[15] locks:128 flags:shared,popular, uPRF-list:28 len:2 itfs:[1, 2, ]
    path:[17] pl-index:15 ip4 weight=1 pref=0 recursive: oper-flags:resolved, cfg-flags:resolve-host,
      via 1.1.1.1 in fib:0 via-fib:15 via-dpo:[dpo-load-balance:17]
    path:[15] pl-index:15 ip4 weight=1 pref=0 recursive: oper-flags:resolved, cfg-flags:resolve-host,
      via 1.1.1.2 in fib:0 via-fib:10 via-dpo:[dpo-load-balance:12]
    children:{entry:18}{entry:21}{entry:22}{entry:23}{entry:25}{entry:26}{entry:27}{entry:28}{entry:29}{entry:30}{entry:31}{entry:32}{entry:33}{entry:34}{entry:35}{entry:36}{entry:37}{entry:38}{entry:39}{entry:40}{entry:41}{entry:42}{entry:43}{entry:44}{entry:45}{entry:46}{entry:47}{entry:48}{entry:49}{entry:50}{entry:51}{entry:52}{entry:53}{entry:54}{entry:55}{entry:56}{entry:57}{entry:58}{entry:59}{entry:60}{entry:61}{entry:62}{entry:63}{entry:64}{entry:65}{entry:66}{entry:67}{entry:68}{entry:69}{entry:70}{entry:71}{entry:72}{entry:73}{entry:74}{entry:75}{entry:76}{entry:77}{entry:78}{entry:79}{entry:80}{entry:81}{entry:82}{entry:83}{entry:84}

64 children makes it popular. The number is fixed (there is no API to
change it). Its choice is an attempt to balance the forwarding
performance cost of the indirection against the convergence
gain.

Popular path-lists contribute the load-balance map; this is the
missing indirection object. Its indirection happens when choosing the
bucket in the LB. The packet's flow-hash is taken 'mod number of
buckets' to give the 'candidate bucket', then the map takes this
'index' and converts it into the final bucket. You can see in the example above
that no change occurs, i.e. if the flow-hash mod n chooses bucket 1
then it gets bucket 1.
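In pseudo-code terms the map adds exactly one table lookup to the
bucket selection. The sketch below is illustrative only (the function
and variable names are invented, not VPP's): the identity map matches
the CLI output above, and the 'repaired' map is the transient state
shown later in this chapter, where both candidate buckets are steered
to the surviving bucket 0.

.. code-block:: python

   # Sketch of load-balance map indirection. With no failure the map
   # is the identity; after a path goes down the map is rewritten so
   # candidate buckets of the dead path land on a live one.

   def select_bucket(flow_hash, n_buckets, lb_map):
       candidate = flow_hash % n_buckets   # the 'candidate bucket'
       return lb_map[candidate]            # the map has the final say

   identity_map = [0, 1]   # as in the output above: no change
   repaired_map = [0, 0]   # bucket 1's path down: both map to bucket 0

   print(select_bucket(7, 2, identity_map))   # 7 % 2 = 1 -> bucket 1
   print(select_bucket(7, 2, repaired_map))   # 7 % 2 = 1 -> bucket 0

Rewriting the small map array is all that's needed to steer traffic
away from a failed path, which is what makes it a useful indirection
point.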

Why is this useful? The path-list is shared (you can convince
yourself of this if you look at each of the 8.x.0.0/16 routes we
added) and all of these routes use the same load-balance map. Therefore, to
converge all the recursive routes, we need only change the map and
we're good; we again get PIC.

OK, who's still awake... if you're thinking there's more to this story,
you're right. Keep reading.

This failure scenario is called iBGP PIC edge. It's 'edge' because it
refers to the loss of an edge device, and iBGP because the device was
an iBGP peer (we learn iBGP peers in the IGP). There is a similar eBGP
PIC edge scenario, but this is left as an exercise for the reader (hint:
there are other recursion constraints - see the RFC).

Which Objects
^^^^^^^^^^^^^

The next topic on our list of how to converge quickly was to
efficiently find the objects that need to be updated when a convergence
event happens. If you haven't realised by now that the FIB is an
object graph, then can I politely suggest you go back and start from
the beginning ...

Finding the objects affected by a change is simply a matter of walking
from the parent (the object affected) to its children. These
dependencies are kept precisely for this reason.

So is fast convergence just a matter of walking the graph? Yes and
no. The question to ask yourself is this: "in the case of iBGP PIC edge,
when the /32 is withdrawn, what is the list of objects that need to be
updated, and in particular in what order should they be updated to
obtain the best convergence time?" Think breadth v. depth first.

... ponder for a while ...

For iBGP PIC edge we said it's the path-list that provides the
indirection through the load-balance map. Hence once all path-lists
are updated we are converged; thereafter, at our leisure, we can
update the child recursive prefixes. Is that breadth or depth first?

It's breadth first.

Breadth first walks are achieved by spawning an async walk of the
branch of the graph that we don't want to traverse. Withdrawing the /32
triggers a synchronous walk of the children of the /32 route; we want
a synchronous walk because we want to converge ASAP. This synchronous
walk will encounter path-lists in the /32 route's child dependent list.
These path-lists (and their LB maps) will be updated. If a path-list is
popular, then it will spawn an async walk of the path-list's child
dependent routes; if not, it will walk those routes. So the walk
effectively proceeds breadth first across the path-lists, then returns
to the start to do the affected routes.
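The scheduling decision above can be sketched roughly as follows. This
is a much-simplified toy (the names, the queue, and the data layout
are invented; VPP's walk machinery is considerably more involved), but
it shows the essential split: popular path-lists defer their many
children to an async queue so the synchronous walk, and thus
convergence, finishes quickly.

.. code-block:: python

   # Sketch of the sync/async walk split (illustrative only).
   from collections import deque

   POPULAR_THRESHOLD = 64
   async_queue = deque()   # drained later, off the critical path
   updated = []            # order in which objects get updated

   def walk_children_of_route(path_lists):
       for pl in path_lists:                 # synchronous walk
           updated.append(pl["name"])        # update path-list + LB map
           if len(pl["children"]) >= POPULAR_THRESHOLD:
               async_queue.append(pl)        # defer to an async walk
           else:
               updated.extend(pl["children"])

   def drain_async():
       while async_queue:
           updated.extend(async_queue.popleft()["children"])

   popular = {"name": "pl-15",
              "children": [f"8.{i}.0.0/16" for i in range(64)]}
   small = {"name": "pl-9", "children": ["9.0.0.0/16"]}

   walk_children_of_route([popular, small])
   # converged here: all path-lists (and their maps) updated first
   print(updated[:3])    # ['pl-15', 'pl-9', '9.0.0.0/16']
   drain_async()         # the 64 recursive routes follow at leisure

Forwarding is repaired as soon as the synchronous pass over the
path-lists completes; the per-route updates that follow are bookkeeping.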

Now the story is complete. The murderer is revealed.

Let's withdraw one of the IGP routes:

.. code-block:: console

  DBGvpp# ip route del 1.1.1.2/32 via 10.0.1.2 GigEthernet0/0/1

  DBGvpp# sh ip fib 8.8.0.0
  ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto ] epoch:0 flags:none locks:[adjacency:1, recursive-resolution:4, default-route:1, ]
  8.8.0.0/16 fib:0 index:77 locks:2
    API refs:1 src-flags:added,contributing,active,
      path-list:[15] locks:128 flags:shared,popular, uPRF-list:18 len:2 itfs:[1, 2, ]
        path:[17] pl-index:15 ip4 weight=1 pref=0 recursive: oper-flags:resolved, cfg-flags:resolve-host,
          via 1.1.1.1 in fib:0 via-fib:15 via-dpo:[dpo-load-balance:17]
        path:[15] pl-index:15 ip4 weight=1 pref=0 recursive: cfg-flags:resolve-host,
          via 1.1.1.2 in fib:0 via-fib:10 via-dpo:[dpo-drop:0]

    forwarding:   unicast-ip4-chain
      [@0]: dpo-load-balance: [proto:ip4 index:79 buckets:1 uRPF:18 to:[0:0]]
        [0] [@12]: dpo-load-balance: [proto:ip4 index:17 buckets:2 uRPF:27 to:[0:0]]
          [0] [@5]: ipv4 via 10.0.0.2 GigEthernet0/0/0: mtu:9000 next:3 001122334455dead000000000800
          [1] [@5]: ipv4 via 10.0.1.2 GigEthernet0/0/1: mtu:9000 next:4 001111111111dead000000010800

the LB map has gone, since the prefix now only has one path. You'll
need to be a CLI ninja if you want to catch the output showing the LB
map in its transient state of:

.. code-block:: console

  load-balance-map: index:0 buckets:2
    index:    0    1
      map:    0    0

but it happens. Trust me. I've got tests and everything.

On the final topic of how to converge quickly, 'make each update fast':
there are no tricks.