.. _fastconvergence:

Fast Convergence
------------------------------------

This is an excellent description of the topic:

`BGP PIC <https://tools.ietf.org/html/draft-ietf-rtgwg-bgp-pic-12>`_

but if you're interested in my take, keep reading...

First some definitions:

- Convergence: When a FIB is forwarding all packets correctly based
  on the network topology (i.e. doing what the routing control plane
  has instructed it to do), then it is said to be 'converged'.
  Not being in a converged state is [hopefully] a transient state,
  when either the topology change (e.g. a link failure) has not been
  observed or processed by the routing control plane, or the FIB
  is still processing routing updates. Convergence is the act of
  getting to the converged state.
- Fast: In the shortest time possible. There are no absolute limits
  placed on how short this must be, although there is one number often
  mentioned. Apparently the human ear can detect loss/delay/jitter in
  VoIP of 50ms, therefore network failures should last no longer than
  this, and some technologies (notably loop-free alternate fast
  reroute) are designed to converge within this time. However, it is
  generally accepted that it is not possible to converge a FIB with
  tens of millions of routes in this time scale; the industry
  'standard' is sub-second.

Converging the FIB quickly is thus a matter of:

- discovering that something is down
- updating as few objects as possible
- determining which objects to update as efficiently as possible
- updating each object as quickly as possible

We'll discuss each in turn.
All output came from VPP version 21.01rc0. In what follows I use IPv4
prefixes, addresses and IPv4 host-length masks; however, exactly the
same applies to IPv6.
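
If you are following along, you can check what version your own build
reports with the ``show version`` CLI (a standard VPP command; your
output will differ from the 21.01rc0 used here):

.. code-block:: console

  DBGvpp# show version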


Failure Detection
^^^^^^^^^^^^^^^^^

The two common forms (we'll see others later on) of failure detection
are:

- link down
- BFD

The FIB needs to hook into these notifications to trigger
convergence.

Whenever an interface goes down, VPP issues a callback to all
registered clients. The adjacency code is such a client. The adjacency
is a leaf node in the FIB control-plane graph (containing fib_path_t,
fib_entry_t, etc.). A back-walk from the adjacency will trigger a
re-resolution of the paths.

FIB is a client of BFD in order to receive BFD notifications. BFD
comes in two flavours: single and multi hop. Single hop is to protect
a specific peer on an interface; such peers are modelled by an
adjacency. Multi hop is to protect a peer on an unspecified interface
(i.e. a remote peer); this peer is represented by a host-prefix
**fib_entry_t**. In both cases FIB will add a delegate to the
**ip_adjacency_t** or **fib_entry_t** that represents the association
to the BFD session. If the BFD session signals up/down then a backwalk
can be triggered from the object to trigger re-resolution and hence
convergence.
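
The shape of this mechanism is easy to see in miniature. Below is a
self-contained sketch in C - emphatically *not* the real VPP API, all
names are hypothetical - of the pattern just described: a delegate ties
a BFD session to the object it protects, and a state change triggers a
back-walk that re-resolves all dependents:

.. code-block:: c

  #include <stdio.h>

  #define MAX_CHILDREN 8

  /* a node in the FIB control-plane graph (cf. fib_entry_t et al.) */
  typedef struct fib_node {
      const char *name;
      int resolved;
      struct fib_node *children[MAX_CHILDREN]; /* dependents */
      int n_children;
  } fib_node_t;

  /* the back-walk: visit each child and ask it to re-resolve */
  static void fib_back_walk(fib_node_t *node)
  {
      for (int i = 0; i < node->n_children; i++) {
          fib_node_t *child = node->children[i];
          printf("re-resolving %s\n", child->name);
          fib_back_walk(child); /* keep walking towards the routes */
      }
  }

  /* the delegate: associates a BFD session with the object
   * (adjacency or host-prefix entry) that it protects */
  typedef struct {
      int session_up;
      fib_node_t *protected_object;
  } bfd_delegate_t;

  static void bfd_state_change(bfd_delegate_t *del, int up)
  {
      del->session_up = up;
      del->protected_object->resolved = up;
      fib_back_walk(del->protected_object); /* trigger convergence */
  }

  int main(void)
  {
      fib_node_t adj   = { "adjacency 10.0.0.1 GigE0/0/0", 1, {0}, 0 };
      fib_node_t path  = { "path via 10.0.0.1", 1, {0}, 0 };
      fib_node_t entry = { "entry 1.1.1.1/32", 1, {0}, 0 };

      adj.children[adj.n_children++]   = &path;
      path.children[path.n_children++] = &entry;

      bfd_delegate_t del = { 1, &adj };
      bfd_state_change(&del, 0); /* BFD down: back-walk re-resolves */
      return 0;
  }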


Few Updates
^^^^^^^^^^^

In order to talk about what 'a few' is, we have to leave the realm of
the FIB as an abstract graph-based object DB and move into the
concrete representation of forwarding in a large network. Large
networks are built in layers; it's how you scale them. We'll take
here a hypothetical service provider (SP) network, but the concepts
apply equally to data center leaf-spines. This is a rudimentary
description, but it should serve our purpose.

An SP manages a BGP autonomous system (AS). The SP's goal is both to
attract traffic into its network to serve its customers, and to
serve transit traffic passing through it; we'll consider the latter here.
The SP's network is all the devices in that AS. These
devices are split into those at the edge (provider edge (PE) routers),
which peer with routers in other SP networks,
and those in the core (termed provider (P) routers). Both the PE and P
routers run the IGP (usually OSPF or ISIS). Only the reachability of the devices
in the AS is advertised in the IGP - thus the scale (i.e. the number
of routes) in the IGP is 'small' - only the number of
devices that the SP has (typically not more than a few 10k).
PE routers run BGP; they have external BGP sessions to devices in
other ASs and internal BGP sessions to devices in the same AS. BGP is
used to advertise the routes to *all* networks on the internet - at
the time of writing this number is approaching 900k IPv4 routes; hopefully by
the time you are reading this the number of IPv6 routes has caught up ...
If we include the additional routes the SP carries to offer VPN services to its
customers, the number of BGP routes can grow to the tens of millions.

BGP scale thus exceeds IGP scale by two orders of magnitude... pause for
a moment and let that sink in...

A comparison of BGP and an IGP is way, way beyond the scope of this
documentation (and frankly beyond me), so we'll note only the
difference in the form of the routes they present to FIB. A routing
protocol will produce routes that specify the prefixes that are
reachable through its peers. A good IGP
is link state based; it forms peerings to other devices over these
links, hence its routes specify links/interfaces. In
FIB nomenclature this means an IGP produces routes that are
attached-nexthop, e.g.:

.. code-block:: console

  ip route add 1.1.1.1/32 via 10.0.0.1 GigEthernet0/0/0

BGP on the other hand forms peerings only to neighbours; it does not
know, nor care, what interface is used to reach the peer. In FIB
nomenclature therefore BGP produces recursive routes, e.g.:

.. code-block:: console

  ip route add 8.0.0.0/16 via 1.1.1.1

where 1.1.1.1 is the BGP peer. It's no accident in this example that
1.1.1.1/32 happens to be the route the IGP advertised... BGP installs
routes for prefixes reachable via other BGP peers, and the IGP installs
the routes to those BGP peers.

This has been a very long-winded way of describing why the scale of
recursive routes is two orders of magnitude greater than that of
non-recursive/attached-nexthop routes.

If we step back for a moment and recall why we've crawled down this
rabbit hole: we're trying to determine what 'a few' updates means.
Does it include all those recursive routes? Probably not... let's
keep crawling.

We started this chapter with an abstract description of convergence;
let's now make that more real. In the event of a network failure an SP
is interested in moving to an alternate forwarding path as quickly as
possible. If there is no alternate path, and even a converged FIB will drop
the packet, then who cares how fast it converges? In other words, the
interesting convergence scenarios are the scenarios where the network has
alternate paths.

PIC Core
^^^^^^^^

First let's consider alternate paths in the IGP, e.g.:

.. code-block:: console

  ip route add 1.1.1.1/32 via 10.0.0.2 GigEthernet0/0/0
  ip route add 1.1.1.1/32 via 10.0.1.2 GigEthernet0/0/1

this gives us in the FIB:

.. code-block:: console

  DBGvpp# sh ip fib 1.1.1.1/32
  ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto ] epoch:0 flags:none locks:[adjacency:1, default-route:1, ]
  1.1.1.1/32 fib:0 index:15 locks:2
    API refs:1 src-flags:added,contributing,active,
      path-list:[23] locks:2 flags:shared, uPRF-list:22 len:2 itfs:[1, 2, ]
        path:[27] pl-index:23 ip4 weight=1 pref=0 attached-nexthop: oper-flags:resolved,
          10.0.0.2 GigEthernet0/0/0
        [@0]: ipv4 via 10.0.0.2 GigEthernet0/0/0: mtu:9000 next:3 001111111111dead000000000800
        path:[28] pl-index:23 ip4 weight=1 pref=0 attached-nexthop: oper-flags:resolved,
          10.0.1.2 GigEthernet0/0/1
        [@0]: ipv4 via 10.0.1.2 GigEthernet0/0/1: mtu:9000 next:4 001111111111dead000000010800

    forwarding: unicast-ip4-chain
      [@0]: dpo-load-balance: [proto:ip4 index:17 buckets:2 uRPF:22 to:[0:0]]
        [0] [@5]: ipv4 via 10.0.0.2 GigEthernet0/0/0: mtu:9000 next:3 001111111111dead000000000800
        [1] [@5]: ipv4 via 10.0.1.2 GigEthernet0/0/1: mtu:9000 next:4 001111111111dead000000010800

There is ECMP across the two paths. Note that the instance/index of the
load-balance present in the forwarding graph is 17.

Let's add a BGP route via this peer:

.. code-block:: console

  ip route add 8.0.0.0/16 via 1.1.1.1

in the FIB we see:

.. code-block:: console

  DBGvpp# sh ip fib 8.0.0.0/16
  ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto ] epoch:0 flags:none locks:[adjacency:1, recursive-resolution:1, default-route:1, ]
  8.0.0.0/16 fib:0 index:18 locks:2
    API refs:1 src-flags:added,contributing,active,
      path-list:[24] locks:2 flags:shared, uPRF-list:21 len:2 itfs:[1, 2, ]
        path:[29] pl-index:24 ip4 weight=1 pref=0 recursive: oper-flags:resolved,
          via 1.1.1.1 in fib:0 via-fib:15 via-dpo:[dpo-load-balance:17]

    forwarding: unicast-ip4-chain
      [@0]: dpo-load-balance: [proto:ip4 index:20 buckets:1 uRPF:21 to:[0:0]]
        [0] [@12]: dpo-load-balance: [proto:ip4 index:17 buckets:2 uRPF:22 to:[0:0]]
          [0] [@5]: ipv4 via 10.0.0.2 GigEthernet0/0/0: mtu:9000 next:3 001111111111dead000000000800
          [1] [@5]: ipv4 via 10.0.1.2 GigEthernet0/0/1: mtu:9000 next:4 001111111111dead000000010800

the load-balance object used by this route is index 20, but note that
the next load-balance in the chain is index 17, i.e. it is exactly
the same instance that appears in the forwarding chain for the IGP
route. So in the forwarding plane the packet first encounters
load-balance object 20 (which it will use in ip4-lookup) and then
number 17 (in ip4-load-balance).
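
As an aside, you can inspect a load-balance object directly given its
index (assuming your build includes the DPO show commands):

.. code-block:: console

  DBGvpp# show load-balance 17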

What's the significance? Let's shut down one of those IGP paths:

.. code-block:: console

  DBGvpp# set in state GigEthernet0/0/0 down

the resulting update to the IGP route is:

.. code-block:: console

  DBGvpp# sh ip fib 1.1.1.1/32
  ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto ] epoch:0 flags:none locks:[adjacency:1, recursive-resolution:1, default-route:1, ]
  1.1.1.1/32 fib:0 index:15 locks:4
    API refs:1 src-flags:added,contributing,active,
      path-list:[23] locks:2 flags:shared, uPRF-list:25 len:2 itfs:[1, 2, ]
        path:[27] pl-index:23 ip4 weight=1 pref=0 attached-nexthop:
          10.0.0.2 GigEthernet0/0/0
        [@0]: arp-ipv4: via 10.0.0.2 GigEthernet0/0/0
        path:[28] pl-index:23 ip4 weight=1 pref=0 attached-nexthop: oper-flags:resolved,
          10.0.1.2 GigEthernet0/0/1
        [@0]: ipv4 via 10.0.1.2 GigEthernet0/0/1: mtu:9000 next:4 001111111111dead000000010800

    recursive-resolution refs:1 src-flags:added, cover:-1

    forwarding: unicast-ip4-chain
      [@0]: dpo-load-balance: [proto:ip4 index:17 buckets:1 uRPF:25 to:[0:0]]
        [0] [@5]: ipv4 via 10.0.1.2 GigEthernet0/0/1: mtu:9000 next:4 001111111111dead000000010800


notice that the path via 10.0.0.2 is no longer flagged as resolved,
and the forwarding chain does not contain this path as a
choice. However, the key thing to note is that the load-balance
instance is still index 17, i.e. it has been modified, not
exchanged. In the FIB vernacular we say it has been 'in-place
modified', a somewhat linguistically redundant expression, but one that serves
to emphasise that it was changed whilst still being part of the graph; it
was never at any point removed from the graph and re-added, and it was
modified without the worker barrier lock held.

Still don't see the significance? In order to converge around the
failure of the IGP link it was not necessary to update load-balance
object number 20! It was not necessary to update the recursive
route. I.e. convergence is achieved without updating any recursive
routes; it is only necessary to update the affected IGP routes. This is
the definition of 'a few'. We call this 'prefix independent
convergence' (PIC), which should really be called 'recursive prefix
independent convergence', but it isn't...

How was the trick done? As with all problems in computer science, it
was solved by a layer of misdirection, I mean indirection. The
indirection is the load-balance that belongs to the IGP route. By
keeping this object in the forwarding graph and updating it in place,
we get PIC. The alternative design would be to collapse the two layers of
load-balancing into one, which would improve forwarding performance
but would come at the cost of prefix dependent convergence. No doubt
there are situations where a VPP deployment would favour forwarding
performance over convergence; you know the drill, contributions welcome.
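
To see the indirection in miniature, here is a self-contained sketch
(plain C, hypothetical names, not VPP's actual types): two chained
load-balances, where repairing the inner one in place converges every
recursive route stacked on it without touching their own objects:

.. code-block:: c

  #include <stdio.h>

  typedef struct { const char *via; } adjacency_t;

  typedef struct load_balance {
      int n_buckets;
      const void *buckets[2]; /* adjacencies, or nested load-balances */
      int is_final;           /* 1 if buckets hold adjacencies */
  } load_balance_t;

  /* forwarding: resolve through the chain of load-balances */
  static const adjacency_t *lookup(const load_balance_t *lb,
                                   unsigned flow_hash)
  {
      while (!lb->is_final)
          lb = lb->buckets[flow_hash % lb->n_buckets];
      return lb->buckets[flow_hash % lb->n_buckets];
  }

  int main(void)
  {
      adjacency_t ge0 = { "10.0.0.2 GigEthernet0/0/0" };
      adjacency_t ge1 = { "10.0.1.2 GigEthernet0/0/1" };

      /* 'LB 17': owned by the IGP route 1.1.1.1/32, ECMP over 2 links */
      load_balance_t lb17 = { 2, { &ge0, &ge1 }, 1 };
      /* 'LB 20': owned by the BGP route 8.0.0.0/16, recurses via LB 17;
       * millions of recursive routes could each have an LB like this */
      load_balance_t lb20 = { 1, { &lb17 }, 0 };

      printf("before: %s\n", lookup(&lb20, 0)->via);

      /* GigEthernet0/0/0 fails: modify LB 17 in place; LB 20 (and any
       * other recursive LB stacked on LB 17) is never touched */
      lb17.n_buckets = 1;
      lb17.buckets[0] = &ge1;

      printf("after:  %s\n", lookup(&lb20, 0)->via);
      return 0;
  }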

This failure scenario is known as PIC core, since it's one of the IGP's
core links that has failed.

iBGP PIC Edge
^^^^^^^^^^^^^

Next, let's consider alternate paths in BGP, e.g.:

.. code-block:: console

  ip route add 8.0.0.0/16 via 1.1.1.1
  ip route add 8.0.0.0/16 via 1.1.1.2

the 8.0.0.0/16 prefix is reachable via two BGP next-hops (two PEs).

Our FIB now also contains:

.. code-block:: console

  DBGvpp# sh ip fib 8.0.0.0/16
  ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto ] epoch:0 flags:none locks:[adjacency:1, recursive-resolution:2, default-route:1, ]
  8.0.0.0/16 fib:0 index:18 locks:2
    API refs:1 src-flags:added,contributing,active,
      path-list:[15] locks:2 flags:shared, uPRF-list:11 len:2 itfs:[1, 2, ]
        path:[17] pl-index:15 ip4 weight=1 pref=0 recursive: oper-flags:resolved,
          via 1.1.1.1 in fib:0 via-fib:15 via-dpo:[dpo-load-balance:17]
        path:[15] pl-index:15 ip4 weight=1 pref=0 recursive: oper-flags:resolved,
          via 1.1.1.2 in fib:0 via-fib:10 via-dpo:[dpo-load-balance:12]

    forwarding: unicast-ip4-chain
      [@0]: dpo-load-balance: [proto:ip4 index:20 buckets:2 uRPF:11 to:[0:0]]
        [0] [@12]: dpo-load-balance: [proto:ip4 index:17 buckets:1 uRPF:25 to:[0:0]]
          [0] [@5]: ipv4 via 10.0.0.2 GigEthernet0/0/0: mtu:9000 next:3 001122334455dead000000000800
          [1] [@5]: ipv4 via 10.0.1.2 GigEthernet0/0/1: mtu:9000 next:4 001111111111dead000000010800
        [1] [@12]: dpo-load-balance: [proto:ip4 index:12 buckets:1 uRPF:13 to:[0:0]]
          [0] [@5]: ipv4 via 10.0.1.2 GigEthernet0/0/1: mtu:9000 next:4 001111111111dead000000010800

The first load-balance (LB) in the forwarding graph is index 20 (the astute
reader will note this is the same index as in the previous
section; I am adding paths to the same route, so the load-balance is
in-place modified again). Each choice in LB 20 is another LB
contributed by the IGP route through which the route's paths recurse.

So what's the equivalent in BGP to a link down in the IGP? An IGP link
down means it loses its peering out of that link, so the equivalent in
BGP is the loss of the peering and thus the loss of reachability to
the peer. This is signalled by the IGP withdrawing the route to the
peer. But "Wait wait wait", I hear you say ... "just because the IGP
withdraws 1.1.1.1/32 doesn't mean I can't reach 1.1.1.1, perhaps there
is a less specific route that gives reachability to 1.1.1.1". Indeed
there may be. So a little more on BGP network design. I know it's like
a bad detective novel where the author drip feeds you the plot... When
describing iBGP peerings one 'always' describes the peer using one of
its loopback addresses. Why? A loopback interface
never goes down (unless you admin it down yourself); some muppet can't
accidentally cut through the loopback cable whilst digging up the
street. And what subnet mask length does a prefix have on a loopback
interface? It's 'always' a /32. Why? Because there's no cable to connect
any other devices. This choice justifies there 'always' being a /32
route for the BGP peer. But what prevents there being a less
specific? Nothing.
Now clearly if the BGP peer crashes then the /32 for its loopback is
going to be removed from the IGP, but what will withdraw the less
specific? Nothing.

So, in order to rely on the withdrawal of the /32 for the peer as the
signal that the peer is down, and thus as the trigger to converge
the FIB, we need to force FIB to recurse only via
the /32 and not via a less specific. This is called a 'recursion
constraint'. In this case the constraint is 'recurse via host',
i.e. for IPv4 use a /32.
So we need to update our route additions from before:

.. code-block:: console

  ip route add 8.0.0.0/16 via 1.1.1.1 resolve-via-host
  ip route add 8.0.0.0/16 via 1.1.1.2 resolve-via-host

checking the FIB output is left as an exercise for the reader. I hope
you're doing these configs as you read. There's little change in the
output; you'll see some extra flags on the paths.

Now let's add the less specific, just for fun:

.. code-block:: console

  ip route add 1.1.1.0/28 via 10.0.0.2 GigEthernet0/0/0

nothing changes in the resolution of 8.0.0.0/16.

Now withdraw the route to 1.1.1.2/32:

.. code-block:: console

  ip route del 1.1.1.2/32 via 10.0.0.2 GigEthernet0/0/0

In the FIB we see:

.. code-block:: console

  DBGvpp# sh ip fib 8.0.0.0/32
  ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto ] epoch:0 flags:none locks:[adjacency:1, recursive-resolution:2, default-route:1, ]
  8.0.0.0/16 fib:0 index:18 locks:2
    API refs:1 src-flags:added,contributing,active,
      path-list:[15] locks:2 flags:shared, uPRF-list:13 len:2 itfs:[1, 2, ]
        path:[15] pl-index:15 ip4 weight=1 pref=0 recursive: oper-flags:resolved, cfg-flags:resolve-host,
          via 1.1.1.1 in fib:0 via-fib:15 via-dpo:[dpo-load-balance:17]
        path:[17] pl-index:15 ip4 weight=1 pref=0 recursive: cfg-flags:resolve-host,
          via 1.1.1.2 in fib:0 via-fib:10 via-dpo:[dpo-drop:0]

    forwarding: unicast-ip4-chain
      [@0]: dpo-load-balance: [proto:ip4 index:20 buckets:1 uRPF:13 to:[0:0]]
        [0] [@12]: dpo-load-balance: [proto:ip4 index:17 buckets:2 uRPF:27 to:[0:0]]
          [0] [@5]: ipv4 via 10.0.0.2 GigEthernet0/0/0: mtu:9000 next:3 001122334455dead000000000800
          [1] [@5]: ipv4 via 10.0.1.2 GigEthernet0/0/1: mtu:9000 next:4 001111111111dead000000010800

the path via 1.1.1.2 is unresolved, because the recursion constraint
prevents the path resolving via 1.1.1.0/28. LB index 20
has been updated to remove the unresolved path.

Job done? Not quite! Why not?

Let's re-examine the goals of this chapter. We wanted to update 'a
few' objects, which we have defined as not all the millions of
recursive routes. Did we update a recursive route here? We sure did,
when we modified LB index 20. So WTF?? Where's the indirection object that can
be modified so that the LBs for the recursive routes are not
modified? It's not there.... WTF?

OK, so the great detective has assembled all the suspects in the
drawing room and only now does he drop the bomb: the FIB knows the
scale. We talked above about what the scale **can** be, worst case
scenario, but that's not necessarily what it is in this hypothetical
(your) deployment. The FIB knows how many recursive routes
depend on a /32, and it can thus make its own determination of the
definition of 'a few'. In other words, if there are only 'a few'
recursive prefixes that depend on a /32 then it will update them
synchronously (and we'll discuss what synchronously means a bit more later).

So what does FIB consider to be 'a few'? Let's add more routes and
find out.

.. code-block:: console

  DBGvpp# ip route add 8.1.0.0/16 via 1.1.1.2 resolve-via-host via 1.1.1.1 resolve-via-host
  ...
  DBGvpp# ip route add 8.63.0.0/16 via 1.1.1.2 resolve-via-host via 1.1.1.1 resolve-via-host

and we see:

.. code-block:: console

  DBGvpp# sh ip fib 8.8.0.0
  ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto ] epoch:0 flags:none locks:[adjacency:1, recursive-resolution:4, default-route:1, ]
  8.8.0.0/16 fib:0 index:77 locks:2
    API refs:1 src-flags:added,contributing,active,
      path-list:[15] locks:128 flags:shared,popular, uPRF-list:28 len:2 itfs:[1, 2, ]
        path:[17] pl-index:15 ip4 weight=1 pref=0 recursive: oper-flags:resolved, cfg-flags:resolve-host,
          via 1.1.1.1 in fib:0 via-fib:15 via-dpo:[dpo-load-balance:17]
        path:[15] pl-index:15 ip4 weight=1 pref=0 recursive: oper-flags:resolved, cfg-flags:resolve-host,
          via 1.1.1.2 in fib:0 via-fib:10 via-dpo:[dpo-load-balance:12]

    forwarding: unicast-ip4-chain
      [@0]: dpo-load-balance: [proto:ip4 index:79 buckets:2 uRPF:28 flags:[uses-map] to:[0:0]]
          load-balance-map: index:0 buckets:2
             index:    0    1
               map:    0    1
        [0] [@12]: dpo-load-balance: [proto:ip4 index:17 buckets:2 uRPF:27 to:[0:0]]
          [0] [@5]: ipv4 via 10.0.0.2 GigEthernet0/0/0: mtu:9000 next:3 001122334455dead000000000800
          [1] [@5]: ipv4 via 10.0.1.2 GigEthernet0/0/1: mtu:9000 next:4 001111111111dead000000010800
        [1] [@12]: dpo-load-balance: [proto:ip4 index:12 buckets:1 uRPF:18 to:[0:0]]
          [0] [@3]: arp-ipv4: via 10.0.1.2 GigEthernet0/0/0

Two elements to note here: the path-list has the 'popular' flag and
there is a load-balance map in the forwarding path.

'popular' in this case means that the path-list has passed the limit
of 'a few' in the number of children it has.

Here are the children:

.. code-block:: console

  DBGvpp# sh fib path-list 15
  path-list:[15] locks:128 flags:shared,popular, uPRF-list:28 len:2 itfs:[1, 2, ]
    path:[17] pl-index:15 ip4 weight=1 pref=0 recursive: oper-flags:resolved, cfg-flags:resolve-host,
      via 1.1.1.1 in fib:0 via-fib:15 via-dpo:[dpo-load-balance:17]
    path:[15] pl-index:15 ip4 weight=1 pref=0 recursive: oper-flags:resolved, cfg-flags:resolve-host,
      via 1.1.1.2 in fib:0 via-fib:10 via-dpo:[dpo-load-balance:12]
  children:{entry:18}{entry:21}{entry:22}{entry:23}{entry:25}{entry:26}{entry:27}{entry:28}{entry:29}{entry:30}{entry:31}{entry:32}{entry:33}{entry:34}{entry:35}{entry:36}{entry:37}{entry:38}{entry:39}{entry:40}{entry:41}{entry:42}{entry:43}{entry:44}{entry:45}{entry:46}{entry:47}{entry:48}{entry:49}{entry:50}{entry:51}{entry:52}{entry:53}{entry:54}{entry:55}{entry:56}{entry:57}{entry:58}{entry:59}{entry:60}{entry:61}{entry:62}{entry:63}{entry:64}{entry:65}{entry:66}{entry:67}{entry:68}{entry:69}{entry:70}{entry:71}{entry:72}{entry:73}{entry:74}{entry:75}{entry:76}{entry:77}{entry:78}{entry:79}{entry:80}{entry:81}{entry:82}{entry:83}{entry:84}

64 children makes a path-list popular. The number is fixed (there is no API to
change it); its value is an attempt to balance the forwarding cost of
the indirection against the convergence gain.

Popular path-lists contribute the load-balance map; this is the
missing indirection object. Its indirection happens when choosing the
bucket in the LB. The packet's flow-hash is taken 'mod number of
buckets' to give the 'candidate bucket', then the map translates the
candidate into the bucket actually used. You can see in the example above
that no change occurs, i.e. if the flow-hash mod n chooses bucket 1
then it gets bucket 1.

Why is this useful? The path-list is shared (you can convince
yourself of this if you look at each of the 8.x.0.0/16 routes we
added) and all of these routes use the same load-balance map; therefore, to
converge all the recursive routes, we need only change the map and
we're good. We again get PIC.
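
Arithmetically the indirection is tiny. A sketch (hypothetical names,
not VPP's data structures) of bucket selection before and after the
map is repaired:

.. code-block:: c

  #include <stdio.h>

  enum { N_BUCKETS = 2 };

  /* one small map shared by every route using this path-list;
   * identity mapping while all paths are up */
  static int lb_map[N_BUCKETS] = { 0, 1 };

  static int choose_bucket(unsigned flow_hash)
  {
      int candidate = flow_hash % N_BUCKETS; /* candidate bucket */
      return lb_map[candidate];              /* the indirection */
  }

  int main(void)
  {
      printf("flow 7 -> bucket %d\n", choose_bucket(7)); /* bucket 1 */

      /* the path in bucket 1 fails: repoint its map slot at a
       * surviving bucket; no per-route load-balance is touched */
      lb_map[1] = 0;

      printf("flow 7 -> bucket %d\n", choose_bucket(7)); /* bucket 0 */
      return 0;
  }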

OK, who's still awake? If you're thinking there's more to this story,
you're right. Keep reading.

This failure scenario is called iBGP PIC edge. It's 'edge' because it
refers to the loss of an edge device, and iBGP because the device was
an iBGP peer (we learn routes to iBGP peers in the IGP). There is a similar eBGP
PIC edge scenario, but this is left as an exercise for the reader (hint:
there are other recursion constraints - see the RFC).

Which Objects
^^^^^^^^^^^^^

The next topic on our list of how to converge quickly was to
find, as efficiently as possible, the objects that need to be updated
when a convergence event happens. If you haven't realised by now that the FIB is an
object graph, then can I politely suggest you go back and start from
the beginning ...

Finding the objects affected by a change is simply a matter of walking
from the parent (the object affected) to its children. These
dependencies are maintained precisely for this reason.

So is fast convergence just a matter of walking the graph? Yes and
no. The question to ask yourself is this: "in the case of iBGP PIC edge,
when the /32 is withdrawn, what is the list of objects that need to be
updated, and in particular, in what order should they be updated
to obtain the best convergence time?" Think breadth v. depth first.

... ponder for a while ...

For iBGP PIC edge we said it's the path-list that provides the
indirection through the load-balance map. Hence once all path-lists
are updated we are converged; thereafter, at our leisure, we can
update the child recursive prefixes. Is that breadth or depth first?

It's breadth first.

Breadth-first walks are achieved by spawning an async walk of the
branch of the graph that we don't want to traverse synchronously. Withdrawing
the /32 triggers a synchronous walk of the children of the /32 route; we want
a synchronous walk because we want to converge ASAP. This synchronous
walk will encounter path-lists in the /32 route's child dependent list.
These path-lists (and their LB maps) will be updated. If a path-list is
popular, then it will spawn an async walk of the path-list's child
dependent routes; if not, it will walk those routes synchronously. So the walk
effectively proceeds breadth first across the path-lists, then returns
to the start to do the affected routes.
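
In pseudo-C the ordering looks something like this (a sketch with
hypothetical names, not the actual walk implementation):

.. code-block:: c

  #include <stdio.h>

  #define MAX 8

  typedef struct {
      const char *name;
      int popular;               /* has more than 'a few' children? */
      const char *children[MAX]; /* dependent recursive routes */
      int n_children;
  } path_list_t;

  /* queue of deferred (asynchronous) walks */
  static path_list_t *async_q[MAX];
  static int async_n;

  static void update_route(const char *r)
  {
      printf("  update route %s\n", r);
  }

  static void converge(path_list_t **pls, int n)
  {
      /* synchronous: converge every path-list (and LB map) first */
      for (int i = 0; i < n; i++) {
          printf("update path-list %s and its LB map\n", pls[i]->name);
          if (pls[i]->popular)
              async_q[async_n++] = pls[i]; /* defer the deep branch */
          else
              for (int j = 0; j < pls[i]->n_children; j++)
                  update_route(pls[i]->children[j]);
      }
      /* forwarding is now converged; finish the rest at our leisure */
      for (int i = 0; i < async_n; i++)
          for (int j = 0; j < async_q[i]->n_children; j++)
              update_route(async_q[i]->children[j]);
  }

  int main(void)
  {
      path_list_t popular = { "PL-15", 1, { "8.0.0.0/16", "8.1.0.0/16" }, 2 };
      path_list_t small   = { "PL-24", 0, { "9.0.0.0/16" }, 1 };
      path_list_t *pls[]  = { &popular, &small };

      converge(pls, 2); /* as if triggered by withdrawing the /32 */
      return 0;
  }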

Now the story is complete. The murderer is revealed.

Let's withdraw one of the IGP routes.

.. code-block:: console

  DBGvpp# ip route del 1.1.1.2/32 via 10.0.1.2 GigEthernet0/0/1

  DBGvpp# sh ip fib 8.8.0.0
  ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto ] epoch:0 flags:none locks:[adjacency:1, recursive-resolution:4, default-route:1, ]
  8.8.0.0/16 fib:0 index:77 locks:2
    API refs:1 src-flags:added,contributing,active,
      path-list:[15] locks:128 flags:shared,popular, uPRF-list:18 len:2 itfs:[1, 2, ]
        path:[17] pl-index:15 ip4 weight=1 pref=0 recursive: oper-flags:resolved, cfg-flags:resolve-host,
          via 1.1.1.1 in fib:0 via-fib:15 via-dpo:[dpo-load-balance:17]
        path:[15] pl-index:15 ip4 weight=1 pref=0 recursive: cfg-flags:resolve-host,
          via 1.1.1.2 in fib:0 via-fib:10 via-dpo:[dpo-drop:0]

    forwarding: unicast-ip4-chain
      [@0]: dpo-load-balance: [proto:ip4 index:79 buckets:1 uRPF:18 to:[0:0]]
        [0] [@12]: dpo-load-balance: [proto:ip4 index:17 buckets:2 uRPF:27 to:[0:0]]
          [0] [@5]: ipv4 via 10.0.0.2 GigEthernet0/0/0: mtu:9000 next:3 001122334455dead000000000800
          [1] [@5]: ipv4 via 10.0.1.2 GigEthernet0/0/1: mtu:9000 next:4 001111111111dead000000010800

the LB map has gone, since the prefix now only has one path. You'll
need to be a CLI ninja if you want to catch the output showing the LB
map in its transient state of:

.. code-block:: console

  load-balance-map: index:0 buckets:2
     index:    0    1
       map:    0    0

but it happens. Trust me. I've got tests and everything.

On the final topic of how to converge quickly, 'make each update fast',
there are no tricks.