Blame - docs/gettingstarted/developers/fib20/marknsweep.rst - fdio/vpp

blob: e9e38a33f3a3c4d3e3a4707d5f112345f6700720 [file] [log] [blame]

Neale Ranns	dfd3954	2020-11-09 10:09:42 +0000	[diff] [blame]	1	.. _marknsweep:
				2
				3	Mark and Sweep
				4	--------------
				5
				6	The mark and sweep procedures, in FIB and in other subsystems, are
				7	built for the purpose of recovering from a control plane crash.
				8
				9	In routing if the control plane (CP) crashes, when it restarts, the network
				10	topology may have changed. This means that some of the routes that
				11	were programmed in the FIB may no longer be needed, and perhaps some
				12	new ones are. If the CP were simply to insert all the new routes it
				13	learned after it restarts, then FIB could be left with old routes that
				14	never get removed, this would be bigly bad.
				15
				16	At a high level the requirement is to delete routes from the old set
				17	that are not present in the new set; 'delete the diff' as it might
				18	be colloquially known.
				19
				20	How should the control plane determine the old set? It could
				21	conceivably read back the FIB from VPP. But this presents two
				22	problems, firstly, it could be a large set of routes, numbering in the
				23	millions, this is not an efficient mechanism and not one one wants to
				24	perform at a point when the router is trying to converge
				25	ASAP. Secondly it represents a 'source of truth' inversion. The
				26	routing plane is the source of truth, not forwarding. Routing should
				27	not receive its 'input' from the layers below. Thirdly, on a practical
				28	note, the reading of VPP data structures to glean this sort of
				29	accurate information, would only happen in this scenario, i.e. it's
				30	not well tested and therefore not particularly reliable (see point 2).
				31
				32	Enter 'mark and sweep' or m-n-s (not to be confused with the retail
				33	giant) as it's affectionately known.
				34
				35	The Mark and Sweep algorithm proceeds in three steps:
				36
				37	- Step 1; the CP declares to VPP that it wants to begin the process
				38	(i.e. it has just restarted). At this point VPP will iterate through
				39	all the objects that the CP owns and 'mark' then as being
				40	stale. This process effectively declares a new 'epoch', a barrier in
				41	time that separates the old objects from the new.
				42	- Step 2; The CP downloads all of its new objects. If one of these new
				43	CP objects matches (has the same key as) an existing object, then
				44	the CP add is considered an update, and the object's stale state is
				45	removed.
				46	- Step 3: The CP declares it has 'converged'; it has no more updates
				47	to give (at this time). VPP will then again iterate through all the
				48	CP's objects and remove those that do not belong to the new epoch,
				49	i.e. those that are still marked stale.
				50
				51	After step 3, the CP and VPP databases are in sync.
				52
				53	The cost of the process was to download all the new routes again. This
				54	is a highly-tuned and well-tested scenario.
				55
				56	In VPP we use the synonym 'replace' to describe the mark-n-sweep
				57	action in the API. We use this term because it refers to the goals of
				58	the algorithm at a high level - the CP wants to replace the old DB
				59	with a new one - but it does not specify the algorithm by which that
				60	is achieved. One could equally perform this task by constructing a
				61	brand new DB in VPP, and then swapping them when the CP
				62	converges. Other subsystems may employ that approach, but FIB does
				63	not. Updates are typically faster than adds, since the update is
				64	likely a no-op, whereas a separate add would require the memory
				65	allocator, which is the long pole in FIB additions. Additionally, it requires
				66	twice the memory for a moment in time, which could be prohibitive when
				67	the FIB is large.
				68