blob: e9e38a33f3a3c4d3e3a4707d5f112345f6700720 [file] [log] [blame]
Neale Rannsdfd39542020-11-09 10:09:42 +00001.. _marknsweep:
2
3Mark and Sweep
4--------------
5
6The mark and sweep procedures, in FIB and in other subsystems, are
7built for the purpose of recovering from a control plane crash.
8
9In routing if the control plane (CP) crashes, when it restarts, the network
10topology may have changed. This means that some of the routes that
11were programmed in the FIB may no longer be needed, and perhaps some
12new ones are. If the CP were simply to insert all the new routes it
13learned after it restarts, then FIB could be left with old routes that
14never get removed, this would be bigly bad.
15
16At a high level the requirement is to delete routes from the old set
17that are not present in the new set; 'delete the diff' as it might
18be colloquially known.
19
20How should the control plane determine the old set? It could
21conceivably read back the FIB from VPP. But this presents two
22problems, firstly, it could be a large set of routes, numbering in the
23millions, this is not an efficient mechanism and not one one wants to
24perform at a point when the router is trying to converge
25ASAP. Secondly it represents a 'source of truth' inversion. The
26routing plane is the source of truth, not forwarding. Routing should
27not receive its 'input' from the layers below. Thirdly, on a practical
28note, the reading of VPP data structures to glean this sort of
29accurate information, would only happen in this scenario, i.e. it's
30not well tested and therefore not particularly reliable (see point 2).
31
32Enter 'mark and sweep' or m-n-s (not to be confused with the retail
33giant) as it's affectionately known.
34
35The Mark and Sweep algorithm proceeds in three steps:
36
37- Step 1; the CP declares to VPP that it wants to begin the process
38 (i.e. it has just restarted). At this point VPP will iterate through
39 all the objects that the CP owns and 'mark' then as being
40 stale. This process effectively declares a new 'epoch', a barrier in
41 time that separates the old objects from the new.
42- Step 2; The CP downloads all of its new objects. If one of these new
43 CP objects matches (has the same key as) an existing object, then
44 the CP add is considered an update, and the object's stale state is
45 removed.
46- Step 3: The CP declares it has 'converged'; it has no more updates
47 to give (at this time). VPP will then again iterate through all the
48 CP's objects and remove those that do not belong to the new epoch,
49 i.e. those that are still marked stale.
50
51After step 3, the CP and VPP databases are in sync.
52
53The cost of the process was to download all the new routes again. This
54is a highly-tuned and well-tested scenario.
55
56In VPP we use the synonym 'replace' to describe the mark-n-sweep
57action in the API. We use this term because it refers to the goals of
58the algorithm at a high level - the CP wants to replace the old DB
59with a new one - but it does not specify the algorithm by which that
60is achieved. One could equally perform this task by constructing a
61brand new DB in VPP, and then swapping them when the CP
62converges. Other subsystems may employ that approach, but FIB does
63not. Updates are typically faster than adds, since the update is
64likely a no-op, whereas a separate add would require the memory
65allocator, which is the long pole in FIB additions. Additionally, it requires
66twice the memory for a moment in time, which could be prohibitive when
67the FIB is large.
68