=== vnet classifier theory of operation ===

The vnet classifier trades off simplicity and perf / scale
characteristics. At a certain level, it's a dumb robot. Given an
incoming packet, search an ordered list of (mask, match) tables. If
the classifier finds a matching entry, take the indicated action. If
not, take a last-resort action.
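
As a mental model only, the dumb robot can be sketched in a few lines of C. The types and names below (sketch_table_t, sketch_classify and friends) are hypothetical, and each table is scanned linearly for clarity; the real classifier hashes the masked key into a bucket rather than scanning every entry.

```c
#include <stdint.h>

/* Hypothetical, simplified model of a classifier table list. Each
 * table holds one 16-octet mask, a few (match, action) entries, and
 * the index of the next table to search; ~0u ends the list. */
#define SKETCH_KEY_LEN 16
#define SKETCH_MAX_ENTRIES 4

typedef struct
{
  uint8_t match[SKETCH_KEY_LEN]; /* key, already masked */
  uint32_t action;               /* e.g. a graph-node next index */
} sketch_entry_t;

typedef struct
{
  uint8_t mask[SKETCH_KEY_LEN];
  sketch_entry_t entries[SKETCH_MAX_ENTRIES];
  uint32_t n_entries;
  uint32_t next_table_index; /* ~0u terminates the list */
} sketch_table_t;

/* Walk the table list starting at table_index. Return the action of
 * the first entry whose match equals (data & mask), or
 * last_resort_action if no table produces a hit. */
static uint32_t
sketch_classify (const sketch_table_t *tables, uint32_t table_index,
                 const uint8_t *data, uint32_t last_resort_action)
{
  while (table_index != ~0u)
    {
      const sketch_table_t *t = &tables[table_index];

      for (uint32_t e = 0; e < t->n_entries; e++)
        {
          int hit = 1;

          for (int i = 0; i < SKETCH_KEY_LEN; i++)
            if ((data[i] & t->mask[i]) != t->entries[e].match[i])
              {
                hit = 0;
                break;
              }
          if (hit)
            return t->entries[e].action;
        }
      table_index = t->next_table_index;
    }
  return last_resort_action;
}
```

The loop makes the cost model plain: a packet that hits in the first table touches one table, while a miss walks the entire list before falling back to last_resort_action.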

We use the SIMD vector unit (128-bit SSE registers on x86) to match or
hash 16 octets at a time. For hardware backward compatibility, the
code does not [currently] use 256-bit (32-octet) vector instructions.

Effective use of the classifier centers around building table lists
which "hit" as soon as practicable. In many cases, established
sessions hit in the first table. In this mode of operation, the
classifier easily processes multiple Mpps per core, even with millions
of sessions in the database. Conversely, searching 357 tables on a
regular basis will neatly solve the halting problem.

==== Basic operation ====

The classifier mask-and-match operation proceeds as follows. Given a
starting classifier table index, lay hands on the indicated mask
vector. When building tables, we arrange for the mask to obey
vector-unit (16-octet) alignment.

We know that the first octet of packet data starts on a cache-line
boundary. Further, it's reasonably likely that folks won't want to use
the generalized classifier on the L2 header, preferring to decode the
Ethertype manually. That scheme makes it easy to select among ip4 /
ip6 / MPLS, etc. classifier table sets.

An untagged (no VLAN tag) L2 header is 14 octets long. A typical ipv4
header begins with the octets 0x45, 0x00: version=4, header_length=5,
DSCP=0, ECN=0. The first 16-octet vector therefore holds the 14-octet
L2 header plus these two octets. If one doesn't intend to classify on
(DSCP, ECN), the typical case, we program the classifier to skip the
first 16-octet vector.

To classify untagged ipv4 packets on source address, we program the
classifier to skip one vector, and mask-and-match one vector.
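
Working through the arithmetic: the ipv4 source address starts 12 octets into the ipv4 header, i.e. at octet 14 + 12 = 26 of the frame. Skipping one vector removes 16 octets, so the address occupies octets 10 through 13 of the matched vector. The sketch below builds such a mask and key; the helper name and constants are illustrative assumptions, not part of the VPP API.

```c
#include <stdint.h>
#include <string.h>

/* Assumes an untagged Ethernet header (14 octets) immediately followed
 * by the IPv4 header, with classification starting at octet 0 of the
 * L2 header. */
#define ETH_HDR_LEN 14
#define IP4_SRC_OFFSET 12 /* source address offset in the IPv4 header */
#define VECTOR_LEN 16

/* Hypothetical helper: fill a one-vector mask and key that match an
 * IPv4 source address, for a table with skip_n_vectors = 1 and
 * match_n_vectors = 1. src_be is in network byte order. */
static void
build_ip4_src_mask_and_key (uint8_t mask[VECTOR_LEN],
                            uint8_t key[VECTOR_LEN],
                            const uint8_t src_be[4])
{
  const int skip_n_vectors = 1;
  /* 14 + 12 = 26 octets into the frame; minus 16 skipped octets = 10 */
  const int off = ETH_HDR_LEN + IP4_SRC_OFFSET - skip_n_vectors * VECTOR_LEN;

  memset (mask, 0, VECTOR_LEN);
  memset (key, 0, VECTOR_LEN);
  memset (mask + off, 0xff, 4); /* only the four address octets */
  memcpy (key + off, src_be, 4);
}
```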

The basic mask-and-match operation looks like this:

 /* data, mask and key point at 16-octet (u32x4) vectors; v is the
    candidate entry being compared */
 switch (t->match_n_vectors)
   {
   case 1:
     result = (data[0 + t->skip_n_vectors] & mask[0]) ^ key[0];
     break;

   case 2:
     result = (data[0 + t->skip_n_vectors] & mask[0]) ^ key[0];
     result |= (data[1 + t->skip_n_vectors] & mask[1]) ^ key[1];
     break;

   /* larger match_n_vectors values follow the same pattern */
   }

 /* One bit per all-zero octet: 0xffff means every masked octet
    matched the key */
 result_mask = u32x4_zero_byte_mask (result);
 if (result_mask == 0xffff)
   return (v);

Net of setup, it costs a couple of clock cycles to mask-and-match 16
octets.
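
For experimenting without vppinfra, the same logic can be written in portable scalar C. This is a sketch, not VPP's code: zero_byte_mask() and mask_and_match() are hypothetical stand-ins, and a byte-at-a-time loop gives up the couple-of-cycles cost of the vector version.

```c
#include <stdint.h>

#define VECTOR_LEN 16

/* Portable stand-in for u32x4_zero_byte_mask(): bit i of the result
 * is set when octet i of the 16-octet vector is zero. */
static uint16_t
zero_byte_mask (const uint8_t v[VECTOR_LEN])
{
  uint16_t m = 0;

  for (int i = 0; i < VECTOR_LEN; i++)
    if (v[i] == 0)
      m |= (uint16_t) (1u << i);
  return m;
}

/* Scalar version of the switch above: skip skip_n_vectors vectors of
 * packet data, then OR together (data & mask) ^ key for each of the
 * match_n_vectors vectors. Returns 1 when every octet matched. */
static int
mask_and_match (const uint8_t *data, const uint8_t *mask,
                const uint8_t *key, int skip_n_vectors,
                int match_n_vectors)
{
  uint8_t result[VECTOR_LEN] = { 0 };
  const uint8_t *d = data + skip_n_vectors * VECTOR_LEN;

  for (int v = 0; v < match_n_vectors; v++)
    for (int i = 0; i < VECTOR_LEN; i++)
      {
        int j = v * VECTOR_LEN + i;
        result[i] |= (uint8_t) ((d[j] & mask[j]) ^ key[j]);
      }
  return zero_byte_mask (result) == 0xffff;
}
```

ORing the per-vector differences together means a single nonzero octet anywhere in the matched vectors is enough to reject the entry.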

At the risk of belaboring an obvious point, the control plane
'''must''' pay attention to detail. When skipping one (or more)
vectors, masks and matches must reflect that decision. See
.../vnet/vnet/classify/vnet_classify.c:unformat_classify_[mask|match]. Note
that vec_validate (xxx, 13) creates a 14-element vector (indices 0
through 13).

==== Creating a classifier table ====

To create a new classifier table via the control-plane API, send a
"classify_add_del_table" message. The underlying action routine,
vnet_classify_add_del_table(...), is located in
.../vnet/vnet/classify/vnet_classify.c, and has the following
prototype:

 int vnet_classify_add_del_table (vnet_classify_main_t * cm,
                                  u8 * mask,
                                  u32 nbuckets,
                                  u32 memory_size,
                                  u32 skip,
                                  u32 match,
                                  u32 next_table_index,
                                  u32 miss_next_index,
                                  u32 * table_index,
                                  int is_add)

Pass cm = &vnet_classify_main if calling this routine directly. Mask,
skip(_n_vectors) and match(_n_vectors) are as described above. Mask
need not be aligned, but it must be match*16 octets in length. To
avoid having your head explode, be absolutely certain that '''only'''
the bits you intend to match on are set.
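
One way to keep the mask honest is to generate it from an explicit list of fields instead of writing octets by hand. A minimal sketch; match_field_t and build_table_mask() are hypothetical helpers, and field offsets are measured from the first matched vector (after skipping). With skip = 1 on an untagged ipv4 frame, the source address sits at offset 10 and the destination address at offset 14; the destination address straddles the 16-octet boundary, so matching on both addresses needs match = 2.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical field descriptor: an octet offset and length measured
 * from the first *matched* vector, i.e. after skip_n_vectors * 16
 * octets have been skipped. */
typedef struct
{
  uint32_t offset;
  uint32_t length;
} match_field_t;

/* Fill mask with exactly match_n_vectors * 16 octets in which only the
 * listed fields are set. Returns the mask length, or -1 if the buffer
 * is too small or a field does not fit in the matched vectors. */
static int
build_table_mask (uint8_t *mask, uint32_t mask_capacity,
                  const match_field_t *fields, uint32_t n_fields,
                  uint32_t match_n_vectors)
{
  uint32_t len = match_n_vectors * 16;

  if (len > mask_capacity)
    return -1;
  memset (mask, 0, len); /* start from "match nothing" */
  for (uint32_t i = 0; i < n_fields; i++)
    {
      /* written this way to avoid unsigned overflow in offset + length */
      if (fields[i].length > len || fields[i].offset > len - fields[i].length)
        return -1;
      memset (mask + fields[i].offset, 0xff, fields[i].length);
    }
  return (int) len;
}
```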

The classifier uses thread-safe, no-reader-locking-required
bounded-index extensible hashing. nbuckets is the [fixed] size of the
hash bucket vector. The algorithm works in constant time regardless of
hash collisions, but wastes space when the bucket array is too
small. A good rule of thumb: let nbuckets = approximate number of
entries expected.
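
To make the nbuckets rule of thumb concrete: a lookup hashes only the masked key octets and uses the hash to pick one of nbuckets buckets. The sketch assumes a power-of-two bucket count and uses FNV-1a as a stand-in hash; both choices and all helper names are assumptions for illustration, and the collision handling inside a bucket (the "extensible" part) is not modelled.

```c
#include <stdint.h>

/* Round a requested bucket count up to a power of two (capped at
 * 2^31), so a bucket index can be taken with a mask, not a modulo. */
static uint32_t
round_up_pow2 (uint32_t n)
{
  uint32_t p = 1;

  while (p < n && p < (1u << 31))
    p <<= 1;
  return p;
}

/* Toy FNV-1a hash over the masked key octets only. This is NOT the
 * hash function VPP uses; it only shows that masked-out bits cannot
 * influence bucket selection. */
static uint64_t
toy_masked_hash (const uint8_t *data, const uint8_t *mask, uint32_t len)
{
  uint64_t h = 14695981039346656037ull;

  for (uint32_t i = 0; i < len; i++)
    {
      h ^= (uint8_t) (data[i] & mask[i]);
      h *= 1099511628211ull;
    }
  return h;
}

/* Select a bucket; nbuckets must be a power of two. */
static uint32_t
bucket_index (uint64_t hash, uint32_t nbuckets)
{
  return (uint32_t) (hash & (nbuckets - 1));
}
```

With nbuckets close to the expected number of entries, the average bucket holds about one entry.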

At a significant cost in complexity, it would be possible to resize the
bucket array dynamically. We have no plans to implement that function.

Each classifier table has its own clib mheap memory allocation
arena. To pick the memory_size parameter, note that each classifier
table entry needs 16*(1 + match_n_vectors) bytes. Within reason, aim a
bit high. Clib mheap memory uses OS-level virtual memory (not wired
or hugetlb memory), so it's best not to scrimp on size.
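
As a worked example, one million sessions in a table with match_n_vectors = 2 need 1000000 * 16 * (1 + 2) = 48000000 octets before any headroom. The helper below is a hypothetical convenience, and the headroom factor is the caller's choice rather than a VPP constant. Since memory_size is a u32 in the prototype, the result must stay below 2^32.

```c
#include <stdint.h>

/* Rough memory_size estimate for a classifier table:
 * 16 * (1 + match_n_vectors) octets per entry, times the expected
 * number of entries, times a headroom factor ("aim a bit high").
 * The headroom factor is the caller's choice, not a VPP constant. */
static uint64_t
estimate_memory_size (uint64_t n_entries, uint32_t match_n_vectors,
                      uint32_t headroom)
{
  uint64_t per_entry = 16ull * (1 + (uint64_t) match_n_vectors);

  return n_entries * per_entry * headroom;
}
```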

The "next_table_index" parameter is as described: the pool index in
vnet_classify_main.tables of the next table to search. Use ~0 to
indicate the end of the table list. 0 is a valid table index!

We often create classification tables in reverse order,
last-table-searched to first-table-searched, so we can easily set
this parameter. Of course, one can manually adjust the data structure
after the fact.
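
The reverse-order idiom in miniature: creating the last-searched table first means every later call already knows the index to pass as next_table_index. The pool and function names are hypothetical; in VPP each step would presumably be a vnet_classify_add_del_table() call, with the table_index it reports fed into the following call.

```c
#include <stdint.h>

#define SKETCH_MAX_TABLES 8

/* Hypothetical stand-in for the table pool: only the chain link is
 * modelled. Pool indices are handed out in creation order from 0. */
typedef struct
{
  uint32_t next_table_index[SKETCH_MAX_TABLES];
  uint32_t n_tables;
} sketch_pool_t;

static uint32_t
sketch_create_table (sketch_pool_t *p, uint32_t next_table_index)
{
  uint32_t index = p->n_tables++;

  p->next_table_index[index] = next_table_index;
  return index;
}

/* Create n chained tables, last-searched first, so each creation call
 * already knows its successor's index. Returns the index of the first
 * table to search, or ~0u if the pool cannot hold n more tables. */
static uint32_t
sketch_create_table_list (sketch_pool_t *p, uint32_t n)
{
  uint32_t next = ~0u; /* ~0 ends the list */

  if (n > SKETCH_MAX_TABLES - p->n_tables)
    return ~0u;
  for (uint32_t i = 0; i < n; i++)
    next = sketch_create_table (p, next);
  return next;
}
```

Note that the last table searched lands at pool index 0, which is exactly why ~0, not 0, marks the end of the list.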

Specific classifier client nodes (for example,
.../vnet/vnet/classify/ip_classify.c) interpret the "miss_next_index"
parameter as a vpp graph-node next index. When packet classification
fails to produce a match, ip_classify_inline sends packets to the
indicated disposition. A classifier application might program this
parameter to send packets which don't match an existing session to a
"first-sign-of-life, create-new-session" node.

Finally, the is_add parameter indicates whether to add or delete the
indicated table. The delete case implicitly terminates all sessions
with extreme prejudice, by freeing the specified clib mheap.

==== Creating a classifier session ====

To create a new classifier session via the control-plane API, send a
"classify_add_del_session" message. The underlying action routine,
vnet_classify_add_del_session(...), is located in
.../vnet/vnet/classify/vnet_classify.c, and has the following
prototype:

 int vnet_classify_add_del_session (vnet_classify_main_t * cm,
                                    u32 table_index,
                                    u8 * match,
                                    u32 hit_next_index,
                                    u32 opaque_index,
                                    i32 advance,
                                    int is_add)

Pass cm = &vnet_classify_main if calling this routine directly.
table_index specifies the table which receives the new session when
is_add is nonzero, or which contains the session to delete otherwise.

Match is the key for the indicated session. It need not be aligned,
but it must be table->match_n_vectors*16 octets in length. As a
courtesy, vnet_classify_add_del_session applies the table's mask to
the stored key value. In this way, one can create a session by passing
unmasked (packet_data + offset) as the "match" parameter, and still end
up with clean, fully masked session keys.
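
Why this matters: a lookup compares (packet & mask) against the stored key, so a stored key with bits set outside the mask could never match. A sketch of the masking step; store_session_key() is a hypothetical name, not a VPP function.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the "courtesy" described above: the
 * stored key is the caller's match data ANDed with the table mask,
 * so raw packet data can be passed in unmodified. */
static void
store_session_key (uint8_t *stored_key, const uint8_t *match,
                   const uint8_t *table_mask, uint32_t match_n_vectors)
{
  for (uint32_t i = 0; i < match_n_vectors * 16; i++)
    stored_key[i] = match[i] & table_mask[i];
}
```

Passing the first packet of a flow as match therefore yields a key that later packets of the same flow will hit, whatever the unmasked octets happened to contain.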

Specific classifier client nodes (for example,
.../vnet/vnet/classify/ip_classify.c) interpret the per-session
hit_next_index parameter as a vpp graph-node next index. When packet
classification produces a match, ip_classify_inline sends packets to
the indicated disposition.

ip4/6-classify place the per-session opaque_index parameter into
vnet_buffer(b)->l2_classify.opaque_index. The field name is a slight
misnomer, but it lets classifier applications send session-hit packets
to specific graph nodes with useful values in buffer metadata.
Depending on the required semantics, we send known-session traffic to
a certain node, with e.g. a session pool index in buffer metadata.
It's totally up to the control plane and the specific use case.
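
The per-packet hit / miss decision described above reduces to a few lines. This sketch is not ip_classify_inline: sketch_session_t, sketch_buffer_meta_t and sketch_dispatch() are hypothetical, and the opaque value might carry, for example, a session pool index.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down session record and per-buffer metadata;
 * the real vnet_buffer (b)->l2_classify layout is different. */
typedef struct
{
  uint32_t hit_next_index;
  uint32_t opaque_index;
} sketch_session_t;

typedef struct
{
  uint32_t opaque_index; /* stands in for l2_classify.opaque_index */
} sketch_buffer_meta_t;

/* Pick the next graph node for one packet. On a hit, forward to the
 * session's hit_next_index and record its opaque_index for the next
 * node; on a miss (session == NULL), use the table's miss_next_index
 * and leave the metadata untouched. */
static uint32_t
sketch_dispatch (const sketch_session_t *session, uint32_t miss_next_index,
                 sketch_buffer_meta_t *meta)
{
  if (session == NULL)
    return miss_next_index;
  meta->opaque_index = session->opaque_index;
  return session->hit_next_index;
}
```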

Finally, nodes such as ip4/6-classify apply the advance parameter as a
[signed!] argument to vlib_buffer_advance(...), to "consume" a
networking layer. Example: if we classify incoming tunneled IP packets
by (inner) source/dest address and source/dest port, we might choose
to decapsulate and reencapsulate the inner packet. In such a case,
program the advance parameter to perform the tunnel decapsulation, and
program hit_next_index to send traffic to a node which uses
e.g. opaque_index to output traffic on a specific tunnel interface.
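
To see why the sign matters, here is a model of the two buffer fields the advance touches. It assumes vlib_buffer_advance (b, l) adds l to current_data and subtracts it from current_length; sketch_buffer_t is a hypothetical stand-in for vlib_buffer_t with VPP's sanity checks left out.

```c
#include <stdint.h>

/* Hypothetical subset of buffer state, modelling what
 * vlib_buffer_advance() does to current_data and current_length;
 * VPP's own sanity checks are omitted. */
typedef struct
{
  int16_t current_data;    /* offset of the current layer in the buffer */
  uint16_t current_length; /* octets from current_data to end of packet */
} sketch_buffer_t;

/* Positive advance consumes (strips) octets from the front of the
 * packet; negative advance re-exposes octets, e.g. to prepend a header. */
static void
sketch_buffer_advance (sketch_buffer_t *b, int32_t advance)
{
  b->current_data = (int16_t) (b->current_data + advance);
  b->current_length = (uint16_t) (b->current_length - advance);
}
```

For instance, an advance of 28 would strip a 20-octet outer IPv4 header plus an 8-octet UDP header from a 128-octet packet, leaving 100 octets; a later advance of -28 restores the original view.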