=== vnet classifier theory of operation ===

The vnet classifier trades off simplicity and perf / scale
characteristics. At a certain level, it's a dumb robot. Given an
incoming packet, search an ordered list of (mask, match) tables. If
the classifier finds a matching entry, take the indicated action. If
not, take a last-resort action.
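
The search loop described above can be sketched in plain C. This is a
toy model under stated assumptions, not the classifier's code: the
types and names (toy_entry_t, toy_table_t, toy_classify) are
hypothetical, each key is a single 16-octet vector, and entries are
scanned linearly where the real classifier hashes them.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KEY_OCTETS 16          /* one 16-octet vector per key, for brevity */
#define END_OF_LIST (~0u)      /* ~0 terminates the table list */

/* Hypothetical toy types; the real classifier hashes its sessions. */
typedef struct
{
  uint8_t key[KEY_OCTETS];     /* already-masked match value */
  uint32_t action;             /* what to do on a hit */
} toy_entry_t;

typedef struct
{
  uint8_t mask[KEY_OCTETS];
  const toy_entry_t *entries;
  size_t n_entries;
  uint32_t next_table_index;   /* next table to search, or END_OF_LIST */
} toy_table_t;

/* Walk the table list; return the first hit's action, else last_resort. */
uint32_t
toy_classify (const toy_table_t *tables, uint32_t table_index,
              const uint8_t *data, uint32_t last_resort)
{
  while (table_index != END_OF_LIST)
    {
      const toy_table_t *t = &tables[table_index];
      uint8_t masked[KEY_OCTETS];

      /* Apply this table's mask to the packet data. */
      for (size_t i = 0; i < KEY_OCTETS; i++)
        masked[i] = data[i] & t->mask[i];

      /* Linear scan stands in for the hash lookup. */
      for (size_t i = 0; i < t->n_entries; i++)
        if (memcmp (masked, t->entries[i].key, KEY_OCTETS) == 0)
          return t->entries[i].action;

      table_index = t->next_table_index;
    }
  return last_resort;
}
```

The cost of a miss grows with every table walked, which is why the
order of the list matters so much.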

We use 128-bit SIMD vector instructions to match or hash 16 octets at
a time. For hardware backward compatibility, the code does not
[currently] use 256-bit (32-octet) vector instructions.

Effective use of the classifier centers around building table lists
which "hit" as soon as practicable. In many cases, established
sessions hit in the first table. In this mode of operation, the
classifier easily processes multiple Mpps per core - even with
millions of sessions in the database. Searching 357 tables on a
regular basis will neatly solve the halting problem.

==== Basic operation ====

The classifier mask-and-match operation proceeds as follows. Given a
starting classifier table index, lay hands on the indicated mask
vector. When building tables, we arrange for the mask to obey
vector-unit (16-octet) alignment.

We know that the first octet of packet data starts on a cache-line
boundary. Further, it's reasonably likely that folks won't want to use
the generalized classifier on the L2 header; preferring to decode the
Ethertype manually. That scheme makes it easy to select among ip4 /
ip6 / MPLS, etc. classifier table sets.

A no-vlan-tag L2 header is 14 octets long. A typical ipv4 header
begins with the octets 0x4500: version=4, header_length=5, DSCP=0,
ECN=0. If one doesn't intend to classify on (DSCP, ECN) - the typical
case - we program the classifier to skip the first 16-octet vector.

To classify untagged ipv4 packets on source address, we program the
classifier to skip one vector, and mask-and-match one vector.
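
The arithmetic behind "skip one vector, match one vector" can be
checked with a short sketch. The helper toy_field_mask is hypothetical
and not part of the classifier; the 12-octet source-address offset is
the standard ipv4 header layout rather than something stated above.

```c
#include <stdint.h>
#include <string.h>

#define VECTOR_OCTETS 16

/* 14-octet untagged L2 header (from the text); the ipv4 source address
   sits 12 octets into the ipv4 header (standard ipv4 layout). */
#define L2_HEADER_OCTETS 14
#define IP4_SRC_OFFSET 12
#define IP4_ADDR_OCTETS 4

/* Hypothetical helper: describe a single-field match.
   field_offset is relative to the first octet of packet data and
   field_octets must be non-zero.  Returns the number of vectors to
   match (0 if the mask buffer is too small) and stores the number of
   vectors to skip in *skip.  On success the first match * 16 octets of
   mask are zeroed, then only the field's octets are set to 0xff. */
uint32_t
toy_field_mask (uint32_t field_offset, uint32_t field_octets,
                uint32_t *skip, uint8_t *mask, uint32_t mask_octets)
{
  uint32_t first = field_offset / VECTOR_OCTETS;
  uint32_t last = (field_offset + field_octets - 1) / VECTOR_OCTETS;
  uint32_t match = last - first + 1;

  if (match * VECTOR_OCTETS > mask_octets)
    return 0;

  memset (mask, 0, match * VECTOR_OCTETS);
  /* The mask is relative to the first matched vector, not the packet. */
  memset (mask + (field_offset - first * VECTOR_OCTETS), 0xff, field_octets);
  *skip = first;
  return match;
}
```

For the source address, the field starts at packet octet 14 + 12 = 26,
which lands in the second vector (octets 16..31). The helper therefore
reports skip = 1 and match = 1, with mask octets 10 through 13 set.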

The basic mask-and-match operation looks like this:

  switch (t->match_n_vectors)
    {
    case 1:
      result = (data[0 + t->skip_n_vectors] & mask[0]) ^ key[0];
      break;

    case 2:
      result = (data[0 + t->skip_n_vectors] & mask[0]) ^ key[0];
      result |= (data[1 + t->skip_n_vectors] & mask[1]) ^ key[1];
      break;

    <etc>
    }

  result_mask = u32x4_zero_byte_mask (result);
  if (result_mask == 0xffff)
    return (v);

Net of setup, it costs a couple of clock cycles to mask-and-match 16
octets.
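
For readers without a vector unit to hand, the same logic can be
written in portable C. This is a sketch only: toy_u128_t and
toy_mask_and_match are hypothetical names, a pair of 64-bit words
stands in for one 16-octet vector, and the key is assumed to be stored
already masked.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit vector register. */
typedef struct
{
  uint64_t lo, hi;
} toy_u128_t;

/* Return 1 when every matched vector of data equals the key under the
   mask, 0 otherwise. */
int
toy_mask_and_match (const toy_u128_t *data, const toy_u128_t *mask,
                    const toy_u128_t *key, uint32_t skip_n_vectors,
                    uint32_t match_n_vectors)
{
  uint64_t result = 0;

  for (uint32_t i = 0; i < match_n_vectors; i++)
    {
      const toy_u128_t *d = &data[i + skip_n_vectors];

      /* Any bit that differs under the mask survives the XOR. */
      result |= (d->lo & mask[i].lo) ^ key[i].lo;
      result |= (d->hi & mask[i].hi) ^ key[i].hi;
    }
  return result == 0;
}
```

The loop accumulates differences with OR, so a single test at the end
decides the match, mirroring the single zero-byte-mask test in the
vector version.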

At the risk of belaboring an obvious point, the control-plane
'''must''' pay attention to detail. When skipping one (or more)
vectors, masks and matches must reflect that decision. See
.../vnet/vnet/classify/vnet_classify.c:unformat_classify_[mask|match]. Note
that vec_validate (xxx, 13) creates a 14-element vector.

==== Creating a classifier table ====

To create a new classifier table via the control-plane API, send a
"classify_add_del_table" message. The underlying action routine,
vnet_classify_add_del_table(...), is located in
.../vnet/vnet/classify/vnet_classify.c, and has the following
prototype:

  int vnet_classify_add_del_table (vnet_classify_main_t * cm,
                                   u8 * mask,
                                   u32 nbuckets,
                                   u32 memory_size,
                                   u32 skip,
                                   u32 match,
                                   u32 next_table_index,
                                   u32 miss_next_index,
                                   u32 * table_index,
                                   int is_add)

Pass cm = &vnet_classify_main if calling this routine directly. Mask,
skip(_n_vectors) and match(_n_vectors) are as described above. Mask
need not be aligned, but it must be match*16 octets in length. To
avoid having your head explode, be absolutely certain that '''only'''
the bits you intend to match on are set.

The classifier uses thread-safe, no-reader-locking-required
bounded-index extensible hashing. Nbuckets is the [fixed] size of the
hash bucket vector. The algorithm works in constant time regardless of
hash collisions, but wastes space when the bucket array is too
small. A good rule of thumb: let nbuckets = approximate number of
entries expected.

At a significant cost in complexity, it would be possible to resize the
bucket array dynamically. We have no plans to implement that function.

Each classifier table has its own clib mheap memory allocation
arena. To pick the memory_size parameter, note that each classifier
table entry needs 16*(1 + match_n_vectors) bytes. Within reason, aim a
bit high. Clib mheap memory uses o/s level virtual memory - not wired
or hugetlb memory - so it's best not to scrimp on size.
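
The sizing rule above is easy to turn into arithmetic. The helpers
below are hypothetical, and headroom_percent is an assumed knob for
"aim a bit high"; the text gives no specific figure. The estimate
counts entries only.

```c
#include <stdint.h>

/* Per-entry footprint quoted in the text: 16 * (1 + match_n_vectors). */
uint64_t
toy_entry_bytes (uint32_t match_n_vectors)
{
  return 16ULL * (1 + match_n_vectors);
}

/* Hypothetical sizing helper: bytes for expected_entries entries plus
   headroom_percent extra.  headroom_percent is an assumption, not a
   documented recommendation. */
uint64_t
toy_memory_size (uint64_t expected_entries, uint32_t match_n_vectors,
                 uint32_t headroom_percent)
{
  uint64_t raw = expected_entries * toy_entry_bytes (match_n_vectors);

  return raw + raw * headroom_percent / 100;
}
```

As a worked example, one million sessions with match_n_vectors = 1
need 32 bytes each, or 32,000,000 bytes; with 100 percent headroom the
estimate is 64,000,000 bytes. Since memory_size is a u32 in the
prototype, an estimate must stay below 4 GiB to be passed as-is.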

The "next_table_index" parameter is as described: the pool index in
vnet_classify_main.tables of the next table to search. Code ~0 to
indicate the end of the table list. 0 is a valid table index!

We often create classification tables in reverse order -
last-table-searched to first-table-searched - so we can easily set
this parameter. Of course, one can manually adjust the data structure
after-the-fact.
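
A miniature pool shows why reverse-order creation is convenient: each
new table's successor already has an index. All names here
(chain_table_t, chain_pool_t, chain_add_table, chain_build) are
hypothetical, and the pool models nothing but the chaining field.

```c
#include <stdint.h>

#define END_OF_LIST (~0u)
#define MAX_TABLES 8

/* Hypothetical miniature of the table pool: only the chaining field. */
typedef struct
{
  uint32_t next_table_index;
} chain_table_t;

typedef struct
{
  chain_table_t tables[MAX_TABLES];
  uint32_t n_tables;
} chain_pool_t;

/* Add a table that chains to next_table_index; return its pool index,
   or END_OF_LIST when the pool is full. */
uint32_t
chain_add_table (chain_pool_t *p, uint32_t next_table_index)
{
  if (p->n_tables >= MAX_TABLES)
    return END_OF_LIST;

  uint32_t index = p->n_tables++;
  p->tables[index].next_table_index = next_table_index;
  return index;
}

/* Build a chain of n tables (n <= MAX_TABLES), last-searched first.
   Returns the index of the first table to search. */
uint32_t
chain_build (chain_pool_t *p, uint32_t n)
{
  uint32_t next = END_OF_LIST;

  for (uint32_t i = 0; i < n; i++)
    next = chain_add_table (p, next);
  return next;
}
```

Building three tables in an empty pool yields the search order
2 -> 1 -> 0, with table 0 carrying ~0 as its next index. Note that
index 0 is the last table searched here, not an end marker.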

Specific classifier client nodes - for example,
.../vnet/vnet/classify/ip_classify.c - interpret the "miss_next_index"
parameter as a vpp graph-node next index. When packet classification
fails to produce a match, ip_classify_inline sends packets to the
indicated disposition. A classifier application might program this
parameter to send packets which don't match an existing session to a
"first-sign-of-life, create-new-session" node.

Finally, the is_add parameter indicates whether to add or delete the
indicated table. The delete case implicitly terminates all sessions
with extreme prejudice, by freeing the specified clib mheap.

==== Creating a classifier session ====

To create a new classifier session via the control-plane API, send a
"classify_add_del_session" message. The underlying action routine,
vnet_classify_add_del_session(...), is located in
.../vnet/vnet/classify/vnet_classify.c, and has the following
prototype:

  int vnet_classify_add_del_session (vnet_classify_main_t * cm,
                                     u32 table_index,
                                     u8 * match,
                                     u32 hit_next_index,
                                     u32 opaque_index,
                                     i32 advance,
                                     int is_add)

Pass cm = &vnet_classify_main if calling this routine directly. Table
index specifies the table which receives the new session / contains
the session to delete depending on is_add.

Match is the key for the indicated session. It need not be aligned,
but it must be table->match_n_vectors*16 octets in length. As a
courtesy, vnet_classify_add_del_session applies the table's mask to
the stored key-value. In this way, one can create a session by passing
unmasked (packet_data + offset) as the "match" parameter, and end up
with unconfusing session keys.
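
The masking courtesy amounts to a byte-wise AND. The sketch below
illustrates the idea with a hypothetical helper, toy_store_masked_key;
it is not the routine's actual implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Store match & mask, so that unmasked packet data yields a clean key.
   All three buffers hold n_octets octets. */
void
toy_store_masked_key (uint8_t *stored, const uint8_t *match,
                      const uint8_t *mask, size_t n_octets)
{
  for (size_t i = 0; i < n_octets; i++)
    stored[i] = match[i] & mask[i];
}
```

Octets outside the mask come out as zero in the stored key, whatever
the packet carried there, so two packets from the same session produce
the same key.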

Specific classifier client nodes - for example,
.../vnet/vnet/classify/ip_classify.c - interpret the per-session
hit_next_index parameter as a vpp graph-node next index. When packet
classification produces a match, ip_classify_inline sends packets to
the indicated disposition.

ip4/6_classify place the per-session opaque_index parameter into
vnet_buffer(b)->l2_classify.opaque_index. The field name is a slight
misnomer, but either way classifier applications can send session-hit
packets to specific graph nodes, with useful values in buffer
metadata. Depending on the required semantics, we send known-session
traffic to a certain node, with e.g. a session pool index in buffer
metadata. It's totally up to the control-plane and the specific
use-case.

Finally, nodes such as ip4/6-classify apply the advance parameter as a
[signed!] argument to vlib_buffer_advance(...), to "consume" a
networking layer. Example: if we classify incoming tunneled IP packets
by (inner) source/dest address and source/dest port, we might choose
to decapsulate and reencapsulate the inner packet. In such a case,
program the advance parameter to perform the tunnel decapsulation, and
program hit_next_index to send traffic to a node which uses
e.g. opaque_index to output traffic on a specific tunnel interface.
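
The effect of a signed advance can be pictured with a miniature
buffer. toy_buffer_t and toy_buffer_advance are hypothetical stand-ins
that only model an offset and a remaining length; they are a sketch of
the idea, not the vlib buffer API.

```c
#include <stdint.h>

/* Hypothetical miniature of a packet buffer: an offset to the current
   header and the number of octets remaining from that point. */
typedef struct
{
  int32_t current_data;
  int32_t current_length;
} toy_buffer_t;

/* Move the current-header offset by a signed number of octets.
   A positive value consumes octets; a negative value exposes them
   again. */
void
toy_buffer_advance (toy_buffer_t *b, int32_t advance)
{
  b->current_data += advance;
  b->current_length -= advance;
}
```

With a 100-octet packet, advancing by 28 (an arbitrary illustrative
encapsulation size) leaves the offset at 28 with 72 octets remaining;
advancing by -28 afterwards restores the original view.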