 === vnet classifier theory of operation ===

 The vnet classifier trades off simplicity and perf / scale
 characteristics. At a certain level, it's a dumb robot. Given an
 incoming packet, search an ordered list of (mask, match) tables. If
 the classifier finds a matching entry, take the indicated action. If
 not, take a last-resort action.

 We use the 128-bit SIMD vector unit to match or hash 16 octets at a
 time. For hardware backward compatibility, the code does not
 [currently] use 256-bit (32-octet) vector instructions.

 Effective use of the classifier centers around building table lists
 which "hit" as soon as practicable. In many cases, established
 sessions hit in the first table. In this mode of operation, the
 classifier easily processes multiple Mpps per core - even with millions
 of sessions in the database. Searching 357 tables on a regular basis
 will neatly solve the halting problem.

 ==== Basic operation ====

 The classifier mask-and-match operation proceeds as follows. Given a
 starting classifier table index, lay hands on the indicated mask
 vector.  When building tables, we arrange for the mask to obey
 vector-unit (16-octet) alignment.

 We know that the first octet of packet data starts on a cache-line
 boundary. Further, it's reasonably likely that folks won't want to use
 the generalized classifier on the L2 header; preferring to decode the
 Ethertype manually. That scheme makes it easy to select among ip4 /
 ip6 / MPLS, etc. classifier table sets.

 A no-vlan-tag L2 header is 14 octets long. A typical ipv4 header
 begins with the octets 0x4500: version=4, header_length=5, DSCP=0,
 ECN=0. If one doesn't intend to classify on (DSCP, ECN) - the typical
 case - we program the classifier to skip the first 16-octet vector.

 To classify untagged ipv4 packets on source address, we program the
 classifier to skip one vector, and mask-and-match one vector.
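
 To make the arithmetic concrete: the ipv4 source address sits 12
 octets into the ip4 header, i.e. 14 + 12 = 26 octets into an untagged
 packet. After skipping one 16-octet vector, it lands at octets 10..13
 of the matched vector. The sketch below is a standalone illustration;
 build_ip4_src_mask and the enum names are hypothetical, not vnet
 functions or constants.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum
{
  VECTOR_OCTETS = 16,
  ETHERNET_HEADER_OCTETS = 14,	/* untagged L2 header */
  IP4_SRC_OFFSET = 12		/* source address offset in the ip4 header */
};

/* Hypothetical helper: fill a one-vector mask which selects the ipv4
   source address of an untagged packet, given the number of skipped
   vectors. Returns the offset of the address within the mask. */
static unsigned
build_ip4_src_mask (uint8_t mask[VECTOR_OCTETS], unsigned skip_n_vectors)
{
  /* Offset of the source address from the first octet of packet data */
  unsigned pkt_offset = ETHERNET_HEADER_OCTETS + IP4_SRC_OFFSET;
  unsigned skipped = skip_n_vectors * VECTOR_OCTETS;
  unsigned mask_offset;

  /* The field must lie entirely inside the single matched vector */
  assert (pkt_offset >= skipped);
  mask_offset = pkt_offset - skipped;
  assert (mask_offset + 4 <= VECTOR_OCTETS);

  /* Only the four source-address octets are set */
  memset (mask, 0, VECTOR_OCTETS);
  memset (mask + mask_offset, 0xff, 4);
  return mask_offset;
}
```

 With skip_n_vectors = 1, the helper sets mask octets 10..13 and
 leaves the rest zero.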

 The basic mask-and-match operation looks like this:

  switch (t->match_n_vectors)
    {
    case 1:
      result = (data[0 + t->skip_n_vectors] & mask[0]) ^ key[0];
      break;

    case 2:
      result =  (data[0 + t->skip_n_vectors] & mask[0]) ^ key[0];
      result |= (data[1 + t->skip_n_vectors] & mask[1]) ^ key[1];
      break;

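    /* <etc>: one more line per additional vector */
    }

  /* One mask bit per octet of result; all 16 bits set means every
     octet is zero, i.e. the masked data equals the key */
  result_mask = u32x4_zero_byte_mask (result);
  if (result_mask == 0xffff)
    return (v);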

 Net of setup, it costs a couple of clock cycles to mask-and-match 16
 octets.
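
 For readers without the vector headers to hand, here is a portable
 scalar stand-in for the same comparison. It is only a sketch: vec128_t,
 load128 and mask_and_match are hypothetical names, and the real code
 operates on vector registers rather than pairs of 64-bit words.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Portable stand-in for one 128-bit vector register */
typedef struct
{
  uint64_t lo, hi;
} vec128_t;

static vec128_t
load128 (const uint8_t * p)
{
  vec128_t v;
  memcpy (&v, p, sizeof (v));	/* no alignment assumption */
  return v;
}

/* Return 1 if (data & mask) == key over match_n_vectors vectors,
   after skipping skip_n_vectors vectors of packet data; else 0. */
static int
mask_and_match (const uint8_t * data, const uint8_t * mask,
		const uint8_t * key, unsigned skip_n_vectors,
		unsigned match_n_vectors)
{
  uint64_t result = 0;
  unsigned i;

  assert (match_n_vectors >= 1);
  for (i = 0; i < match_n_vectors; i++)
    {
      vec128_t d = load128 (data + 16 * (i + skip_n_vectors));
      vec128_t m = load128 (mask + 16 * i);
      vec128_t k = load128 (key + 16 * i);

      /* Any bit which differs from the key leaves result nonzero */
      result |= (d.lo & m.lo) ^ k.lo;
      result |= (d.hi & m.hi) ^ k.hi;
    }
  return result == 0;
}
```

 Note that the key must already be masked: a key bit set outside the
 mask can never match.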

 At the risk of belaboring an obvious point, the control-plane
 '''must''' pay attention to detail. When skipping one (or more)
 vectors, masks and matches must reflect that decision. See
 .../vnet/vnet/classify/vnet_classify.c:unformat_classify_[mask|match]. Note
 that vec_validate (xxx, 13) creates a 14-element vector.

 ==== Creating a classifier table ====

 To create a new classifier table via the control-plane API, send a
 "classify_add_del_table" message. The underlying action routine,
 vnet_classify_add_del_table(...), is located in
 .../vnet/vnet/classify/vnet_classify.c, and has the following
 prototype:

  int vnet_classify_add_del_table (vnet_classify_main_t * cm,
                                   u8 * mask,
                                   u32 nbuckets,
                                   u32 memory_size,
                                   u32 skip,
                                   u32 match,
                                   u32 next_table_index,
                                   u32 miss_next_index,
                                   u32 * table_index,
                                   int is_add)

 Pass cm = &vnet_classify_main if calling this routine directly. Mask,
 skip(_n_vectors) and match(_n_vectors) are as described above. Mask
 need not be aligned, but it must be match*16 octets in length. To
 avoid having your head explode, be absolutely certain that '''only'''
 the bits you intend to match on are set.
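
 One way to keep skip, match and the mask consistent is to start from
 a mask laid out against the whole packet and derive the two counts
 from it. The sketch below does that under stated assumptions:
 mask_to_skip_match is a hypothetical helper (not a vnet routine), and
 the input mask length is assumed to be a multiple of 16 octets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static int
vector_is_zero (const uint8_t * v)
{
  unsigned i;
  for (i = 0; i < 16; i++)
    if (v[i])
      return 0;
  return 1;
}

/* Hypothetical helper: given a mask which starts at packet octet 0,
   count leading all-zero vectors (skip) and the span of vectors up to
   and including the last nonzero one (match).
   Returns 0 on success, -1 if the mask has no bits set. */
static int
mask_to_skip_match (const uint8_t * mask, size_t len,
		    unsigned *skip, unsigned *match)
{
  size_t n_vectors = len / 16, first, last;

  assert (len % 16 == 0);

  for (first = 0; first < n_vectors; first++)
    if (!vector_is_zero (mask + 16 * first))
      break;
  if (first == n_vectors)
    return -1;

  /* Terminates: the vector at index first is known to be nonzero */
  for (last = n_vectors - 1; vector_is_zero (mask + 16 * last); last--)
    ;

  *skip = (unsigned) first;
  *match = (unsigned) (last - first + 1);
  return 0;
}
```

 The mask handed to the table is then the match*16 octets starting at
 offset skip*16 of the full-packet mask.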

 The classifier uses thread-safe, no-reader-locking-required
 bounded-index extensible hashing. Nbuckets is the [fixed] size of the
 hash bucket vector. The algorithm works in constant time regardless of
 hash collisions, but wastes space when the bucket array is too
 small. A good rule of thumb: let nbuckets = approximate number of
 entries expected.

 At a significant cost in complexity, it would be possible to resize the
 bucket array dynamically. We have no plans to implement that function.

 Each classifier table has its own clib mheap memory allocation
 arena. To pick the memory_size parameter, note that each classifier
 table entry needs 16*(1 + match_n_vectors) bytes. Within reason, aim a
 bit high. Clib mheap memory uses o/s level virtual memory - not wired
 or hugetlb memory - so it's best not to scrimp on size.
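
 The sizing arithmetic above can be sketched as follows. The helper
 names and the headroom_factor parameter are assumptions made for
 illustration; the only figure taken from the text is the
 16*(1 + match_n_vectors) bytes per entry.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed by one classifier table entry, per the formula above */
static uint64_t
classify_entry_bytes (uint32_t match_n_vectors)
{
  return 16ULL * (1 + match_n_vectors);
}

/* Hypothetical estimate: expected sessions times entry size, scaled
   by a caller-chosen headroom factor to "aim a bit high". The result
   must still fit the u32 memory_size parameter. */
static uint64_t
estimate_memory_size (uint64_t n_sessions, uint32_t match_n_vectors,
		      uint32_t headroom_factor)
{
  assert (headroom_factor >= 1);
  return n_sessions * classify_entry_bytes (match_n_vectors)
    * headroom_factor;
}
```

 For example, one million sessions with match_n_vectors = 1 need 32
 bytes each; a headroom factor of 2 gives 64,000,000 bytes.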

 The "next_table_index" parameter is as described: the pool index in
 vnet_classify_main.tables of the next table to search. Code ~0 to
 indicate the end of the table list. 0 is a valid table index!

 We often create classification tables in reverse order -
 last-table-searched to first-table-searched - so we can easily set
 this parameter. Of course, one can manually adjust the data structure
 after-the-fact.
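
 The reverse-order trick is easy to see in a toy model. Everything in
 the sketch below (toy_table_t, toy_classify_main_t, toy_add_table,
 toy_chain_length) is hypothetical and merely mimics a pool of tables
 linked by next_table_index; it is not the vnet data structure.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_TABLES 8
#define TOY_END_OF_LIST UINT32_MAX	/* i.e. ~0 */

typedef struct
{
  uint32_t next_table_index;
} toy_table_t;

typedef struct
{
  toy_table_t tables[TOY_MAX_TABLES];
  uint32_t n_tables;
} toy_classify_main_t;

/* Append a table and return its index; note that 0 is a valid index */
static uint32_t
toy_add_table (toy_classify_main_t * cm, uint32_t next_table_index)
{
  uint32_t index = cm->n_tables;

  assert (index < TOY_MAX_TABLES);
  cm->tables[index].next_table_index = next_table_index;
  cm->n_tables++;
  return index;
}

/* Count the tables searched when starting from table index first */
static unsigned
toy_chain_length (const toy_classify_main_t * cm, uint32_t first)
{
  unsigned n = 0;
  uint32_t ti;

  for (ti = first; ti != TOY_END_OF_LIST;
       ti = cm->tables[ti].next_table_index)
    n++;
  return n;
}
```

 Creating the last-searched table first means each later call already
 knows the index to pass as next_table_index.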

 Specific classifier client nodes - for example,
 .../vnet/vnet/classify/ip_classify.c - interpret the "miss_next_index"
 parameter as a vpp graph-node next index. When packet classification
 fails to produce a match, ip_classify_inline sends packets to the
 indicated disposition. A classifier application might program this
 parameter to send packets which don't match an existing session to a
 "first-sign-of-life, create-new-session" node.

 Finally, the is_add parameter indicates whether to add or delete the
 indicated table. The delete case implicitly terminates all sessions
 with extreme prejudice, by freeing the specified clib mheap.

 ==== Creating a classifier session ====

 To create a new classifier session via the control-plane API, send a
 "classify_add_del_session" message. The underlying action routine,
 vnet_classify_add_del_session(...), is located in
 .../vnet/vnet/classify/vnet_classify.c, and has the following
 prototype:

 int vnet_classify_add_del_session (vnet_classify_main_t * cm,
                                    u32 table_index,
                                    u8 * match,
                                    u32 hit_next_index,
                                    u32 opaque_index,
                                    i32 advance,
                                    int is_add)

 Pass cm = &vnet_classify_main if calling this routine directly. Table
 index specifies the table which receives the new session / contains
 the session to delete depending on is_add.

 Match is the key for the indicated session. It need not be aligned,
 but it must be table->match_n_vectors*16 octets in length. As a
 courtesy, vnet_classify_add_del_session applies the table's mask to
 the stored key-value. In this way, one can create a session by passing
 unmasked (packet_data + offset) as the "match" parameter, and end up
 with unconfusing session keys.
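
 The courtesy masking amounts to an octet-wise AND of the supplied key
 with the table's mask. A minimal sketch, assuming a hypothetical
 helper name (apply_mask_to_key) and plain byte arrays rather than the
 real session entry layout:

```c
#include <assert.h>
#include <stdint.h>

/* Store (unmasked & mask) into key, over match_n_vectors 16-octet
   vectors. Octets outside the mask become zero in the stored key. */
static void
apply_mask_to_key (uint8_t * key, const uint8_t * unmasked,
		   const uint8_t * mask, unsigned match_n_vectors)
{
  unsigned i;

  assert (match_n_vectors >= 1);
  for (i = 0; i < match_n_vectors * 16; i++)
    key[i] = unmasked[i] & mask[i];
}
```

 Passing raw packet octets as "unmasked" therefore yields a key in
 which only the masked fields survive.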

 Specific classifier client nodes - for example,
 .../vnet/vnet/classify/ip_classify.c - interpret the per-session
 hit_next_index parameter as a vpp graph-node next index. When packet
 classification produces a match, ip_classify_inline sends packets to
 the indicated disposition.

 ip4/6_classify place the per-session opaque_index parameter into
 vnet_buffer(b)->l2_classify.opaque_index; a slight misnomer, but
 anyhow classifier applications can send session-hit packets to
 specific graph nodes, with useful values in buffer metadata. Depending
 on the required semantics, we send known-session traffic to a certain
 node, with e.g. a session pool index in buffer metadata. It's totally
 up to the control-plane and the specific use-case.

 Finally, nodes such as ip4/6-classify apply the advance parameter as a
 [signed!] argument to vlib_buffer_advance(...), to "consume" a
 networking layer. Example: if we classify incoming tunneled IP packets
 by (inner) source/dest address and source/dest port, we might choose
 to decapsulate and reencapsulate the inner packet. In such a case,
 program the advance parameter to perform the tunnel decapsulation, and
 program hit_next_index to send traffic to a node which uses
 e.g. opaque_index to output traffic on a specific tunnel interface.
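
 To illustrate why the sign matters, here is a toy model of a buffer
 advance. toy_buffer_t and toy_buffer_advance are hypothetical
 stand-ins, written on the assumption that advancing moves the
 current-data offset forward and shrinks the remaining length by the
 same amount; the 28-octet figure assumes a 20-octet ip4 header plus
 an 8-octet UDP header as the outer encapsulation.

```c
#include <assert.h>
#include <stdint.h>

/* Toy buffer: offset of the current header, and octets remaining */
typedef struct
{
  int16_t current_data;
  uint16_t current_length;
} toy_buffer_t;

/* A positive advance consumes headers; a negative advance re-exposes
   octets in front of the current position. */
static void
toy_buffer_advance (toy_buffer_t * b, int32_t advance)
{
  /* Cannot consume more than what remains */
  assert ((int32_t) b->current_length >= advance);
  b->current_data = (int16_t) (b->current_data + advance);
  b->current_length = (uint16_t) (b->current_length - advance);
}
```

 Advancing by 28 leaves the buffer pointing at the inner packet;
 advancing by -28 afterwards restores the original view.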