.. _reassembly:
IP Reassembly
=============
Some VPP functions need access to the whole packet, and some need stream
classification based on L4 headers. Reassembly functionality supports
both.
Full reassembly vs shallow (virtual) reassembly
-----------------------------------------------
There are two kinds of reassembly available in VPP:

1. Full reassembly changes a stream of packet fragments into one
   packet containing all data reassembled, with fragment bits cleared
   and the fragment header stripped (in the case of ip6). Note that the
   resulting packet may come out of reassembly as a buffer chain.
   Because it's impractical to parse headers which are split over
   multiple vnet buffers, vnet_buffer_chain_linearize() is called after
   reassembly so that L2/L3/L4 headers can be found in the first buffer.
   Full reassembly is costly and shouldn't be used unless necessary.
   Full reassembly is enabled by default for both ipv4 and ipv6
   "for us" traffic, that is, packets aimed at VPP addresses. This can
   be disabled via API if desired, in which case "for us" fragments are
   dropped.

2. Shallow (virtual) reassembly (SVR) allows various classifying and/or
   translating features to work with fragments without having to
   understand fragmentation. It works by extracting L4 data and adding
   it to vnet_buffer for each packet/fragment passing through SVR
   nodes. This operation is performed for both fragments and regular
   packets, allowing consuming code to treat all packets in the same
   way. SVR caches incoming packet fragments (buffers) until the first
   fragment is seen. Then it extracts L4 data from that first fragment,
   fills it in for any cached fragments and transmits them in the same
   order as they were received. From that point on, any other passing
   fragments get L4 data populated in vnet_buffer based on the
   reassembly context.
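
The caching behaviour of shallow reassembly described above can be
sketched in standalone C. This is an illustration only, not VPP code:
``sketch_frag_t``, ``sketch_svr_ctx_t``, ``sketch_svr_input`` and the
fixed cache size are hypothetical names and simplifications, and the
only L4 data modelled here is a pair of ports.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_CACHED 8 /* hypothetical cache size, not a VPP constant */

/* Hypothetical stand-in for a buffer plus the metadata SVR fills in. */
typedef struct {
  uint16_t frag_offset;   /* 0 identifies the first fragment */
  uint16_t l4_src_port;   /* only meaningful in the first fragment */
  uint16_t l4_dst_port;
  uint16_t meta_src_port; /* metadata populated by the sketch */
  uint16_t meta_dst_port;
  int meta_valid;
} sketch_frag_t;

typedef struct {
  int have_l4;            /* set once the first fragment was seen */
  uint16_t src_port, dst_port;
  sketch_frag_t *cached[SKETCH_MAX_CACHED];
  size_t n_cached;
} sketch_svr_ctx_t;

static void sketch_fill(const sketch_svr_ctx_t *c, sketch_frag_t *f)
{
  f->meta_src_port = c->src_port;
  f->meta_dst_port = c->dst_port;
  f->meta_valid = 1;
}

/* Process one fragment. Fragments that may be forwarded are written to
 * out[] in arrival order; the return value is how many were written.
 * out[] must have room for SKETCH_MAX_CACHED + 1 entries. */
static size_t sketch_svr_input(sketch_svr_ctx_t *c, sketch_frag_t *f,
                               sketch_frag_t **out)
{
  size_t n_out = 0;

  if (c->have_l4) {          /* L4 data already known: fill and pass on */
    sketch_fill(c, f);
    out[n_out++] = f;
    return n_out;
  }
  if (f->frag_offset != 0) { /* first fragment not seen yet: cache */
    assert(c->n_cached < SKETCH_MAX_CACHED); /* real code enforces limits */
    c->cached[c->n_cached++] = f;
    return 0;
  }
  /* First fragment: learn L4 data, then release cached fragments in order. */
  c->src_port = f->l4_src_port;
  c->dst_port = f->l4_dst_port;
  c->have_l4 = 1;
  for (size_t i = 0; i < c->n_cached; i++) {
    sketch_fill(c, c->cached[i]);
    out[n_out++] = c->cached[i];
  }
  c->n_cached = 0;
  sketch_fill(c, f);
  out[n_out++] = f;
  return n_out;
}
```

In this sketch a non-first fragment that arrives early is held back, and
it is released ahead of the first fragment once that one arrives, which
preserves the order of reception.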
Multi-worker behaviour
^^^^^^^^^^^^^^^^^^^^^^
Both reassembly types deal with fragments arriving on different workers
via a handoff mechanism. All reassembly contexts are stored in pools.
A bihash mapping a 5-tuple key to a value containing the pool index and
thread index is used for lookups. When a lookup finds an existing
reassembly on a different thread, it hands off the fragment to that
thread. If the lookup fails, a new reassembly context is created and the
current worker becomes the owner of that context. Further fragments
received on other worker threads are then handed off to the owner worker
thread.
Full reassembly also remembers the thread index where the first fragment
(that is, the fragment with fragment offset 0) was seen and uses the
handoff mechanism to send the reassembled packet out on that thread even
if the pool owner is a different thread. This then requires an additional
handoff to free the reassembly context, as only the pool owner can do
that in a thread-safe way.
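
The ownership lookup described in this section can be sketched as
follows. This is a minimal standalone illustration under stated
assumptions: a small linear table stands in for the bihash, the key is
an opaque two-word value, pool index allocation is reduced to a counter,
and all ``sketch_*`` names are hypothetical rather than VPP symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_TABLE_SIZE 16 /* hypothetical, not a VPP constant */

typedef struct { uint64_t k[2]; } sketch_key_t; /* opaque stand-in for the key */

typedef struct {
  int in_use;
  sketch_key_t key;
  uint32_t pool_index;   /* index of the context in the owner's pool */
  uint32_t thread_index; /* worker that owns the context */
} sketch_entry_t;

typedef struct {
  sketch_entry_t entries[SKETCH_TABLE_SIZE]; /* stands in for the bihash */
  uint32_t n_created;                        /* simplified pool allocator */
} sketch_table_t;

typedef enum { SKETCH_PROCESS_LOCALLY, SKETCH_HANDOFF } sketch_action_t;

/* Look up the context for key. If none exists, create one owned by
 * my_thread. Reports the owner and whether the fragment must be handed
 * off to another worker. */
static sketch_action_t sketch_find_or_create(sketch_table_t *t,
                                             const sketch_key_t *key,
                                             uint32_t my_thread,
                                             uint32_t *owner_thread)
{
  sketch_entry_t *free_slot = NULL;

  for (size_t i = 0; i < SKETCH_TABLE_SIZE; i++) {
    sketch_entry_t *e = &t->entries[i];
    if (e->in_use && memcmp(&e->key, key, sizeof(*key)) == 0) {
      *owner_thread = e->thread_index;
      return e->thread_index == my_thread ? SKETCH_PROCESS_LOCALLY
                                          : SKETCH_HANDOFF;
    }
    if (!e->in_use && free_slot == NULL)
      free_slot = e;
  }

  /* Lookup failed: the current worker becomes the owner. */
  assert(free_slot != NULL); /* real code enforces a reassembly limit */
  free_slot->in_use = 1;
  free_slot->key = *key;
  free_slot->pool_index = t->n_created++;
  free_slot->thread_index = my_thread;
  *owner_thread = my_thread;
  return SKETCH_PROCESS_LOCALLY;
}
```

A caller would zero-initialise the table, call
``sketch_find_or_create`` for each fragment, and enqueue the fragment to
``owner_thread`` whenever ``SKETCH_HANDOFF`` is returned.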
Limits
^^^^^^
Because reassembly could be an attack vector, there is a configurable
limit on the number of concurrent reassemblies and on the maximum number
of fragments per packet.
Custom applications
^^^^^^^^^^^^^^^^^^^
Both reassembly features can be used by custom applications which
are not part of the VPP source tree. Be they patches or 3rd party
plugins, they can build their own graph paths by using the "-custom*"
versions of nodes. Reassembly then reads next_index and error_next_index
for each buffer from vnet_buffer, allowing the custom application to
steer both reassembled packets and any packets which are considered an
error in the way it requires.
Full reassembly
---------------
Configuration
^^^^^^^^^^^^^
Configuration is via API (``ip_reassembly_enable_disable``) or CLI:
``set interface reassembly <interface-name> [on|off|ip4|ip6]``
Here, ``on`` means both ip4 and ip6.
A show command is provided to see reassembly contexts:
For ip4:
``show ip4-full-reassembly [details]``
For ip6:
``show ip6-full-reassembly [details]``
Global full reassembly parameters can be modified using API
``ip_reassembly_set`` and retrieved using ``ip_reassembly_get``.
Defaults
""""""""
For default values, see the #defines in
`ip4_full_reass.c <__REPOSITORY_URL__/src/vnet/ip/reass/ip4_full_reass.c>`_
========================================= ==========================================
#define                                   description
========================================= ==========================================
IP4_REASS_TIMEOUT_DEFAULT_MS              timeout in milliseconds
IP4_REASS_EXPIRE_WALK_INTERVAL_DEFAULT_MS interval between reaping expired sessions
IP4_REASS_MAX_REASSEMBLIES_DEFAULT        maximum number of concurrent reassemblies
IP4_REASS_MAX_REASSEMBLY_LENGTH_DEFAULT   maximum number of fragments per reassembly
========================================= ==========================================
and
`ip6_full_reass.c <__REPOSITORY_URL__/src/vnet/ip/reass/ip6_full_reass.c>`_
========================================= ==========================================
#define                                   description
========================================= ==========================================
IP6_REASS_TIMEOUT_DEFAULT_MS              timeout in milliseconds
IP6_REASS_EXPIRE_WALK_INTERVAL_DEFAULT_MS interval between reaping expired sessions
IP6_REASS_MAX_REASSEMBLIES_DEFAULT        maximum number of concurrent reassemblies
IP6_REASS_MAX_REASSEMBLY_LENGTH_DEFAULT   maximum number of fragments per reassembly
========================================= ==========================================
Finished/expired contexts
^^^^^^^^^^^^^^^^^^^^^^^^^
Reassembly contexts are freed either when reassembly finishes (all data
has been received) or when it times out. A process periodically walks
all reassemblies and frees any expired ones.
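
The expiry walk can be sketched as a simple loop over a pool. This is a
standalone illustration with hypothetical names (``sketch_reass_t``,
``sketch_expire_walk``). It assumes the timeout is measured from the last
fragment seen for a context, which this document does not state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reassembly context, reduced to what the walk needs. */
typedef struct {
  int in_use;
  uint64_t last_heard_ms; /* assumed: time the last fragment was seen */
} sketch_reass_t;

/* Walk all contexts and free those idle for longer than timeout_ms.
 * Returns the number of contexts freed. */
static size_t sketch_expire_walk(sketch_reass_t *pool, size_t n,
                                 uint64_t now_ms, uint64_t timeout_ms)
{
  size_t freed = 0;

  assert(pool != NULL || n == 0);
  for (size_t i = 0; i < n; i++) {
    if (!pool[i].in_use)
      continue;
    /* Guard against timestamps from the future before subtracting. */
    if (now_ms >= pool[i].last_heard_ms &&
        now_ms - pool[i].last_heard_ms > timeout_ms) {
      pool[i].in_use = 0; /* real code would also release cached buffers */
      freed++;
    }
  }
  return freed;
}
```

In this model the timeout and the interval between walks correspond to
the ``*_TIMEOUT_DEFAULT_MS`` and ``*_EXPIRE_WALK_INTERVAL_DEFAULT_MS``
defaults listed in this section.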
Shallow (virtual) reassembly
----------------------------
Configuration
^^^^^^^^^^^^^
Configuration is via API (``ip_reassembly_enable_disable``) only, as
there is no value in turning SVR on by hand without a feature consuming
the buffer metadata. SVR is designed to be turned on programmatically by
a feature that requires it.
A show command is provided to see reassembly contexts:
For ip4:
``show ip4-sv-reassembly [details]``
For ip6:
``show ip6-sv-reassembly [details]``
Global shallow reassembly parameters can be modified using API
``ip_reassembly_set`` and retrieved using ``ip_reassembly_get``.
Defaults
""""""""
For default values, see the #defines in
`ip4_sv_reass.c <__REPOSITORY_URL__/src/vnet/ip/reass/ip4_sv_reass.c>`_
============================================ ==========================================
#define                                      description
============================================ ==========================================
IP4_SV_REASS_TIMEOUT_DEFAULT_MS              timeout in milliseconds
IP4_SV_REASS_EXPIRE_WALK_INTERVAL_DEFAULT_MS interval between reaping expired sessions
IP4_SV_REASS_MAX_REASSEMBLIES_DEFAULT        maximum number of concurrent reassemblies
IP4_SV_REASS_MAX_REASSEMBLY_LENGTH_DEFAULT   maximum number of fragments per reassembly
============================================ ==========================================
and
`ip6_sv_reass.c <__REPOSITORY_URL__/src/vnet/ip/reass/ip6_sv_reass.c>`_
============================================ ==========================================
#define                                      description
============================================ ==========================================
IP6_SV_REASS_TIMEOUT_DEFAULT_MS              timeout in milliseconds
IP6_SV_REASS_EXPIRE_WALK_INTERVAL_DEFAULT_MS interval between reaping expired sessions
IP6_SV_REASS_MAX_REASSEMBLIES_DEFAULT        maximum number of concurrent reassemblies
IP6_SV_REASS_MAX_REASSEMBLY_LENGTH_DEFAULT   maximum number of fragments per reassembly
============================================ ==========================================
Expiring contexts
^^^^^^^^^^^^^^^^^
There is no way of knowing when a reassembly is finished without
performing an (almost) full reassembly, so contexts in SVR cannot be
freed in the same way as in full reassembly. Instead, a least recently
used (LRU) list is maintained, in which reassembly contexts are ordered
by last update. The oldest context is freed whenever SVR hits the limit
on the number of concurrent reassembly contexts. There is also a process
reaping expired sessions, similar to the one in full reassembly.
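
The LRU approach can be sketched with an index-linked list over a fixed
array. This is a standalone illustration, not the VPP data structure:
the ``sketch_*`` names, the integer flow ``id`` and the limit of three
contexts are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_CTX 3          /* hypothetical limit on contexts */
#define SKETCH_NONE UINT32_MAX    /* marks "no index" */

typedef struct {
  int in_use;
  uint32_t id;         /* hypothetical flow identifier */
  uint32_t prev, next; /* LRU links: prev is older, next is newer */
} sketch_ctx_t;

typedef struct {
  sketch_ctx_t ctx[SKETCH_MAX_CTX];
  uint32_t lru_first; /* oldest context */
  uint32_t lru_last;  /* most recently updated context */
} sketch_lru_t;

static void sketch_lru_init(sketch_lru_t *l)
{
  for (uint32_t i = 0; i < SKETCH_MAX_CTX; i++)
    l->ctx[i].in_use = 0;
  l->lru_first = l->lru_last = SKETCH_NONE;
}

static void sketch_lru_unlink(sketch_lru_t *l, uint32_t i)
{
  sketch_ctx_t *c = &l->ctx[i];
  if (c->prev != SKETCH_NONE) l->ctx[c->prev].next = c->next;
  else l->lru_first = c->next;
  if (c->next != SKETCH_NONE) l->ctx[c->next].prev = c->prev;
  else l->lru_last = c->prev;
}

static void sketch_lru_append(sketch_lru_t *l, uint32_t i)
{
  sketch_ctx_t *c = &l->ctx[i];
  c->prev = l->lru_last;
  c->next = SKETCH_NONE;
  if (l->lru_last != SKETCH_NONE) l->ctx[l->lru_last].next = i;
  else l->lru_first = i;
  l->lru_last = i;
}

/* Find or create the context for id and mark it most recently used.
 * When the limit is hit, the oldest context is freed and reused.
 * Returns the index of the context. */
static uint32_t sketch_lru_touch(sketch_lru_t *l, uint32_t id)
{
  uint32_t slot = SKETCH_NONE;

  for (uint32_t i = 0; i < SKETCH_MAX_CTX; i++) {
    if (l->ctx[i].in_use && l->ctx[i].id == id) {
      sketch_lru_unlink(l, i);
      sketch_lru_append(l, i);
      return i;
    }
    if (!l->ctx[i].in_use && slot == SKETCH_NONE)
      slot = i;
  }
  if (slot == SKETCH_NONE) { /* limit hit: evict the oldest context */
    slot = l->lru_first;
    assert(slot != SKETCH_NONE);
    sketch_lru_unlink(l, slot);
  }
  l->ctx[slot].in_use = 1;
  l->ctx[slot].id = id;
  sketch_lru_append(l, slot);
  return slot;
}
```

Keeping the list ordered by last update makes eviction a constant-time
operation on the list head, which is why the sketch never scans for the
oldest timestamp.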
Truncated packets
^^^^^^^^^^^^^^^^^
When SVR detects that a packet has been truncated in a way where L4
headers are not available, it will mark it as such in vnet_buffer,
allowing downstream features to handle such packets as they deem fit.
Fast path/slow path
^^^^^^^^^^^^^^^^^^^
SVR is implemented in a fast path/slow path way. By default, it assumes
that passing traffic doesn't contain fragments and processes buffers
in a dual-loop. If it sees a fragment, it jumps to single-loop
processing.
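
The fast path/slow path split can be sketched as below. This is a
standalone illustration with hypothetical names (``sketch_pkt_t``,
``sketch_process_frame``). It assumes that once a fragment is seen, the
rest of the frame, including non-fragments, is handled one packet at a
time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet, reduced to what the sketch needs. */
typedef struct {
  int is_fragment;
  int handled_fast; /* set by the fast path */
  int handled_slow; /* set by the slow path */
} sketch_pkt_t;

static void sketch_fast_one(sketch_pkt_t *p) { p->handled_fast = 1; }
static void sketch_slow_one(sketch_pkt_t *p) { p->handled_slow = 1; }

/* Process a frame of n packets. Returns the index at which processing
 * switched to the slow path, or n if the whole frame took the fast path. */
static size_t sketch_process_frame(sketch_pkt_t *pkts, size_t n)
{
  size_t i = 0;

  assert(pkts != NULL || n == 0);

  /* Fast path: dual loop, two packets per iteration, no fragments. */
  while (i + 1 < n && !pkts[i].is_fragment && !pkts[i + 1].is_fragment) {
    sketch_fast_one(&pkts[i]);
    sketch_fast_one(&pkts[i + 1]);
    i += 2;
  }
  /* A single trailing non-fragment can still take the fast path. */
  if (i + 1 == n && !pkts[i].is_fragment) {
    sketch_fast_one(&pkts[i]);
    i++;
  }

  /* Slow path: single loop over whatever is left. */
  size_t switched_at = i;
  for (; i < n; i++)
    sketch_slow_one(&pkts[i]);
  return switched_at;
}
```

The benefit of this shape is that the common case, traffic without
fragments, never pays for the per-packet reassembly logic.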
Feature enabled by other features/reference counting
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The SVR feature is enabled by other features, such as NAT, when those
features are enabled. For this to work, SVR implements a
reference-counted API for enabling and disabling it.
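
Reference-counted enabling can be sketched as follows. This is a
standalone illustration, not the VPP API: the ``sketch_*`` names are
hypothetical, and counting per interface index is an assumption made for
the example.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_INTERFACES 4 /* hypothetical, not a VPP constant */

typedef struct {
  uint32_t refcnt[SKETCH_MAX_INTERFACES]; /* features currently using SVR */
  int enabled[SKETCH_MAX_INTERFACES];     /* whether SVR is switched on */
} sketch_svr_state_t;

/* Take (is_enable != 0) or drop (is_enable == 0) one reference.
 * Returns 1 if the call changed the enabled state, 0 otherwise. */
static int sketch_svr_enable_disable(sketch_svr_state_t *s,
                                     uint32_t sw_if_index, int is_enable)
{
  assert(sw_if_index < SKETCH_MAX_INTERFACES);

  if (is_enable) {
    if (s->refcnt[sw_if_index]++ == 0) {
      s->enabled[sw_if_index] = 1; /* first user switches SVR on */
      return 1;
    }
    return 0;
  }
  if (s->refcnt[sw_if_index] == 0)
    return 0;                      /* unbalanced disable: ignored here */
  if (--s->refcnt[sw_if_index] == 0) {
    s->enabled[sw_if_index] = 0;   /* last user switches SVR off */
    return 1;
  }
  return 0;
}
```

With this scheme two features can enable SVR on the same interface
independently, and SVR stays on until both have disabled it.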