.. _reassembly:

IP Reassembly
=============

Some VPP functions need access to the whole packet and/or to stream
classification based on L4 headers. Reassembly provides both.

Full reassembly vs shallow (virtual) reassembly
-----------------------------------------------

There are two kinds of reassembly available in VPP:

1. Full reassembly changes a stream of packet fragments into one
   packet containing all data reassembled, with fragment bits cleared
   and the fragment header stripped (in the case of ip6). Note that the
   resulting packet may come out of reassembly as a buffer chain. Because
   it's impractical to parse headers which are split over multiple vnet
   buffers, vnet_buffer_chain_linearize() is called after reassembly so
   that L2/L3/L4 headers can be found in the first buffer. Full
   reassembly is costly and shouldn't be used unless necessary. Full
   reassembly is enabled by default for both ipv4 and ipv6 "for us"
   traffic, that is, packets aimed at VPP addresses. This can be disabled
   via API if desired, in which case "for us" fragments are dropped.

2. Shallow (virtual) reassembly (SVR) allows various classifying and/or
   translating features to work with fragments without having to
   understand fragmentation. It works by extracting L4 data and adding
   it to vnet_buffer for each packet/fragment passing through SVR
   nodes. This operation is performed for both fragments and regular
   packets, allowing consuming code to treat all packets in the same way.
   SVR caches incoming packet fragments (buffers) until the first
   fragment is seen. Then it extracts L4 data from that first fragment,
   fills it in for any cached fragments and transmits them in the same
   order as they were received. From that point on, any other passing
   fragments get L4 data populated in vnet_buffer based on the
   reassembly context.
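
The caching behaviour of shallow reassembly described above can be modelled
in a few lines. This is a minimal conceptual sketch in Python, not VPP code:
the ``Fragment`` and ``SvrContext`` classes, their fields and ``svr_input``
are hypothetical names chosen for illustration, and a pair of ports stands in
for whatever L4 data the real nodes extract.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Fragment:
    """Hypothetical stand-in for a buffer; not a VPP structure."""
    frag_offset: int                            # 0 marks the first fragment
    l4_ports: Optional[Tuple[int, int]] = None  # only the first fragment carries L4 headers
    l4_meta: Optional[Tuple[int, int]] = None   # models the L4 data stored in vnet_buffer


@dataclass
class SvrContext:
    """Per-flow state: L4 data once known, plus fragments cached until then."""
    l4_ports: Optional[Tuple[int, int]] = None
    cached: List[Fragment] = field(default_factory=list)


def svr_input(ctx: SvrContext, frag: Fragment) -> List[Fragment]:
    """Return the fragments that may be forwarded now, in arrival order."""
    if ctx.l4_ports is None and frag.frag_offset != 0:
        ctx.cached.append(frag)       # L4 data not known yet: hold the fragment
        return []
    if frag.frag_offset == 0 and ctx.l4_ports is None:
        ctx.l4_ports = frag.l4_ports  # first fragment seen: learn the L4 data
    ready = ctx.cached + [frag]       # previously cached fragments go out first
    ctx.cached = []
    for f in ready:
        f.l4_meta = ctx.l4_ports      # every fragment gets the same metadata
    return ready
```

Feeding the model a non-first fragment returns nothing; feeding it the first
fragment afterwards releases both, in arrival order, with identical L4
metadata attached.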

Multi-worker behaviour
^^^^^^^^^^^^^^^^^^^^^^

Both reassembly types deal with fragments arriving on different workers
via a handoff mechanism. All reassembly contexts are stored in pools.
A bihash mapping a 5-tuple key to a value containing the pool index and
thread index is used for lookups. When a lookup finds an existing
reassembly on a different thread, it hands off the fragment to that
thread. If the lookup fails, a new reassembly context is created and the
current worker becomes the owner of that context. Further fragments
received on other worker threads are then handed off to the owner worker
thread.
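
The lookup-or-create decision can be sketched as follows. This is an
illustrative Python model only: a plain ``dict`` stands in for the bihash,
and ``find_or_create``, its parameters and its return values are hypothetical
names, not VPP functions.

```python
from typing import Dict, Hashable, Tuple

# key -> (pool_index, owner_thread); a plain dict stands in for the bihash
ReassTable = Dict[Hashable, Tuple[int, int]]


def find_or_create(table: ReassTable, key: Hashable, current_thread: int,
                   next_pool_index: int) -> Tuple[str, int]:
    """Return ("local", pool_index) or ("handoff", owner_thread)."""
    entry = table.get(key)
    if entry is None:
        # Lookup failed: create a context owned by the current worker.
        table[key] = (next_pool_index, current_thread)
        return ("local", next_pool_index)
    pool_index, owner_thread = entry
    if owner_thread != current_thread:
        # The context lives on another worker: hand the fragment off to it.
        return ("handoff", owner_thread)
    return ("local", pool_index)
```

In this model the first worker to see a key becomes the owner, and any other
worker that later looks up the same key is told to hand the fragment off
instead of touching the context itself.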

Full reassembly also remembers the thread index where the first fragment
(that is, the fragment with fragment offset 0) was seen and uses the
handoff mechanism to send the reassembled packet out on that thread even
if the pool owner is a different thread. This then requires an additional
handoff to free the reassembly context, as only the pool owner can do
that in a thread-safe way.

Limits
^^^^^^

Because reassembly could be an attack vector, there are configurable
limits on the number of concurrent reassemblies and on the maximum number
of fragments per packet.

Custom applications
^^^^^^^^^^^^^^^^^^^

Both reassembly features can be used by custom applications which
are not part of the VPP source tree. Be they patches or 3rd party plugins,
they can build their own graph paths by using the "-custom*" versions of
the nodes. Reassembly then reads next_index and error_next_index for each
buffer from vnet_buffer, allowing the custom application to steer
both reassembled packets and any packets which are considered an error
in whatever way it requires.

Full reassembly
---------------

Configuration
^^^^^^^^^^^^^

Configuration is via API (``ip_reassembly_enable_disable``) or CLI:

``set interface reassembly <interface-name> [on|off|ip4|ip6]``

Here ``on`` means both ip4 and ip6.

A show command is provided to see reassembly contexts:

For ip4:

``show ip4-full-reassembly [details]``

For ip6:

``show ip6-full-reassembly [details]``

Global full reassembly parameters can be modified using API
``ip_reassembly_set`` and retrieved using ``ip_reassembly_get``.

Defaults
""""""""

For default values, see the #defines in

`ip4_full_reass.c <__REPOSITORY_URL__/src/vnet/ip/reass/ip4_full_reass.c>`_

========================================= ==========================================
#define                                   description
========================================= ==========================================
IP4_REASS_TIMEOUT_DEFAULT_MS              timeout in milliseconds
IP4_REASS_EXPIRE_WALK_INTERVAL_DEFAULT_MS interval between reaping expired sessions
IP4_REASS_MAX_REASSEMBLIES_DEFAULT        maximum number of concurrent reassemblies
IP4_REASS_MAX_REASSEMBLY_LENGTH_DEFAULT   maximum number of fragments per reassembly
========================================= ==========================================

and

`ip6_full_reass.c <__REPOSITORY_URL__/src/vnet/ip/reass/ip6_full_reass.c>`_

========================================= ==========================================
#define                                   description
========================================= ==========================================
IP6_REASS_TIMEOUT_DEFAULT_MS              timeout in milliseconds
IP6_REASS_EXPIRE_WALK_INTERVAL_DEFAULT_MS interval between reaping expired sessions
IP6_REASS_MAX_REASSEMBLIES_DEFAULT        maximum number of concurrent reassemblies
IP6_REASS_MAX_REASSEMBLY_LENGTH_DEFAULT   maximum number of fragments per reassembly
========================================= ==========================================

Finished/expired contexts
^^^^^^^^^^^^^^^^^^^^^^^^^

Reassembly contexts are freed either when reassembly is finished, that
is, when all data has been received, or on timeout. A process walks all
reassemblies, freeing any expired ones.
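
The expiry walk can be pictured with the following Python sketch. It is a
conceptual model under stated assumptions, not the VPP implementation:
``expire_walk`` and its parameters are hypothetical names, contexts are
reduced to a timestamp per id, and the sketch assumes the timeout is measured
from the last time a context was updated.

```python
from typing import Dict, List


def expire_walk(last_heard: Dict[int, float], now: float,
                timeout_ms: float) -> List[int]:
    """Free every context idle for longer than timeout_ms; return the freed ids."""
    expired = [cid for cid, seen in last_heard.items() if now - seen > timeout_ms]
    for cid in expired:
        del last_heard[cid]   # models returning the context to its pool
    return expired
```

Called at a fixed interval, such a walk bounds how long an incomplete
reassembly can hold on to buffers, which is what the timeout and walk
interval defaults listed above control.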

Shallow (virtual) reassembly
----------------------------

Configuration
^^^^^^^^^^^^^

Configuration is via API (``ip_reassembly_enable_disable``) only, as
there is no value in turning SVR on by hand without a feature consuming
the buffer metadata. SVR is designed to be turned on programmatically by
a feature that requires it.

A show command is provided to see reassembly contexts:

For ip4:

``show ip4-sv-reassembly [details]``

For ip6:

``show ip6-sv-reassembly [details]``

Global shallow reassembly parameters can be modified using API
``ip_reassembly_set`` and retrieved using ``ip_reassembly_get``.

Defaults
""""""""

For default values, see the #defines in

`ip4_sv_reass.c <__REPOSITORY_URL__/src/vnet/ip/reass/ip4_sv_reass.c>`_

============================================ ==========================================
#define                                      description
============================================ ==========================================
IP4_SV_REASS_TIMEOUT_DEFAULT_MS              timeout in milliseconds
IP4_SV_REASS_EXPIRE_WALK_INTERVAL_DEFAULT_MS interval between reaping expired sessions
IP4_SV_REASS_MAX_REASSEMBLIES_DEFAULT        maximum number of concurrent reassemblies
IP4_SV_REASS_MAX_REASSEMBLY_LENGTH_DEFAULT   maximum number of fragments per reassembly
============================================ ==========================================

and

`ip6_sv_reass.c <__REPOSITORY_URL__/src/vnet/ip/reass/ip6_sv_reass.c>`_

============================================ ==========================================
#define                                      description
============================================ ==========================================
IP6_SV_REASS_TIMEOUT_DEFAULT_MS              timeout in milliseconds
IP6_SV_REASS_EXPIRE_WALK_INTERVAL_DEFAULT_MS interval between reaping expired sessions
IP6_SV_REASS_MAX_REASSEMBLIES_DEFAULT        maximum number of concurrent reassemblies
IP6_SV_REASS_MAX_REASSEMBLY_LENGTH_DEFAULT   maximum number of fragments per reassembly
============================================ ==========================================

Expiring contexts
^^^^^^^^^^^^^^^^^

There is no way of knowing when a reassembly is finished without
performing (an almost) full reassembly, so contexts in SVR cannot be
freed in the same way as in full reassembly. Instead, a different
approach is taken. A least recently used (LRU) list is maintained, in
which reassembly contexts are ordered by last update. The oldest
context is freed whenever SVR hits the limit on the number of concurrent
reassembly contexts. There is also a process reaping expired sessions,
similar to the one in full reassembly.
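
The LRU policy can be illustrated with a short Python model. This is a sketch
of the general technique, not of the VPP data structures: ``SvrContextTable``
and ``touch`` are hypothetical names, and an ``OrderedDict`` stands in for the
LRU list.

```python
from collections import OrderedDict


class SvrContextTable:
    """LRU-ordered contexts; the least recently updated one is evicted first."""

    def __init__(self, max_contexts: int):
        self.max_contexts = max_contexts
        self.contexts = OrderedDict()   # oldest update first

    def touch(self, key):
        """Find or create the context for key and mark it most recently used."""
        if key in self.contexts:
            self.contexts.move_to_end(key)
            return self.contexts[key]
        if len(self.contexts) >= self.max_contexts:
            self.contexts.popitem(last=False)   # free the oldest context
        ctx = {}
        self.contexts[key] = ctx
        return ctx
```

With a limit of two contexts, touching ``a``, ``b``, ``a`` and then ``c``
evicts ``b``, because ``a`` was updated more recently.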

Truncated packets
^^^^^^^^^^^^^^^^^

When SVR detects that a packet has been truncated in such a way that the
L4 headers are not available, it marks the packet as such in vnet_buffer,
allowing downstream features to handle such packets as they see fit.

Fast path/slow path
^^^^^^^^^^^^^^^^^^^

SVR is implemented in a fast path/slow path manner. By default, it assumes
that passing traffic doesn't contain fragments and processes buffers
in a dual-loop. If it sees a fragment, it jumps to single-loop
processing.
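
The control flow can be sketched in Python as below. This is a simplified
model, not the node code: ``process_frame`` is a hypothetical name, a frame
is reduced to a list of "is this buffer a fragment" flags, and the sketch
assumes that single-loop processing continues from the point where the
fragment was seen.

```python
from typing import List


def process_frame(is_fragment: List[bool]) -> List[str]:
    """Label each buffer with the loop that handled it."""
    n = len(is_fragment)
    handled: List[str] = []
    i = 0
    # Fast path: two buffers per iteration while neither is a fragment.
    while i + 1 < n and not (is_fragment[i] or is_fragment[i + 1]):
        handled += ["dual", "dual"]
        i += 2
    # Slow path: the remainder of the frame, one buffer at a time.
    while i < n:
        handled.append("single")
        i += 1
    return handled
```

The point of the split is that fragment-free traffic, which is assumed to be
the common case, never pays for the per-fragment bookkeeping.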

Feature enabled by other features/reference counting
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The SVR feature is enabled by some other features, like NAT, when those
features are enabled. For this to work, it implements a reference
counted API for enabling/disabling SVR.
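
Reference-counted enabling can be modelled as follows. This is a generic
Python sketch of the technique, not the VPP API: ``RefCountedFeature`` and
``enable_disable`` are hypothetical names, and the sketch keeps one global
counter for simplicity.

```python
class RefCountedFeature:
    """Enable on the first reference, disable when the last one is dropped."""

    def __init__(self):
        self.refcount = 0
        self.enabled = False

    def enable_disable(self, is_enable: bool) -> None:
        if is_enable:
            self.refcount += 1
            self.enabled = True
        elif self.refcount > 0:
            self.refcount -= 1
            self.enabled = self.refcount > 0
```

If two features request SVR and one of them is later disabled, the model
keeps SVR on until the second feature releases it as well.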