| AF_XDP device driver |
| ==================== |
| |
| This driver relies on Linux AF_XDP sockets to receive and transmit Ethernet packets. |
| |
| Maturity level |
| -------------- |
| |
| Under development: it should work, but has not been thoroughly tested. |
| |
| Features |
| -------- |
| |
| - copy and zero-copy mode |
| - multiqueue |
| - API |
| - custom eBPF program |
| - polling, interrupt and adaptive mode |
| |
| Known limitations |
| ----------------- |
| |
| MTU |
| ~~~ |
| |
| Because of AF_XDP restrictions, the MTU is limited to below PAGE_SIZE |
| (4096 bytes on most systems) minus 256 bytes, and there are additional |
| limitations depending on the specific Linux device driver. As a rule of |
| thumb, an MTU of 3000 bytes or less should be safe. |
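| |
| The bound above can be computed for the running system. This is a minimal |
| sketch: the helper name ``af_xdp_mtu_bound`` is hypothetical, and the |
| 4096-byte fallback is an assumption used only when ``getconf`` is missing. |
| |
```shell
# Upper bound on the MTU: PAGE_SIZE minus 256 bytes.
# The MTU must stay strictly below the value printed here.
af_xdp_mtu_bound() {
    echo $(( $1 - 256 ))
}

# Query the page size; fall back to 4096 (assumption) if getconf is unavailable.
page_size=$(getconf PAGE_SIZE 2>/dev/null || echo 4096)
echo "page size: ${page_size} bytes, MTU must be below $(af_xdp_mtu_bound "$page_size") bytes"
```
| |
| Driver-specific limits may be lower than this bound, so the 3000-byte rule |
| of thumb remains the safer choice. |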
| |
| Number of buffers |
| ~~~~~~~~~~~~~~~~~ |
| |
| Upon UMEM creation, the kernel allocates a |
| physically-contiguous structure, whose size is proportional to the |
| number of 4KB pages contained in the UMEM. That allocation might fail |
| when the number of buffers allocated by VPP is too high. That number can |
| be controlled with the ``buffers { buffers-per-numa }`` configuration |
| option. Finally, note that because of this limitation, this plugin is |
| unlikely to be compatible with the use of 1GB hugepages. |
| |
| Interrupt mode |
| ~~~~~~~~~~~~~~ |
| |
| Interrupt and adaptive modes are supported but are limited by default to |
| single-threaded (no worker) configurations because of a kernel |
| limitation prior to 5.6. You can bypass the limitation at interface |
| creation time by adding the ``no-syscall-lock`` parameter, but you must |
| be sure that your kernel can support it, otherwise you will experience |
| double-frees. See |
| https://lore.kernel.org/bpf/BYAPR11MB365382C5DB1E5FCC53242609C1549@BYAPR11MB3653.namprd11.prod.outlook.com/ |
| for more details. |
| |
| Mellanox |
| ~~~~~~~~ |
| |
| When setting the number of queues on a Mellanox NIC with ``ethtool -L``, |
| you must use twice the number of configured queues: it looks like the |
| Linux driver creates separate RX queues and TX queues (all queues can |
| be used for both RX and TX, but the NIC will just not send any packet |
| on “pure” TX queues. Confused? So am I.). For example, if you set |
| ``combined 2`` you will effectively have to create 4 RX queues in AF_XDP |
| if you want to be sure to receive all packets. |
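| |
| The doubling rule can be sketched as follows. The helper name |
| ``mlx_af_xdp_rx_queues`` is hypothetical and ``<iface>`` is a placeholder; |
| the script only prints the commands and does not run them. |
| |
```shell
# Number of AF_XDP RX queues to request on a Mellanox NIC for a given
# `ethtool -L <iface> combined N`, following the "twice the queues" rule.
mlx_af_xdp_rx_queues() {
    echo $(( 2 * $1 ))
}

combined=2
# Print (do not execute) the matching Linux and VPP commands.
echo "ethtool -L <iface> combined ${combined}"
echo "vppctl create int af_xdp host-if <iface> num-rx-queues $(mlx_af_xdp_rx_queues "$combined")"
```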
| |
| Requirements |
| ------------ |
| |
| This driver supports Linux kernel 5.4 and later. Kernels older than 5.4 |
| are missing unaligned buffers support. |
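| |
| A quick way to check this requirement before creating the interface is to |
| compare the running kernel release against 5.4. This is a sketch only: the |
| helper ``kernel_at_least`` is hypothetical and looks at the major and minor |
| numbers of the release string, so it cannot detect features that were |
| backported to, or removed from, a distribution kernel. |
| |
```shell
# Return 0 when release string $1 (e.g. "5.15.0-91-generic") is at least $2.$3.
kernel_at_least() {
    release=$1 want_major=$2 want_minor=$3
    major=${release%%.*}          # text before the first dot
    rest=${release#*.}            # text after the first dot
    minor=${rest%%[!0-9]*}        # leading digits of the remainder
    [ "$major" -gt "$want_major" ] ||
        { [ "$major" -eq "$want_major" ] && [ "${minor:-0}" -ge "$want_minor" ]; }
}

if kernel_at_least "$(uname -r)" 5 4; then
    echo "kernel $(uname -r): recent enough for the AF_XDP driver"
else
    echo "kernel $(uname -r): older than 5.4, unaligned buffers support is missing"
fi
```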
| |
| The Linux kernel interface must be up and have enough queues before |
| creating the VPP AF_XDP interface, otherwise Linux will deny creating |
| the AF_XDP socket. The AF_XDP interface will claim NIC RX queues starting |
| from 0, up to the requested number of RX queues (only 1 by default). It |
| means all packets destined to NIC RX queues ``[0, num_rx_queues[`` will |
| be received by the AF_XDP interface, and only those. Depending on your |
| configuration, there will usually be several RX queues (typically 1 per |
| core) and packets are spread across queues by RSS. In order to receive |
| consistent traffic, you **must** program the NIC dispatching |
| accordingly. The simplest way to get all the packets is to specify |
| ``num-rx-queues all`` to grab all available queues or to reconfigure the |
| Linux kernel driver to use only ``num_rx_queues`` RX queues (i.e. all NIC |
| queues will be associated with the AF_XDP socket): |
| |
| :: |
| |
| ~# ethtool -L <iface> combined <num_rx_queues> |
| |
| Additionally, the VPP AF_XDP interface will use a MAC address generated |
| at creation time instead of the Linux kernel interface MAC. As Linux |
| kernel interfaces are not in promiscuous mode by default (see below), this |
| will result in a useless configuration where the VPP AF_XDP interface |
| only receives packets destined to the Linux kernel interface MAC just to |
| drop them because the destination MAC does not match VPP AF_XDP |
| interface MAC. If you want to use the Linux interface MAC for the VPP |
| AF_XDP interface, you can change it afterwards in VPP: |
| |
| :: |
| |
| ~# vppctl set int mac address <iface> <mac> |
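| |
| The Linux interface MAC can be read from sysfs and reused in the command |
| above. In this sketch the helper ``vpp_mac_sync_cmd`` and its optional |
| third argument (an alternative sysfs root, useful for dry runs) are |
| hypothetical; the helper only prints the ``vppctl`` command. |
| |
```shell
# Print the vppctl command that copies the Linux MAC onto the VPP interface.
# $1: Linux interface name
# $2: VPP interface name
# $3: sysfs root (hypothetical parameter for dry runs, defaults to /sys)
vpp_mac_sync_cmd() {
    linux_if=$1 vpp_if=$2 sysfs=${3:-/sys}
    # The kernel exposes the interface MAC in /sys/class/net/<iface>/address.
    mac=$(cat "$sysfs/class/net/$linux_if/address") || return 1
    echo "vppctl set int mac address $vpp_if $mac"
}

# Example (not executed here):
#   vpp_mac_sync_cmd enp216s0f0 enp216s0f0/0
```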
| |
| Finally, if you wish to receive all packets and not only the packets |
| destined to the Linux kernel interface MAC you need to set the Linux |
| kernel interface in promiscuous mode: |
| |
| :: |
| |
| ~# ip link set dev <iface> promisc on |
| |
| Security considerations |
| ----------------------- |
| |
| Once created, an AF_XDP interface will receive all packets arriving |
| at NIC RX queues ``[0, num_rx_queues[``. You need to configure the |
| Linux kernel NIC driver properly to ensure that only intended packets |
| will arrive in these queues. There is no way to filter the packets |
| after-the-fact using e.g. netfilter or eBPF. |
| |
| Quickstart |
| ---------- |
| |
| 1. Put the Linux kernel interface up and in promiscuous mode: |
| |
| :: |
| |
| ~# ip l set dev enp216s0f0 promisc on up |
| |
| 2. Create the AF_XDP interface: |
| |
| :: |
| |
| ~# vppctl create int af_xdp host-if enp216s0f0 num-rx-queues all |
| |
| 3. Use the interface as usual, e.g.: |
| |
| :: |
| |
| ~# vppctl set int ip addr enp216s0f0/0 1.1.1.1/24 |
| ~# vppctl set int st enp216s0f0/0 up |
| ~# vppctl ping 1.1.1.100 |
| |
| Custom eBPF XDP program |
| ----------------------- |
| |
| This driver relies on libbpf and as such relies on the ``xsks_map`` eBPF |
| map. The default behavior is to use the XDP program already attached to |
| the interface if any, otherwise load the default one. You can request to |
| load a custom XDP program with the ``prog`` option when creating the |
| interface in VPP: |
| |
| :: |
| |
| ~# vppctl create int af_xdp host-if enp216s0f0 num-rx-queues 4 prog extras/bpf/af_xdp.bpf.o |
| |
| In that case it will replace any previously attached program. A custom |
| XDP program example is provided in ``extras/bpf/``. |
| |
| Performance considerations |
| -------------------------- |
| |
| AF_XDP relies on the Linux kernel NIC driver to rx/tx packets. To reach |
| high performance (tens of Mpps), the Linux kernel NIC driver must support |
| zero-copy mode and its RX path must run on a dedicated core on the NUMA |
| node where the NIC is physically connected. |
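| |
| To find out which NUMA node a NIC is attached to, sysfs can be queried. |
| This is a sketch under assumptions: the helper ``nic_numa_node`` and its |
| optional sysfs-root argument are hypothetical, and the ``numa_node`` file |
| is only expected for PCI devices (the kernel reports ``-1`` when it has no |
| NUMA affinity for the device). |
| |
```shell
# Print the NUMA node of the device backing a Linux network interface.
# $1: Linux interface name
# $2: sysfs root (hypothetical parameter for dry runs, defaults to /sys)
nic_numa_node() {
    iface=$1 sysfs=${2:-/sys}
    cat "$sysfs/class/net/$iface/device/numa_node"
}

# Example (not executed here):
#   nic_numa_node enp216s0f0
```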