Contiv/VPP Kubernetes Network Plugin
====================================

Overview
--------

Kubernetes is a container orchestration system that efficiently manages
Docker containers. Docker containers and container platforms provide
many advantages over traditional virtualization. Container isolation is
done at the kernel level, which eliminates the need for a guest
operating system and therefore makes containers much more efficient,
faster, and lightweight. The containers in Contiv/VPP are referred to as
PODs.

Contiv/VPP is a Kubernetes network plugin that uses `FD.io
VPP <https://fd.io/>`__ to provide network connectivity between PODs in
a k8s cluster (k8s is a common abbreviation for Kubernetes). It deploys
itself as a set of system PODs in the ``kube-system`` namespace, some of
them (``contiv-ksr``, ``contiv-etcd``) on the master node only, and some
of them (``contiv-cni``, ``contiv-vswitch``, ``contiv-stn``) on each
node in the cluster.
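
For illustration, a minimal sketch (assuming the standard
``k8s.io/client-go`` library and in-cluster credentials with permission
to list pods) that lists the Contiv system PODs named above:

.. code-block:: go

   // List the Contiv system PODs in the kube-system namespace.
   // Minimal sketch using k8s.io/client-go; assumes it runs inside
   // the cluster with RBAC permission to list pods.
   package main

   import (
       "context"
       "fmt"
       "strings"

       metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
       "k8s.io/client-go/kubernetes"
       "k8s.io/client-go/rest"
   )

   func main() {
       config, err := rest.InClusterConfig()
       if err != nil {
           panic(err)
       }
       clientset, err := kubernetes.NewForConfig(config)
       if err != nil {
           panic(err)
       }
       pods, err := clientset.CoreV1().Pods("kube-system").List(
           context.Background(), metav1.ListOptions{})
       if err != nil {
           panic(err)
       }
       for _, pod := range pods.Items {
           // The Contiv system PODs are all prefixed with "contiv-".
           if strings.HasPrefix(pod.Name, "contiv-") {
               fmt.Printf("%s\t%s\n", pod.Name, pod.Status.Phase)
           }
       }
   }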

Contiv/VPP is fully integrated with k8s via its components, and it
automatically reprograms itself upon each change in the cluster via the
k8s API.

The main component of the solution is the `VPP
<https://fd.io/technology/#vpp>`__-based vSwitch, which runs within the
``contiv-vswitch`` POD on each node in the cluster. It provides
POD-to-POD connectivity across the nodes in the cluster, as well as
host-to-POD and outside-to-POD connectivity. The solution also leverages
VPP’s fast data processing that runs completely in userspace, and uses
`DPDK <https://dpdk.org/>`__ for fast access to the network IO layer.

Kubernetes services and policies are also a part of the VPP
configuration, which means they are fully supported on VPP without the
need to forward packets into the Linux network stack (Kube Proxy), which
makes them very efficient and scalable.

Architecture
------------

Contiv/VPP consists of several components, each of them packed and
shipped as a Docker container. Two of them deploy on the Kubernetes
master node only:

- `Contiv KSR <#contiv-ksr>`__
- `Contiv ETCD <#contiv-etcd>`__

The rest of them deploy on all nodes within the k8s cluster (including
the master node):

- `Contiv vSwitch <#contiv-vswitch>`__
- `Contiv CNI <#contiv-cni>`__
- `Contiv STN <#contiv-stn-daemon>`__

The following sections briefly describe the individual Contiv
components, which are displayed as orange boxes in the picture below:

.. figure:: ../../_images/contiv-arch.png
   :alt: Contiv/VPP Architecture

   Contiv/VPP Architecture

Contiv KSR
~~~~~~~~~~

Contiv KSR (Kubernetes State Reflector) is an agent that subscribes to
the k8s control plane, watches k8s resources, and propagates all
relevant cluster-related information into the Contiv ETCD data store.
Other Contiv components do not access the k8s API directly; they
subscribe to Contiv ETCD instead. For more information on KSR, read the
`KSR Readme
<https://github.com/contiv/vpp/blob/master/cmd/contiv-ksr/README.md>`__.
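
As a rough illustration of this state-reflector pattern (not the actual
KSR code; the ``/contiv-ksr/`` key layout and etcd endpoint below are
assumptions), a sketch that watches k8s pods and mirrors them into etcd
could look like this:

.. code-block:: go

   // Sketch of the state-reflector pattern: watch k8s resources and
   // mirror them into etcd. Not the actual KSR implementation; the
   // /contiv-ksr/ key layout and etcd endpoint are assumptions.
   package main

   import (
       "context"
       "encoding/json"
       "time"

       corev1 "k8s.io/api/core/v1"
       metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
       "k8s.io/apimachinery/pkg/watch"
       "k8s.io/client-go/kubernetes"
       "k8s.io/client-go/rest"

       clientv3 "go.etcd.io/etcd/client/v3"
   )

   func main() {
       cfg, err := rest.InClusterConfig()
       if err != nil {
           panic(err)
       }
       k8s, err := kubernetes.NewForConfig(cfg)
       if err != nil {
           panic(err)
       }
       etcd, err := clientv3.New(clientv3.Config{
           Endpoints:   []string{"127.0.0.1:32379"}, // contiv-etcd endpoint (assumption)
           DialTimeout: 5 * time.Second,
       })
       if err != nil {
           panic(err)
       }
       defer etcd.Close()

       ctx := context.Background()
       // Subscribe to pod events from the k8s control plane.
       w, err := k8s.CoreV1().Pods("").Watch(ctx, metav1.ListOptions{})
       if err != nil {
           panic(err)
       }
       for ev := range w.ResultChan() {
           pod, ok := ev.Object.(*corev1.Pod)
           if !ok {
               continue
           }
           key := "/contiv-ksr/k8s/pod/" + pod.Namespace + "/" + pod.Name
           if ev.Type == watch.Deleted {
               etcd.Delete(ctx, key) // reflect the deletion
               continue
           }
           data, _ := json.Marshal(pod)
           etcd.Put(ctx, key, string(data)) // reflect add/update
       }
   }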

Contiv ETCD
~~~~~~~~~~~

Contiv/VPP uses its own instance of the ETCD database for storage of k8s
cluster-related data reflected by KSR, which is then accessed by the
Contiv vSwitch Agents running on individual nodes. Apart from the data
reflected by KSR, ETCD also stores the persisted VPP configuration of
individual vswitches (mainly used to restore operation after restarts),
as well as some other internal metadata.
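
The reflected data can be inspected with a plain etcd client. A minimal
sketch (the endpoint and the ``/contiv-ksr/`` key prefix are the same
assumptions as above):

.. code-block:: go

   // Dump everything reflected into the Contiv ETCD instance.
   // Minimal sketch; endpoint and key prefix are assumptions.
   package main

   import (
       "context"
       "fmt"
       "time"

       clientv3 "go.etcd.io/etcd/client/v3"
   )

   func main() {
       cli, err := clientv3.New(clientv3.Config{
           Endpoints:   []string{"127.0.0.1:32379"}, // contiv-etcd endpoint (assumption)
           DialTimeout: 5 * time.Second,
       })
       if err != nil {
           panic(err)
       }
       defer cli.Close()

       // Fetch all keys under the (hypothetical) KSR prefix.
       resp, err := cli.Get(context.Background(), "/contiv-ksr/",
           clientv3.WithPrefix())
       if err != nil {
           panic(err)
       }
       for _, kv := range resp.Kvs {
           fmt.Printf("%s = %s\n", kv.Key, kv.Value)
       }
   }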

Contiv vSwitch
~~~~~~~~~~~~~~

vSwitch is the main networking component that provides connectivity to
PODs. It deploys on each node in the cluster, and consists of two main
components packed into a single Docker container: VPP and the Contiv VPP
Agent.

**VPP** is the data plane software that provides connectivity between
PODs, the host Linux network stack, and the data-plane NIC interface
controlled by VPP:

- PODs are connected to VPP using TAP interfaces wired between VPP and
  each POD network namespace (see the sketch after this list).
- The host network stack is connected to VPP using another TAP interface
  connected to the main (default) network namespace.
- The data-plane NIC is controlled directly by VPP using DPDK. Note that
  this means the interface is not visible to the host Linux network
  stack, and the node either needs another management interface for k8s
  control plane communication, or the `STN (Steal The NIC)
  <SINGLE_NIC_SETUP.html>`__ deployment must be applied.
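
To picture the TAP wiring, here is a conceptual sketch of the host side
only, assuming the ``github.com/vishvananda/netlink`` library. VPP
actually creates its own end of the interface internally, so this is an
illustration of the mechanism, not the vswitch code:

.. code-block:: go

   // Conceptual sketch: create a TAP link and move it into a POD's
   // network namespace. Assumes github.com/vishvananda/netlink;
   // VPP creates its own side of the interface in reality.
   package main

   import (
       "github.com/vishvananda/netlink"
       "golang.org/x/sys/unix"
   )

   func wireTapIntoPod(ifName string, podNetnsFD int) error {
       tap := &netlink.Tuntap{
           LinkAttrs: netlink.LinkAttrs{Name: ifName},
           Mode:      netlink.TUNTAP_MODE_TAP,
       }
       if err := netlink.LinkAdd(tap); err != nil {
           return err
       }
       // Move the POD end of the wiring into the POD network namespace.
       return netlink.LinkSetNsFd(tap, podNetnsFD)
   }

   func main() {
       // The netns fd would normally come from /proc/<pid>/ns/net of
       // the POD; our own netns is opened here only to keep the sketch
       // self-contained and runnable.
       fd, err := unix.Open("/proc/self/ns/net", unix.O_RDONLY, 0)
       if err != nil {
           panic(err)
       }
       defer unix.Close(fd)
       if err := wireTapIntoPod("tap-pod0", fd); err != nil {
           panic(err)
       }
   }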

**Contiv VPP Agent** is the control plane part of the vSwitch container.
It is responsible for configuring the VPP according to the information
gained from ETCD and requests from the Contiv CNI. It is based on the
`Ligato VPP Agent <https://github.com/ligato/vpp-agent>`__ code with
extensions related to k8s.

For communication with VPP, it uses VPP binary API messages sent via
shared memory using `GoVPP <https://wiki.fd.io/view/GoVPP>`__. Towards
the Contiv CNI, the agent acts as a GRPC server serving the CNI requests
forwarded from the Contiv CNI binary.
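
A skeleton of that server side might look as follows. The listen
address and the service name in the comment are assumptions; the real
agent registers a service generated from its own proto files:

.. code-block:: go

   // Skeleton of a GRPC server like the one the agent runs to accept
   // CNI requests. The listen address is an assumption, and the real
   // service registration is only sketched in a comment.
   package main

   import (
       "log"
       "net"

       "google.golang.org/grpc"
   )

   func main() {
       // The CNI binary connects to this node-local endpoint.
       lis, err := net.Listen("tcp", "127.0.0.1:9111")
       if err != nil {
           log.Fatalf("failed to listen: %v", err)
       }
       srv := grpc.NewServer()

       // Here the real agent would register its generated CNI service,
       // e.g. cni.RegisterRemoteCNIServer(srv, &cniServer{}) (a
       // hypothetical name), whose Add/Delete handlers wire and unwire
       // PODs by reprogramming VPP over GoVPP.

       if err := srv.Serve(lis); err != nil {
           log.Fatalf("failed to serve: %v", err)
       }
   }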

Contiv CNI
~~~~~~~~~~

Contiv CNI (Container Network Interface) is a simple binary that
implements the `Container Network Interface
<https://github.com/containernetworking/cni>`__ API and is executed by
the Kubelet upon POD creation and deletion. The CNI binary just packs
the request into a GRPC request and forwards it to the Contiv VPP Agent
running on the same node, which processes it (wires/unwires the
container) and replies with a response, which is then forwarded back to
the Kubelet.
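
The general shape of such a plugin, sketched with the stock
``github.com/containernetworking/cni`` library (``forwardToAgent`` is a
hypothetical placeholder; the GRPC forwarding itself is elided):

.. code-block:: go

   // Sketch of a CNI plugin that forwards requests to a node-local
   // agent, built on the stock containernetworking/cni library.
   // forwardToAgent is a hypothetical placeholder for the GRPC call.
   package main

   import (
       "github.com/containernetworking/cni/pkg/skel"
       "github.com/containernetworking/cni/pkg/types"
       "github.com/containernetworking/cni/pkg/version"
   )

   // forwardToAgent would pack the CNI arguments into a GRPC request,
   // send it to the agent on this node, and print the agent's reply as
   // the CNI result. Details are omitted in this sketch.
   func forwardToAgent(op string, args *skel.CmdArgs) error {
       _, _ = op, args
       return types.NewError(types.ErrInternal, "not implemented in sketch", "")
   }

   func cmdAdd(args *skel.CmdArgs) error   { return forwardToAgent("ADD", args) }
   func cmdDel(args *skel.CmdArgs) error   { return forwardToAgent("DEL", args) }
   func cmdCheck(args *skel.CmdArgs) error { return forwardToAgent("CHECK", args) }

   func main() {
       // Kubelet executes this binary; skel parses the CNI environment
       // variables and stdin config into CmdArgs.
       skel.PluginMain(cmdAdd, cmdCheck, cmdDel, version.All, "CNI sketch")
   }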

Contiv STN Daemon
~~~~~~~~~~~~~~~~~

This section describes how the Contiv `STN (Steal The NIC)
<SINGLE_NIC_SETUP.html>`__ daemon operates. As already mentioned, the
default setup of Contiv/VPP requires two network interfaces per node:
one controlled by VPP for data facing the PODs, and one controlled by
the host network stack for k8s control plane communication. In case your
k8s nodes do not provide two network interfaces, Contiv/VPP can work in
a single-NIC setup, where the interface is “stolen” from the host
network stack just before VPP starts, and is configured with the same IP
address on VPP, as well as on the host-VPP interconnect TAP interface,
that it had in the host before. For more information on the STN setup,
read the `Single NIC Setup README <SINGLE_NIC_SETUP.html>`__.
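
The core of the “stealing” step can be pictured with a small sketch,
again assuming the ``github.com/vishvananda/netlink`` library. The real
daemon does considerably more, including handling routes and restoring
the interface to the host if VPP dies:

.. code-block:: go

   // Sketch of the "steal" step: remember the NIC's IP configuration,
   // then remove it and take the link down so VPP/DPDK can take over.
   // Assumes github.com/vishvananda/netlink; the real STN daemon also
   // handles routes, restoration, and failure cases.
   package main

   import (
       "fmt"

       "github.com/vishvananda/netlink"
   )

   func stealNIC(name string) ([]netlink.Addr, error) {
       link, err := netlink.LinkByName(name)
       if err != nil {
           return nil, err
       }
       // Save the IPv4 addresses so they can be re-applied on the VPP
       // side and on the host-VPP interconnect TAP interface.
       addrs, err := netlink.AddrList(link, netlink.FAMILY_V4)
       if err != nil {
           return nil, err
       }
       // Remove the addresses and shut the link down; from now on the
       // host network stack no longer uses this NIC.
       for _, a := range addrs {
           if err := netlink.AddrDel(link, &a); err != nil {
               return nil, err
           }
       }
       if err := netlink.LinkSetDown(link); err != nil {
           return nil, err
       }
       return addrs, nil
   }

   func main() {
       addrs, err := stealNIC("eth0") // interface name is an example
       if err != nil {
           panic(err)
       }
       fmt.Println("addresses to re-apply on VPP:", addrs)
   }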