Flannel is another example of a dual CNI plugin design:
Connectivity is taken care of by the flannel binary. This binary is a metaplugin, that is, a plugin that wraps other reference CNI plugins. In the simplest case, it generates a bridge plugin configuration and “delegates” the connectivity setup to it.
Reachability is taken care of by the Daemonset running flanneld. Here’s an approximate sequence of what happens when the daemon starts:
ClusterCIDR. This information is saved in /run/flannel/subnet.env and is used by the flannel metaplugin to generate the
flannel.1 and updates the Kubernetes Node object with its MAC address (along with its own Node IP).
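To make the hand-off between flanneld and the metaplugin more concrete, here is a minimal sketch of how a delegated bridge configuration could be derived from the variables in subnet.env. The variable names (FLANNEL_NETWORK, FLANNEL_SUBNET, FLANNEL_MTU, FLANNEL_IPMASQ) are the ones flanneld writes to that file; the sample values, the helper functions ip2int, network_of and render_delegate, the network name cbr0 and the exact JSON layout are assumptions made for illustration, not a copy of what the plugin emits.

```bash
# Sample contents of /run/flannel/subnet.env (illustrative values for worker2).
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.2.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true

# Hypothetical helper: convert a dotted-quad IPv4 address to an integer.
ip2int() {
  local IFS=.
  set -- $1
  echo "$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))"
}

# Hypothetical helper: "10.244.2.1/24" -> "10.244.2.0/24" (host bits cleared).
network_of() {
  local len=${1#*/} ip mask net
  ip=$(ip2int "${1%/*}")
  mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  net=$(( ip & mask ))
  echo "$(( (net >> 24) & 255 )).$(( (net >> 16) & 255 )).$(( (net >> 8) & 255 )).$(( net & 255 ))/$len"
}

# Print a bridge plugin configuration built from the subnet.env variables.
render_delegate() {
  # Assumption: ipMasq is the inverse of FLANNEL_IPMASQ, so that masquerading
  # is not applied twice when flanneld already handles it.
  local ipmasq=true
  [ "$FLANNEL_IPMASQ" = "true" ] && ipmasq=false
  cat <<EOF
{
  "name": "cbr0",
  "type": "bridge",
  "mtu": $FLANNEL_MTU,
  "ipMasq": $ipmasq,
  "isGateway": true,
  "ipam": {
    "type": "host-local",
    "subnet": "$(network_of "$FLANNEL_SUBNET")",
    "routes": [{ "dst": "$FLANNEL_NETWORK" }]
  }
}
EOF
}

render_delegate
```

The point of the sketch is the split of responsibilities: the daemon owns the addressing facts, while the metaplugin only turns them into input for the bridge and host-local plugins.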
This plugin assumes that daemons have a way to exchange information (e.g. VXLAN MAC). Previously, this required a separate database (hosted etcd), which was considered a big disadvantage. Newer versions of the plugin use the Kubernetes API to store that information in annotations of a Node API object.
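Once a daemon has learned a remote Node's PodCIDR, VXLAN MAC and Node IP, it needs three pieces of state per peer: a route, a static ARP entry and an FDB entry. The sketch below prints roughly equivalent iproute2 commands. flanneld programs these entries through netlink rather than by calling the CLI, so the peer_entries function is purely a hypothetical illustration; the sample values are the ones that appear in the worker2 outputs later in this section.

```bash
# Print iproute2 commands equivalent to the state needed for one remote node.
# $1 = remote PodCIDR, $2 = remote flannel.1 MAC, $3 = remote Node IP.
peer_entries() {
  local pod_cidr=$1 vtep_mac=$2 node_ip=$3
  # The remote flannel.1 interface is reached via the .0 address of its PodCIDR.
  local vtep_ip=${pod_cidr%/*}
  echo "ip route replace $pod_cidr via $vtep_ip dev flannel.1 onlink"
  echo "ip neigh replace $vtep_ip dev flannel.1 lladdr $vtep_mac nud permanent"
  echo "bridge fdb replace $vtep_mac dev flannel.1 dst $node_ip self permanent"
}

# The two peers as seen from k8s-guide-worker2.
peer_entries 10.244.0.0/24 5a:11:99:ab:8c:22 172.18.0.3
peer_entries 10.244.1.0/24 ce:0a:4f:22:a4:2a 172.18.0.4
```

The commands are only printed, never executed, so the snippet is safe to run anywhere.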
The fully converged IP and MAC tables will look like this:
Assuming that the lab is already set up, flannel can be enabled with the following 3 commands:
Check that the flannel daemonset has reached the
$ kubectl -n kube-system get daemonset -l app=flannel
NAME              DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
kube-flannel-ds   3         3         3       3            3           <none>          90s
Now we need to “kick” all Pods to restart and pick up the new CNI plugin:
Here’s how the information from the diagram can be validated (using worker2 as an example):
$ NODE=k8s-guide-worker2 make tshoot
bash-5.0# ip route get 1.1
1.1.0.0 via 10.244.2.1 dev eth0 src 10.244.2.6 uid 0
$ docker exec -it k8s-guide-worker2 ip route
default via 172.18.0.1 dev eth0
10.244.0.0/24 via 10.244.0.0 dev flannel.1 onlink
10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink
10.244.2.0/24 dev cni0 proto kernel scope link src 10.244.2.1
172.18.0.0/16 dev eth0 proto kernel scope link src 172.18.0.2
$ docker exec -it k8s-guide-worker2 ip neigh | grep PERM
10.244.1.0 dev flannel.1 lladdr ce:0a:4f:22:a4:2a PERMANENT
10.244.0.0 dev flannel.1 lladdr 5a:11:99:ab:8c:22 PERMANENT
$ docker exec -it k8s-guide-worker2 bridge fdb show dev flannel.1
5a:11:99:ab:8c:22 dst 172.18.0.3 self permanent
ce:0a:4f:22:a4:2a dst 172.18.0.4 self permanent
Let’s track what happens when Pod-1 tries to talk to Pod-3.
We’ll assume that the ARP and MAC tables are converged and fully populated.
1. Pod-1 wants to send a packet to 10.244.0.2. Its network stack looks up the routing table to find the NextHop IP:
$ kubectl exec -it net-tshoot-4sg7g -- ip route get 10.244.0.2
10.244.0.2 via 10.244.1.1 dev eth0 src 10.244.1.6 uid 0
2. The packet reaches the cni0 bridge in the root network namespace, where the lookup is performed again:
$ docker exec -it k8s-guide-worker ip route get 10.244.0.2
10.244.0.2 via 10.244.0.0 dev flannel.1 src 10.244.1.0 uid 0
3. Once the NextHop and the outgoing interface are set, the ARP table lookup returns the static entry provisioned by flanneld:
$ docker exec -it k8s-guide-worker ip neigh get 10.244.0.0 dev flannel.1
10.244.0.0 dev flannel.1 lladdr 5a:11:99:ab:8c:22 PERMANENT
4. Next, the FDB of the VXLAN interface is consulted to find out the destination VTEP IP:
$ docker exec -it k8s-guide-worker bridge fdb | grep 5a:11:99:ab:8c:22
5a:11:99:ab:8c:22 dev flannel.1 dst 172.18.0.3 self permanent
5. The packet is VXLAN-encapsulated and sent to the control-plane node, where flannel.1 matches the VNI and the VXLAN MAC:
$ docker exec -it k8s-guide-control-plane ip link show flannel.1
3: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT group default
    link/ether 5a:11:99:ab:8c:22 brd ff:ff:ff:ff:ff:ff
6. The packet gets decapsulated and its original destination IP is looked up in the main routing table:
$ docker exec -it k8s-guide-control-plane ip route get 10.244.0.2
10.244.0.2 dev cni0 src 10.244.0.1 uid 0
7. The ARP and bridge tables are then consulted to find the outgoing veth interface:
$ docker exec -it k8s-guide-control-plane ip neigh get 10.244.0.2 dev cni0
10.244.0.2 dev cni0 lladdr 7e:46:23:43:6f:ec REACHABLE
$ docker exec -it k8s-guide-control-plane bridge fdb get 7e:46:23:43:6f:ec br cni0
7e:46:23:43:6f:ec dev vethaabf9eb2 master cni0
8. Finally, the packet arrives in Pod-3’s network namespace, where it gets processed by the local network stack:
$ kubectl exec -it net-tshoot-rkg46 -- ip route get 10.244.0.2
local 10.244.0.2 dev lo src 10.244.0.2 uid 0
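The sender-side part of this walk (route, then ARP, then FDB) can be replayed offline as a chain of three table lookups. The sketch below does that with awk over the tables captured from worker2 earlier in this section. The resolve_vtep function and the variable names are hypothetical, and matching on the first three octets is a shortcut that only works because every route here is a /24; the kernel performs a real longest-prefix match.

```bash
# Tables captured from k8s-guide-worker2 (flannel.1 entries only).
ROUTES='10.244.0.0/24 via 10.244.0.0 dev flannel.1 onlink
10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink'
NEIGH='10.244.1.0 dev flannel.1 lladdr ce:0a:4f:22:a4:2a PERMANENT
10.244.0.0 dev flannel.1 lladdr 5a:11:99:ab:8c:22 PERMANENT'
FDB='5a:11:99:ab:8c:22 dst 172.18.0.3 self permanent
ce:0a:4f:22:a4:2a dst 172.18.0.4 self permanent'

# Resolve a destination Pod IP to the Node IP that terminates the VXLAN tunnel.
resolve_vtep() {
  local dst=$1 prefix nexthop mac
  # Simplification: assume /24 routes, so the first three octets pick the route.
  prefix=${dst%.*}
  # Step 1: routing table lookup returns the NextHop (remote flannel.1 address).
  nexthop=$(echo "$ROUTES" | awk -v p="$prefix.0/24" '$1 == p { print $3 }')
  # Step 2: static ARP entry maps the NextHop to the remote VXLAN MAC.
  mac=$(echo "$NEIGH" | awk -v n="$nexthop" '$1 == n { print $5 }')
  # Step 3: FDB entry maps the VXLAN MAC to the destination VTEP (Node IP).
  echo "$FDB" | awk -v m="$mac" '$1 == m { print $3 }'
}

resolve_vtep 10.244.0.2
```

A destination that has no flannel.1 route (for example a Pod on the local node) produces no output, which mirrors the fact that such traffic never enters the VXLAN interface.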
Similar to kindnet, flanneld sets up SNAT rules to enable egress connectivity for the Pods; the only difference is that it does this directly inside the POSTROUTING chain:
Chain POSTROUTING (policy ACCEPT 327 packets, 20536 bytes)
 pkts bytes target      prot opt in   out   source            destination
    0     0 RETURN      all  --  *    *     10.244.0.0/16     10.244.0.0/16
    0     0 MASQUERADE  all  --  *    *     10.244.0.0/16    !224.0.0.0/4       random-fully
    0     0 RETURN      all  --  *    *    !10.244.0.0/16     10.244.0.0/24
    0     0 MASQUERADE  all  --  *    *    !10.244.0.0/16     10.244.0.0/16     random-fully
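The four rules are evaluated in order, and the first match wins. The sketch below replays that logic for a given source and destination pair, which can help when reasoning about whether a flow gets NATed. It assumes that the destination excluded by the second rule is the multicast range 224.0.0.0/4 and that the table was captured on the node owning 10.244.0.0/24; the ip2int, in_cidr and snat_verdict helpers are hypothetical names used only for this illustration.

```bash
# Hypothetical helper: convert a dotted-quad IPv4 address to an integer.
ip2int() {
  local IFS=.
  set -- $1
  echo "$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))"
}

# Succeed if address $1 falls inside CIDR $2 (e.g. 10.244.0.0/16).
in_cidr() {
  local len=${2#*/} ip net mask
  ip=$(ip2int "$1")
  net=$(ip2int "${2%/*}")
  mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# Print the verdict of the POSTROUTING rules for source $1 and destination $2.
snat_verdict() {
  local src=$1 dst=$2 cluster=10.244.0.0/16 node_pods=10.244.0.0/24
  if in_cidr "$src" "$cluster"; then
    if in_cidr "$dst" "$cluster"; then
      echo RETURN          # rule 1: Pod-to-Pod traffic is left untouched
    elif ! in_cidr "$dst" 224.0.0.0/4; then
      echo MASQUERADE      # rule 2: Pod egress to non-multicast destinations
    else
      echo ACCEPT          # no rule matched, chain policy applies
    fi
  else
    if in_cidr "$dst" "$node_pods"; then
      echo RETURN          # rule 3: external traffic to local Pods
    elif in_cidr "$dst" "$cluster"; then
      echo MASQUERADE      # rule 4: external traffic to Pods on other nodes
    else
      echo ACCEPT          # no rule matched, chain policy applies
    fi
  fi
}

snat_verdict 10.244.0.2 8.8.8.8
```

Here 8.8.8.8 is just an arbitrary external address; a Pod-to-Pod pair such as 10.244.1.6 and 10.244.0.2 hits the first RETURN rule instead.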
direct routing mode, which acts by installing static routes for hosts on the same subnet.
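The decision behind direct routing can be sketched as a same-subnet check: if the remote Node IP is in the local Node's subnet, a plain static route via that Node IP is enough; otherwise traffic goes through the VXLAN interface as before. This is an approximation of the behaviour, not flannel's code. The helper names are hypothetical, the uplink is assumed to be eth0, and 192.168.5.10 is a made-up off-subnet peer.

```bash
# Hypothetical helper: convert a dotted-quad IPv4 address to an integer.
ip2int() {
  local IFS=.
  set -- $1
  echo "$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))"
}

# Succeed if remote Node IP $1 is in the subnet of local address $2 (IP/len).
same_subnet() {
  local len=${2#*/} a b mask
  a=$(ip2int "$1")
  b=$(ip2int "${2%/*}")
  mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  [ $(( a & mask )) -eq $(( b & mask )) ]
}

# Print the route for one peer: $1 = PodCIDR, $2 = Node IP, $3 = local IP/len.
route_for_peer() {
  local pod_cidr=$1 node_ip=$2 local_addr=$3
  if same_subnet "$node_ip" "$local_addr"; then
    # Same L2 segment: route straight to the peer, no encapsulation.
    echo "ip route replace $pod_cidr via $node_ip dev eth0"
  else
    # Different subnet: fall back to the VXLAN path.
    echo "ip route replace $pod_cidr via ${pod_cidr%/*} dev flannel.1 onlink"
  fi
}

route_for_peer 10.244.0.0/24 172.18.0.3 172.18.0.2/16
route_for_peer 10.244.9.0/24 192.168.5.10 172.18.0.2/16
```

In the lab all nodes share 172.18.0.0/16, so with direct routing enabled every peer would take the first branch.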