kindnet

Here is how kindnet satisfies the two main CNI plugin requirements:

  • Reachability is established by installing one static route per peer Node, with the NextHop pointing to that Node's internal IP. These routes are re-checked every 10 seconds to pick up any changes.
  • Connectivity is established by a mix of reference CNI plugins – ptp is used to create veth links, host-local to allocate IPs and portmap to configure port mappings. Each kindnetd daemon generates the CNI configuration file on startup.
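
The reachability logic can be sketched as a small shell function. This is only an illustration and not kindnetd's actual code: the helper name peer_routes and its "podCIDR nodeIP" input format are assumptions, and the sample values are the peer routes seen on k8s-guide-worker2 in the Lab section. It prints the ip route commands instead of running them, so it can be executed without root.

```shell
#!/usr/bin/env bash
# Sketch: emit one static route per peer Node (PodCIDR via the Node's internal IP).
# peer_routes is a hypothetical helper used for illustration only.

peer_routes() {
  # Reads "podCIDR nodeIP" pairs from stdin, one peer per line.
  while read -r pod_cidr node_ip; do
    # Skip blank lines.
    [ -n "$pod_cidr" ] || continue
    # "replace" creates the route or updates it, so re-running is idempotent.
    echo "ip route replace $pod_cidr via $node_ip"
  done
}

# Peers as seen from k8s-guide-worker2 (assumed input, taken from the Lab output).
peers='10.244.0.0/24 172.18.0.10
10.244.1.0/24 172.18.0.12'

printf '%s\n' "$peers" | peer_routes
```

Running the same function on a timer would approximate the periodic 10-second check, since replacing an unchanged route is a no-op.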

The diagram below shows what a fully converged routing table looks like:

Lab

This plugin is built into the Lab cluster by default, so the only thing required is to bring up the Lab environment:

make setup && make up

Here’s how to verify the above diagram in the Lab environment, using the second Node as an example:

  1. Pod IP and default route

Pod IP should have a /24 subnet mask (same as PodCIDR) and the default route pointing to the first IP of that subnet.

$ NODE=k8s-guide-worker2 make tshoot
bash-5.0# ip -br -4 add show eth0
[email protected]         UP             10.244.2.8/24 
bash-5.1# ip route
default via 10.244.2.1 dev eth0
10.244.2.0/24 via 10.244.2.1 dev eth0 src 10.244.2.8
10.244.2.1 dev eth0 scope link src 10.244.2.8

Note how the Pod routing is set up so that all traffic, including intra-subnet Pod-to-Pod communication, is sent to the same next hop. This lets all Pods be interconnected at L3 without relying on a bridge or on ARP for neighbor discovery.

  2. Node routing table

It should contain one /32 host-route per local Pod and one /24 per peer node.

$ docker exec -it k8s-guide-worker2 bash
root@k8s-guide-worker2:/# ip route
default via 172.18.0.1 dev eth0 
10.244.0.0/24 via 172.18.0.10 dev eth0 
10.244.1.0/24 via 172.18.0.12 dev eth0 
10.244.2.2 dev vethf821f7f9 scope host 
10.244.2.3 dev veth87514986 scope host 
10.244.2.4 dev veth9829983c scope host 
10.244.2.5 dev veth010c83ae scope host 
10.244.2.8 dev vetha1079faf scope host 

  3. PodCIDR gateway

One notable thing is that the root namespace side of all veth links has the same IP address:

root@k8s-guide-worker2:/# ip -br -4 addr show | grep veth
[email protected] UP             10.244.2.1/32 
[email protected] UP             10.244.2.1/32 
[email protected] UP             10.244.2.1/32 
[email protected] UP             10.244.2.1/32 
[email protected] UP             10.244.2.1/32 

They each act as the default gateway for their peer Pods and don’t have to be attached to a bridge.
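
The routing-table check from step 2 can be scripted. The sketch below is an illustration under stated assumptions: the helper names count_peer_routes and count_pod_routes are made up, and the embedded sample is the k8s-guide-worker2 output shown above rather than live data.

```shell
#!/usr/bin/env bash
# Sketch: classify the routes of a kind Node (sample copied from k8s-guide-worker2).
# Both helpers are hypothetical and read "ip route" output on stdin.

routes='default via 172.18.0.1 dev eth0
10.244.0.0/24 via 172.18.0.10 dev eth0
10.244.1.0/24 via 172.18.0.12 dev eth0
10.244.2.2 dev vethf821f7f9 scope host
10.244.2.3 dev veth87514986 scope host
10.244.2.4 dev veth9829983c scope host
10.244.2.5 dev veth010c83ae scope host
10.244.2.8 dev vetha1079faf scope host'

# Count /24 routes that point at another Node ("<cidr> via <ip> ...").
count_peer_routes() {
  awk '$1 ~ /\/24$/ && $2 == "via" { n++ } END { print n + 0 }'
}

# Count host routes bound directly to a veth device (one per local Pod).
count_pod_routes() {
  awk '$2 == "dev" && $3 ~ /^veth/ { n++ } END { print n + 0 }'
}

echo "peer routes: $(printf '%s\n' "$routes" | count_peer_routes)"
echo "pod routes:  $(printf '%s\n' "$routes" | count_pod_routes)"
```

Against a running Lab cluster, the same helpers could be fed from docker exec k8s-guide-worker2 ip route instead of the embedded sample.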

A day in the life of a Packet

Let’s track what happens when Pod-1 tries to talk to Pod-3.

We’ll assume that the ARP and MAC tables are converged and fully populated.

  1. Pod-1 wants to send a packet to 10.244.0.5. Its network stack looks up the routing table to find the NextHop IP:
$ kubectl exec -it net-tshoot-wxgcw -- ip route get 10.244.0.5
10.244.0.5 via 10.244.1.1 dev eth0 src 10.244.1.3 uid 0 
  2. The packet is sent down the veth link and pops out in the root network namespace of the host, which repeats the lookup:
$ docker exec -it k8s-guide-worker ip route get 10.244.0.5
10.244.0.5 via 172.18.0.10 dev eth0 src 172.18.0.11 uid 0 
  3. The packet is L2-switched by the kind bridge and enters the control-plane’s root network namespace:
$ docker exec -it k8s-guide-control-plane ip route get 10.244.0.5
10.244.0.5 dev veth9f517bf3 src 10.244.0.1 uid 0 
  4. Finally, the packet arrives in Pod-3’s network namespace, where it gets processed by the local network stack:
$ kubectl exec -it net-tshoot-x6wv9 -- ip route get 10.244.0.5
local 10.244.0.5 dev lo src 10.244.0.5 uid 0 
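
The per-hop decision in the steps above can be approximated with a simplified lookup. This is a sketch, not how the kernel works: the kernel does a full longest-prefix match, while the hypothetical lookup helper below only handles the three kinds of routes that appear in this Lab (exact host routes, /24 PodCIDR routes and the default route). The table is an abbreviated copy of the k8s-guide-worker2 routes from the Lab section.

```shell
#!/usr/bin/env bash
# Sketch: simplified next-hop selection for tables holding only host, /24 and default routes.
# lookup is a hypothetical helper; real forwarding uses a full longest-prefix match.

lookup() {
  # $1 = destination IPv4 address; routing table ("ip route" format) on stdin.
  awk -v dst="$1" '
    BEGIN { split(dst, o, "."); net = o[1] "." o[2] "." o[3] ".0/24" }
    $1 == dst       { host = $0 }    # exact host route (local Pod)
    $1 == net       { subnet = $0 }  # /24 route (PodCIDR of a peer Node)
    $1 == "default" { def = $0 }     # fallback
    END {
      if (host != "")        print host
      else if (subnet != "") print subnet
      else                   print def
    }'
}

# Abbreviated routing table of k8s-guide-worker2 (assumed sample input).
table='default via 172.18.0.1 dev eth0
10.244.0.0/24 via 172.18.0.10 dev eth0
10.244.1.0/24 via 172.18.0.12 dev eth0
10.244.2.8 dev vetha1079faf scope host'

printf '%s\n' "$table" | lookup 10.244.0.5   # remote Pod: forwarded to the peer Node
printf '%s\n' "$table" | lookup 10.244.2.8   # local Pod: delivered over its veth
printf '%s\n' "$table" | lookup 192.0.2.1    # anything else: default route
```

The most specific match wins, which is why a local Pod is reached over its veth link while a remote Pod is handed to the Node that owns its PodCIDR.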

SNAT functionality

In addition to the main CNI functionality, kindnet also sets up a number of IP masquerade (Source NAT) rules. These rules allow Pods to access the same networks as the hosting Node (e.g. the Internet). A new KIND-MASQ-AGENT chain is hooked into the POSTROUTING chain of the nat table and includes a RETURN rule that excludes all traffic destined for the cluster-cidr range (10.244.0.0/16):

[email protected]:/# iptables -t nat -nvL | grep -B 4 -A 4 KIND-MASQ
Chain POSTROUTING (policy ACCEPT 3073 packets, 233K bytes)
 pkts bytes target     prot opt in     out     source               destination         
61703 4686K KUBE-POSTROUTING  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes postrouting rules */
    0     0 DOCKER_POSTROUTING  all  --  *      *       0.0.0.0/0            172.18.0.1          
54462 4060K KIND-MASQ-AGENT  all  --  *      *       0.0.0.0/0            0.0.0.0/0            ADDRTYPE match dst-type !LOCAL /* kind-masq-agent: ensure nat POSTROUTING directs all non-LOCAL destination traffic to our custom KIND-MASQ-AGENT chain */

Chain KIND-MASQ-AGENT (1 references)
 pkts bytes target     prot opt in     out     source               destination         
46558 3587K RETURN     all  --  *      *       0.0.0.0/0            10.244.0.0/16        /* kind-masq-agent: local traffic is not subject to MASQUERADE */
 7904  473K MASQUERADE  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kind-masq-agent: outbound traffic is subject to MASQUERADE (must be last in chain) */
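
For reference, rules of this shape could be created with iptables commands along the following lines. This is a sketch inferred from the output above, not kindnet's own code: the helper masq_rules is hypothetical, the rule comments are omitted, and the position of the jump rule within POSTROUTING is assumed. The commands are only printed, not applied.

```shell
#!/usr/bin/env bash
# Sketch: print iptables commands that would yield rules similar to the listing above.
# masq_rules is a hypothetical helper, not part of kindnet.

masq_rules() {
  # $1 = cluster CIDR whose destinations must not be masqueraded.
  local cluster_cidr="$1"
  local chain="KIND-MASQ-AGENT"
  cat <<EOF
iptables -t nat -N $chain
iptables -t nat -A POSTROUTING -m addrtype ! --dst-type LOCAL -j $chain
iptables -t nat -A $chain -d $cluster_cidr -j RETURN
iptables -t nat -A $chain -j MASQUERADE
EOF
}

masq_rules 10.244.0.0/16
```

Rule order matters here: the RETURN rule has to precede MASQUERADE so that Pod-to-Pod traffic keeps its original source IP.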

Caveats and Gotchas

  • Assumes all Nodes are in the same L2 domain.
  • Relies on host-local, ptp, portmap and loopback reference plugins.