Services

“Service” is one of the most powerful and, as a result, one of the most complex abstractions in Kubernetes. It is also a heavily overloaded term, which makes it even more confusing for people approaching Kubernetes for the first time. This chapter provides a high-level overview of the different types of Services, their goals, and how they relate to other cluster elements and APIs.

Many of the ideas and concepts in this chapter are based on numerous talks and presentations on this topic. It is difficult to make concrete attributions; however, most of the credit goes to members of the Network Special Interest Group.

Services Hierarchy

A good starting point to understand a Kubernetes Service is to think of it as a distributed load-balancer. Similar to traditional load-balancers, its data model can be reduced to the following two components:

  1. Grouping of backend Pods – all Pods with matching labels represent a single service and can receive and process incoming traffic for that service.
  2. Methods of exposure – each group of Pods can be exposed in many different ways, either internally, to other Pods in a cluster, or externally, to end-users or external services.

All Services implement the above functionality, but each does so in its own way, designed for its unique use case. To understand the various Service types, it helps to view them as a “hierarchy” – starting from the simplest, with each subsequent type building on top of the previous one. The table below is an attempt to explore and explain this hierarchy:

| Type | Description |
| --- | --- |
| Headless | The simplest form of load-balancing, involving only DNS. Nothing is programmed in the data plane and no load-balancer VIP is assigned; however, a DNS query returns the IPs of all backend Pods. The most typical use case is stateful workloads (e.g. databases), where clients need a stable and predictable DNS name and can handle the loss of connectivity and failover on their own. |
| ClusterIP | The most common type; assigns a unique ClusterIP (VIP) to a set of matching backend Pods. A DNS lookup of the Service name returns the allocated ClusterIP. All ClusterIPs are configured in the data plane of each node as DNAT rules – the destination ClusterIP is translated to one of the PodIPs. These NAT translations always happen on the egress (client-side) node, which means that Node-to-Pod reachability must be provided externally (by a CNI plugin). |
| NodePort | Builds on top of the ClusterIP Service by allocating a unique static port in the root network namespace of each Node and mapping it (via port translation) to the port exposed by the backend Pods. Incoming traffic can hit any cluster Node and, as long as the destination port matches the NodePort, it is forwarded to one of the healthy backend Pods. |
| LoadBalancer | Attracts external user traffic to a Kubernetes cluster. Each LoadBalancer Service instance is assigned a unique, externally routable IP address, which is advertised to the underlying physical network via BGP or gratuitous ARP. This Service type is implemented outside of the main Kubernetes controller – either by the underlying cloud as an external L4 load-balancer or with a cluster add-on like MetalLB, Porter or kube-vip. |
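
To make the first rows of the hierarchy more concrete, the sketch below shows what a Headless and a NodePort Service might look like as manifests. The Service names, labels and port numbers are hypothetical and chosen only for illustration; the `nodePort` value assumes the default node port range.

```yaml
# Headless Service: no VIP is allocated, DNS returns the Pod IPs directly.
apiVersion: v1
kind: Service
metadata:
  name: db-headless          # hypothetical name
spec:
  clusterIP: None            # "None" is what makes the Service headless
  selector:
    app: db                  # hypothetical label on the backend Pods
  ports:
    - name: postgres
      port: 5432
      targetPort: 5432
---
# NodePort Service: a ClusterIP is still allocated, plus a static port on every Node.
apiVersion: v1
kind: Service
metadata:
  name: web-nodeport         # hypothetical name
spec:
  type: NodePort
  selector:
    app: nginx
  ports:
    - name: http
      port: 80               # port on the ClusterIP
      targetPort: 80         # port exposed by the backend Pods
      nodePort: 30080        # optional; assumes the default 30000-32767 range
```

If `nodePort` is omitted, a free port from the configured range is allocated automatically.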

One Service type that doesn’t fit with the rest is ExternalName. It instructs the cluster DNS add-on (e.g. CoreDNS) to respond with a CNAME, redirecting all queries for this Service’s domain name to an external FQDN, which can simplify interacting with external services (for more details, see the Design Spec).
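
As a minimal sketch, an ExternalName Service could be declared as follows. The Service name and the external FQDN are placeholders, not values taken from this guide.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: external-db              # hypothetical in-cluster name
spec:
  type: ExternalName
  externalName: db.example.com   # placeholder FQDN returned as the CNAME target
```

No selector or ports are required here, since nothing is programmed in the data plane; clients simply resolve the in-cluster name and follow the CNAME.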

The following diagram illustrates how different Service types can be combined to expose a stateful application:

Although not directly connected, most Services rely on Deployments and StatefulSets to create the required number of Pods with a unique set of labels.
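
The relationship is indirect and goes through labels only. As an illustration, a Deployment along the following lines would create Pods labelled `app: nginx`, which a Service with the same label selector would then pick up. The name, replica count and image are assumptions made for the example.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx                # hypothetical name
spec:
  replicas: 3                # required number of Pods
  selector:
    matchLabels:
      app: nginx             # must match the Pod template labels below
  template:
    metadata:
      labels:
        app: nginx           # the label a Service selector would match on
    spec:
      containers:
        - name: nginx
          image: nginx
          ports:
            - containerPort: 80
```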

Service APIs and Implementation

Services have a relatively small and simple API. At the very least they expect the following to be defined:

  • An explicit list of backend ports that need to be exposed.
  • A label selector to understand which Pods are potential upstream candidates.
  • A Service type, which defaults to ClusterIP.

kind: Service
apiVersion: v1
metadata:
  name: service-example
spec:
  ports:
    - name: http
      port: 80          # port exposed by the Service
      targetPort: 80    # port exposed by the backend Pods
  selector:
    app: nginx          # Pods carrying this label become backends
  type: LoadBalancer    # defaults to ClusterIP when omitted

Some Services may not have a label selector, in which case the list of backend endpoints can still be constructed manually. This is often used to interconnect with services outside of the Kubernetes cluster while still relying on the internal service discovery mechanisms.
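
One way to do this, sketched below, is to create a Service without a selector together with an Endpoints object of the same name that lists the external backends by hand. The names, the port and the IP address (taken from a documentation-only range) are placeholders.

```yaml
# Service without a selector: no Endpoints object is generated for it automatically.
apiVersion: v1
kind: Service
metadata:
  name: legacy-db            # hypothetical name
spec:
  ports:
    - name: mysql
      port: 3306
      targetPort: 3306
---
# Manually maintained Endpoints object pointing outside of the cluster.
apiVersion: v1
kind: Endpoints
metadata:
  name: legacy-db            # must match the Service name
subsets:
  - addresses:
      - ip: 192.0.2.10       # placeholder external IP
    ports:
      - name: mysql          # must match the Service port name
        port: 3306
```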

A Service’s internal architecture consists of two loosely coupled components:

  • Kubernetes control plane – a process running inside the kube-controller-manager binary that reacts to API events and builds an internal representation of each Service instance. This internal representation is a special Endpoints object that gets created for every Service instance and contains a list of healthy backend endpoints (PodIP + port).
  • Distributed data plane – a set of Node-local agents that read Endpoints objects and program their local data plane. This is most commonly implemented with kube-proxy, with various competing implementations available from third-party Kubernetes networking providers like Cilium, Calico, kube-router and others.
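
To illustrate what the data plane agents consume, the following is a rough sketch of an Endpoints object the control plane might generate for the `service-example` Service. The Pod names, node names and IP addresses are invented for the example; real values depend on the cluster.

```yaml
apiVersion: v1
kind: Endpoints
metadata:
  name: service-example      # same name as the Service it represents
subsets:
  - addresses:               # ready backends that may receive traffic
      - ip: 10.244.1.5       # hypothetical PodIP
        nodeName: node-1     # hypothetical Node name
        targetRef:
          kind: Pod
          name: nginx-6d4cf56db6-abcde   # hypothetical Pod name
          namespace: default
    notReadyAddresses:       # backends that are currently not ready
      - ip: 10.244.2.7       # hypothetical PodIP
        nodeName: node-2
        targetRef:
          kind: Pod
          name: nginx-6d4cf56db6-fghij   # hypothetical Pod name
          namespace: default
    ports:
      - name: http
        port: 80             # the Service's targetPort
        protocol: TCP
```

Only the entries under `addresses` are treated as healthy backends; a Pod moves between the two lists as its readiness changes.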

Another less critical, but nonetheless important, component is DNS. Internally, the DNS add-on is just a Pod running in the cluster that caches Service and Endpoints objects and responds to incoming queries according to the DNS-Based Service Discovery specification, which defines the format of incoming queries and the expected structure of responses.