Reference

Edit This Page

Implementation details

FEATURE STATE: Kubernetes v1.10 stable
This feature is stable, meaning:

  • The version name is vX where X is an integer.
  • Stable versions of features will appear in released software for many subsequent versions.

kubeadm init and kubeadm join together provides a nice user experience for creating a best-practice but bare Kubernetes cluster from scratch. However, it might not be obvious how kubeadm does that.

This document provide additional details on what happen under the hood, with the aim of sharing knowledge on Kubernetes cluster best practices.

Core design principles

The cluster that kubeadm init and kubeadm join set up should be:

Constants and well-known values and paths

In order to reduce complexity and to simplify development of an on-top-of-kubeadm-implemented deployment solution, kubeadm uses a limited set of constants values for well know-known paths and file names.

The Kubernetes directory /etc/kubernetes is a constant in the application, since it is clearly the given path in a majority of cases, and the most intuitive location; other constants paths and file names are:

kubeadm init workflow internal design

The kubeadm init internal workflow consists of a sequence of atomic work tasks to perform, as described in kubeadm init.

The kubeadm init phase command allows users to invoke individually each task, and ultimately offers a reusable and composable API/toolbox that can be used by other Kubernetes bootstrap tools, by any IT automation tool or by advanced user for creating custom clusters.

Preflight checks

Kubeadm executes a set of preflight checks before starting the init, with the aim to verify preconditions and avoid common cluster startup problems. In any case the user can skip specific preflight checks (or eventually all preflight checks) with the --ignore-preflight-errors option.

Please note that:

  1. Preflight checks can be invoked individually with the kubeadm init phase preflight command

Generate the necessary certificates

Kubeadm generates certificate and private key pairs for different purposes:

Certificates are stored by default in /etc/kubernetes/pki, but this directory is configurable using the --cert-dir flag.

Please note that:

  1. If a given certificate and private key pair both exist, and its content is evaluated compliant with the above specs, the existing files will be used and the generation phase for the given certificate skipped. This means the user can, for example, copy an existing CA to /etc/kubernetes/pki/ca.{crt,key}, and then kubeadm will use those files for signing the rest of the certs. See also using custom certificates
  2. Only for the CA, it is possible to provide the ca.crt file but not the ca.key file, if all other certificates and kubeconfig files already are in place kubeadm recognize this condition and activates the ExternalCA , which also implies the csrsignercontroller in controller-manager won’t be started
  3. If kubeadm is running in ExternalCA mode; all the certificates must be provided by the user, because kubeadm cannot generate them by itself
  4. In case of kubeadm is executed in the --dry-run mode, certificates files are written in a temporary folder
  5. Certificate generation can be invoked individually with the kubeadm init phase certs all command

Generate kubeconfig files for control plane components

Kubeadm kubeconfig files with identities for control plane components:

Additionally, a kubeconfig file for kubeadm to use itself and the admin is generated and save into the /etc/kubernetes/admin.conf file. The “admin” here is defined the actual person(s) that is administering the cluster and want to have full control (root) over the cluster. The embedded client certificate for admin should: - Be in the system:masters organization, as defined by default RBAC user facing role bindings - Include a CN, but that can be anything. Kubeadm uses the kubernetes-admin CN

Please note that:

  1. ca.crt certificate is embedded in all the kubeconfig files.
  2. If a given kubeconfig file exists, and its content is evaluated compliant with the above specs, the existing file will be used and the generation phase for the given kubeconfig skipped
  3. If kubeadm is running in ExternalCA mode, all the required kubeconfig must be provided by the user as well, because kubeadm cannot generate any of them by itself
  4. In case of kubeadm is executed in the --dry-run mode, kubeconfig files are written in a temporary folder
  5. Kubeconfig files generation can be invoked individually with the kubeadm init phase kubeconfig all command

Generate static Pod manifests for control plane components

Kubeadm writes static Pod manifest files for control plane components to /etc/kubernetes/manifests; the kubelet watches this directory for Pods to create on startup.

Static Pod manifest share a set of common properties:

Please note that:

  1. All the images, for the --kubernetes-version/current architecture, will be pulled from k8s.gcr.io; In case an alternative image repository or CI image repository is specified this one will be used; In case a specific container image should be used for all control plane components, this one will be used. see using custom images for more details
  2. In case of kubeadm is executed in the --dry-run mode, static Pods files are written in a temporary folder
  3. Static Pod manifest generation for master components can be invoked individually with the kubeadm init phase control-plane all command

API server

The static Pod manifest for the API server is affected by following parameters provided by the users:

Other API server flags that are set unconditionally are:

Controller manager

The static Pod manifest for the API server is affected by following parameters provided by the users:

Other flags that are set unconditionally are:

Scheduler

The static Pod manifest for the scheduler is not affected by parameters provided by the users.

Generate static Pod manifest for local etcd

If the user specified an external etcd this step will be skipped, otherwise kubeadm generates a static Pod manifest file for creating a local etcd instance running in a Pod with following attributes:

Please note that:

  1. The etcd image will be pulled from k8s.gcr.io. In case an alternative image repository is specified this one will be used; In case an alternative image name is specified, this one will be used. see using custom images for more details
  2. in case of kubeadm is executed in the --dry-run mode, the etcd static Pod manifest is written in a temporary folder
  3. Static Pod manifest generation for local etcd can be invoked individually with the kubeadm init phase etcd local command

Optional Dynamic Kublet Configuration

To use this functionality call kubeadm alpha kubelet config enable-dynamic. It writes the kubelet init configuration into /var/lib/kubelet/config/init/kubelet file.

The init configuration is used for starting the kubelet on this specific node, providing an alternative for the kubelet drop-in file; such configuration will be replaced by the kubelet base configuration as described in following steps. See set Kubelet parameters via a config file for additional info.

Please note that:

  1. To make dynamic kubelet configuration work, flag --dynamic-config-dir=/var/lib/kubelet/config/dynamic should be specified in /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
  2. The kubelet configuration can be changed by passing a KubeletConfiguration object to kubeadm init or kubeadm join by using a configuration file --config some-file.yaml. The KubeletConfiguration object can be separated from other objects such as InitConfiguration using the --- separator. For more details have a look at the kubeadm config print-default command.

Wait for the control plane to come up

This is a critical moment in time for kubeadm clusters. kubeadm waits until localhost:6443/healthz returns ok, however in order to detect deadlock conditions, kubeadm fails fast if localhost:10255/healthz (kubelet liveness) or localhost:10255/healthz/syncloop (kubelet readiness) don’t return ok, respectively after 40 and 60 second.

kubeadm relies on the kubelet to pull the control plane images and run them properly as static Pods. After the control plane is up, kubeadm completes a the tasks described in following paragraphs.

(optional and alpha in v1.9) Write base kubelet configuration

If kubeadm is invoked with --feature-gates=DynamicKubeletConfig:

  1. Write the kubelet base configuration into the kubelet-base-config-v1.9 ConfigMap in the kube-system namespace
  2. Creates RBAC rules for granting read access to that ConfigMap to all bootstrap tokens and all kubelet instances (that is system:bootstrappers:kubeadm:default-node-token and system:nodes groups)
  3. Enable the dynamic kubelet configuration feature for the initial control-plane node by pointing Node.spec.configSource to the newly-created ConfigMap

Save the kubeadm ClusterConfiguration in a ConfigMap for later reference

kubeadm saves the configuration passed to kubeadm init, either via flags or the config file, in a ConfigMap named kubeadm-config under kube-system namespace.

This will ensure that kubeadm actions executed in future (e.g kubeadm upgrade) will be able to determine the actual/current cluster state and make new decisions based on that data.

Please note that:

  1. Before uploading, sensitive information like e.g. the token are stripped from the configuration
  2. Upload of master configuration can be invoked individually with the kubeadm init phase upload-config command
  3. If you initialized your cluster using kubeadm v1.7.x or lower, you must create manually the master configuration ConfigMap before kubeadm upgrade to v1.8 . In order to facilitate this task, the kubeadm config upload (from-flags|from-file) was implemented

Mark master

As soon as the control plane is available, kubeadm executes following actions:

Please note that:

  1. Mark control-plane phase phase can be invoked individually with the kubeadm init phase mark-control-plane command

Configure TLS-Bootstrapping for node joining

Kubeadm uses Authenticating with Bootstrap Tokens for joining new nodes to an existing cluster; for more details see also design proposal.

kubeadm init ensures that everything is properly configured for this process, and this includes following steps as well as setting API server and controller flags as already described in previous paragraphs. Please note that:

  1. TLS bootstrapping for nodes can be configured with the kubeadm init phase bootstrap-token
    command, executing all the configuration steps described in following paragraphs; alternatively, each step can be invoked individually

Create a bootstrap token

kubeadm init create a first bootstrap token, either generated automatically or provided by the user with the --token flag; as documented in bootstrap token specification, token should be saved as secrets with name bootstrap-token-<token-id> under kube-system namespace. Please note that:

  1. The default token created by kubeadm init will be used to validate temporary user during TLS bootstrap process; those users will be member of system:bootstrappers:kubeadm:default-node-token group
  2. The token has a limited validity, default 24 hours (the interval may be changed with the —token-ttl flag)
  3. Additional tokens can be created with the kubeadm token command, that provide as well other useful functions for token management

Allow joining nodes to call CSR API

Kubeadm ensure that users in system:bootstrappers:kubeadm:default-node-token group are able to access the certificate signing API.

This is implemented by creating a ClusterRoleBinding named kubeadm:kubelet-bootstrap between the group above and the default RBAC role system:node-bootstrapper.

Setup auto approval for new bootstrap tokens

Kubeadm ensures that the Bootstrap Token will get its CSR request automatically approved by the csrapprover controller.

This is implemented by creating ClusterRoleBinding named kubeadm:node-autoapprove-bootstrap between the system:bootstrappers:kubeadm:default-node-token group and the default role system:certificates.k8s.io:certificatesigningrequests:nodeclient.

The role system:certificates.k8s.io:certificatesigningrequests:nodeclient should be created as well, granting POST permission to /apis/certificates.k8s.io/certificatesigningrequests/nodeclient.

Setup nodes certificate rotation with auto approval

Kubeadm ensures that certificate rotation is enabled for nodes, and that new certificate request for nodes will get its CSR request automatically approved by the csrapprover controller.

This is implemented by creating ClusterRoleBinding named kubeadm:node-autoapprove-certificate-rotation between the system:nodes group and the default role system:certificates.k8s.io:certificatesigningrequests:selfnodeclient.

Create the public cluster-info ConfigMap

This phase creates the cluster-info ConfigMap in the kube-public namespace.

Additionally it is created a role and a RoleBinding granting access to the ConfigMap for unauthenticated users (i.e. users in RBAC group system:unauthenticated)

Please note that:

  1. The access to the cluster-info ConfigMap is not rate-limited. This may or may not be a problem if you expose your master to the internet; worst-case scenario here is a DoS attack where an attacker uses all the in-flight requests the kube-apiserver can handle to serving the cluster-info ConfigMap.

Install addons

Kubeadm installs the internal DNS server and the kube-proxy addon components via the API server. Please note that:

  1. This phase can be invoked individually with the kubeadm init phase addon all command.

proxy

A ServiceAccount for kube-proxy is created in the kube-system namespace; then kube-proxy is deployed as a DaemonSet:

DNS

Note that:

A ServiceAccount for CoreDNS/kube-dns is created in the kube-system namespace.

Deploy the kube-dns Deployment and Service:

Optional self-hosting

To enable self hosting on a existing static Pod control-plane use kubeadm alpha selfhosting pivot.

Self hosting basically replaces static Pods for control plane components with DaemonSets; this is achieved by executing following procedure for API server, scheduler and controller manager static Pods:

Please note that self hosting is not yet resilient to node restarts; this can be fixed with external checkpointing or with kubelet checkpointing for the control plane Pods. See self-hosting for more details.

kubeadm join phases internal design

Similarly to kubeadm init, also kubeadm join internal workflow consists of a sequence of atomic work tasks to perform.

This is split into discovery (having the Node trust the Kubernetes Master) and TLS bootstrap (having the Kubernetes Master trust the Node).

see Authenticating with Bootstrap Tokens or the corresponding design proposal.

Preflight checks

kubeadm executes a set of preflight checks before starting the join, with the aim to verify preconditions and avoid common cluster startup problems.

Please note that:

  1. kubeadm join preflight checks are basically a subset kubeadm init preflight checks
  2. Starting from 1.9, kubeadm provides better support for CRI-generic functionality; in that case, docker specific controls are skipped or replaced by similar controls for crictl.
  3. Starting from 1.9, kubeadm provides support for joining nodes running on Windows; in that case, linux specific controls are skipped.
  4. In any case the user can skip specific preflight checks (or eventually all preflight checks) with the --ignore-preflight-errors option.

Discovery cluster-info

There are 2 main schemes for discovery. The first is to use a shared token along with the IP address of the API server. The second is to provide a file (that is a subset of the standard kubeconfig file).

Shared token discovery

If kubeadm join is invoked with --discovery-token, token discovery is used; in this case the node basically retrieves the cluster CA certificates from the cluster-info ConfigMap in the kube-public namespace.

In order to prevent “man in the middle” attacks, several steps are taken:

Please note that:

  1. Pub key validation can be skipped passing --discovery-token-unsafe-skip-ca-verification flag; This weakens the kubeadm security model since others can potentially impersonate the Kubernetes Master.

File/https discovery

If kubeadm join is invoked with --discovery-file, file discovery is used; this file can be a local file or downloaded via an HTTPS URL; in case of HTTPS, the host installed CA bundle is used to verify the connection.

With file discovery, the cluster CA certificates is provided into the file itself; in fact, the discovery file is a kubeconfig file with only server and certificate-authority-data attributes set, as described in kubeadm join reference doc; when the connection with the cluster is established, kubeadm try to access the cluster-info ConfigMap, and if available, uses it.

TLS Bootstrap

Once the cluster info are known, the file bootstrap-kubelet.conf is written, thus allowing kubelet to do TLS Bootstrapping (conversely until v.1.7 TLS bootstrapping were managed by kubeadm).

The TLS bootstrap mechanism uses the shared token to temporarily authenticate with the Kubernetes Master to submit a certificate signing request (CSR) for a locally created key pair.

The request is then automatically approved and the operation completes saving ca.crt file and kubelet.conf file to be used by kubelet for joining the cluster, whilebootstrap-kubelet.conf is deleted.

Please note that:

(optional and alpha in v1.9) Write init kubelet configuration

If kubeadm is invoked with --feature-gates=DynamicKubeletConfig:

  1. Read the kubelet base configuration from the kubelet-base-config-v1.9 ConfigMap in the kube-system namespace using the Bootstrap Token credentials, and write it to disk as kubelet init configuration file /var/lib/kubelet/config/init/kubelet
  2. As soon as kubelet starts with the Node’s own credential (/etc/kubernetes/kubelet.conf), update current node configuration specifying that the source for the node/kubelet configuration is the above ConfigMap.

Please note that:

  1. To make dynamic kubelet configuration work, flag --dynamic-config-dir=/var/lib/kubelet/config/dynamic should be specified in /etc/systemd/system/kubelet.service.d/10-kubeadm.conf

Feedback