Article ID: 134072, created on May 30, 2019, last review on Jun 7, 2019

Applies to:

  • Operations Automation 8.0
  • Operations Automation 7.4

Symptoms

The Kubernetes certificates have expired; the following is recorded in /var/log/messages on the K8s node:

May 27 19:16:08.017 journal: E0527 19:16:08.017520       1 authentication.go:63] Unable to authenticate the request due to an error: [x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes"), x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")]

and the following errors show that the Kubernetes API server cannot be contacted:

May 30 09:36:11.947 kubelet: E0530 09:36:11.946614   21985 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:452: Failed to list *v1.Service: Get https://192.168.10.33:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 192.168.10.33:6443: getsockopt: connection refused
May 30 09:36:11.966 kubelet: E0530 09:36:11.965934   21985 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://192.168.10.33:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dgdpr.hostedcloud.me&limit=500&resourceVersion=0: dial tcp 192.168.10.33:6443: getsockopt: connection refused

The Kubernetes cluster cannot be managed with kubectl:

[root@osscore ~]# kubectl get pods
The connection to the server 192.168.10.12:6443 was refused - did you specify the right host or port?

The pods and services continue to operate until the kubelet service is restarted (or the server is rebooted).

If the kubelet service is restarted, it fails with the following messages in /var/log/messages:

Jun  1 00:00:53.918 systemd: Started kubelet: The Kubernetes Node Agent.
Jun  1 00:00:53.981 kubelet: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jun  1 00:00:53.983 kubelet: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jun  1 00:00:53.987 systemd: Started Kubernetes systemd probe.
Jun  1 00:00:53.990 kubelet: I0601 00:00:53.988308   10534 server.go:417] Version: v1.14.2
Jun  1 00:00:53.991 kubelet: I0601 00:00:53.988505   10534 plugins.go:103] No cloud provider specified.
Jun  1 00:00:53.991 kubelet: I0601 00:00:53.988530   10534 server.go:754] Client rotation is on, will bootstrap in background
Jun  1 00:00:53.991 kubelet: E0601 00:00:53.990520   10534 bootstrap.go:264] Part of the existing bootstrap client certificate is expired: 2020-05-16 12:44:11 +0000 UTC
Jun  1 00:00:53.991 kubelet: F0601 00:00:53.990550   10534 server.go:265] failed to run Kubelet: unable to load bootstrap kubeconfig: stat /etc/kubernetes/bootstrap-kubelet.conf: no such file or directory
Jun  1 00:00:53.995 systemd: kubelet.service: main process exited, code=exited, status=255/n/a
Jun  1 00:00:53.996 systemd: Unit kubelet.service entered failed state.
Jun  1 00:00:53.996 systemd: kubelet.service failed.

Cause

The Kubernetes service certificates have expired.

The certificates are issued with a validity of one year and are normally renewed during Kubernetes cluster upgrades. It is therefore recommended to upgrade to a more recent version once it becomes generally available.
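
To confirm the diagnosis, the expiration dates of the certificates can be checked with openssl on the Kubernetes master node (a minimal check, assuming the default kubeadm certificate location under /etc/kubernetes/pki):

    # for crt in /etc/kubernetes/pki/*.crt; do printf '%s: ' "$crt"; openssl x509 -noout -enddate -in "$crt"; done

Any certificate whose notAfter date is in the past must be renewed.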

Resolution

The certificates must be renewed manually by following the steps below.

NOTE: The steps were tested and verified against Kubernetes v1.10. Higher versions may require adjustments, but the general approach is the same.
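
Since the exact commands depend on the Kubernetes version, it may help to check the installed version first. kubectl does not respond while the certificates are expired, but the local binaries report their versions directly (a minimal check that only assumes kubeadm and kubelet are installed on the master node):

    # kubeadm version -o short
    # kubelet --version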

On Kubernetes master node:

  1. Back up the old certificates:

    # mkdir -p /root/kube-backup/kubernetes-pki /root/kube-backup/kubernetes-conf /root/kube-backup/kubelet-pki
    # mv /etc/kubernetes/pki/* /root/kube-backup/kubernetes-pki/
    # mv /etc/kubernetes/*.conf /root/kube-backup/kubernetes-conf/
    
  2. Renew the certificates and kubeconfig files of the core services (a verification example follows this procedure):

    # K8S_IP=$(kubectl config view -o jsonpath={.clusters[0].cluster.server} | cut -d/ -f3 | cut -d: -f1)
    # kubeadm alpha phase certs all --apiserver-advertise-address $K8S_IP
    # kubeadm alpha phase kubeconfig all --apiserver-advertise-address $K8S_IP
    

    For installations behind a proxy, pass the proxy settings as environment variables before the kubeadm command:

    # http_proxy=http://192.168.10.12:8008 https_proxy=http://192.168.10.12:8008 kubeadm alpha phase certs all --apiserver-advertise-address $K8S_IP
    
  3. Renew the config file to manage the cluster with kubectl:

    # \cp -arf /etc/kubernetes/admin.conf $HOME/.kube/config
    # chown $(id -u):$(id -g) $HOME/.kube/config
    # chmod 600 $HOME/.kube/config
    
  4. Renew kubelet certificates:

    # systemctl stop kubelet
    # systemctl stop docker
    # mv /var/lib/kubelet/pki/* /root/kube-backup/kubelet-pki/
    # systemctl start docker
    # systemctl start kubelet
    
  5. For the auxiliary system services, recreate the service accounts so that their tokens are updated:

    # kubectl delete sa -n kube-system kube-dns kube-proxy kube-router tiller
    # kubectl create sa -n kube-system kube-dns
    # kubectl create sa -n kube-system kube-proxy
    # kubectl create sa -n kube-system kube-router
    # kubectl create sa -n kube-system tiller
    
  6. Delete the current pods of these services so that they are recreated with the new tokens:

    # kubectl delete pod -n kube-system \
    $(kubectl get pod -n kube-system -l k8s-app=kube-proxy -o jsonpath="{.items[0].metadata.name}") \
    $(kubectl get pod -n kube-system -l k8s-app=kube-dns -o jsonpath="{.items[0].metadata.name}") \
    $(kubectl get pod -n kube-system -l k8s-app=kube-router -o jsonpath="{.items[0].metadata.name}") \
    $(kubectl get pod -n kube-system -l app=helm -o jsonpath="{.items[0].metadata.name}")
    
  7. Copy the content of /etc/kubernetes/admin.conf and log in to the OSS Core node.
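
Once the steps above are completed, the renewal can be verified on the master node. A minimal check, assuming the default kubeadm certificate layout:

    # openssl x509 -noout -enddate -in /etc/kubernetes/pki/apiserver.crt
    # kubectl get nodes
    # kubectl get pods -n kube-system

The apiserver certificate should now expire one year in the future, and kubectl should list the nodes and the recreated kube-system pods without certificate errors.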

On OSS Core node:

  1. Remove the old kube cache:

    # rm -rf /root/.kube/*
    
  2. Populate the config file with the admin.conf content copied from the Kubernetes master node:

    # cat > /root/.kube/config <<EOF
    ...paste the content here...
    EOF
    
  3. Replace the Kubernetes certificate used by Odin Automation:

    # grep 'certificate-authority-data' /root/.kube/config | awk '{print $2}' | base64 -d > /usr/local/pem/kubernetes/certs/kubernetesApi.pem
    
  4. Get the new Tiller API authorization bearer token:

    # kubectl get secret -n kube-system $(kubectl get secrets -n kube-system | grep tiller | cut -f1 -d ' ') -o jsonpath={.data.token} | base64 -d
    

    Log in to Odin Automation CP and paste the token into System > Settings > Kubernetes settings > Tiller API Authorization Bearer Token.

  5. Restart the Odin Automation OSS Core services to propagate the new settings to the runtime:

    How to restart Odin Automation services

At this point, the Kubernetes cluster should be fully manageable again.
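
As a final check from the OSS Core node, the cluster can be queried with the refreshed configuration (a minimal sketch; it assumes kubectl on the OSS Core node uses /root/.kube/config, as populated in the steps above):

    # kubectl get nodes
    # kubectl get pods -n kube-system

Both commands should complete without "connection refused" or x509 errors; if they do not, re-check the admin.conf content copied from the master node.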
