Symptoms

The Kubernetes certificates have expired, and the following is recorded in /var/log/messages on the K8s node:

May 27 19:16:08.017 journal: E0527 19:16:08.017520       1 authentication.go:63] Unable to authenticate the request due to an error: [x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes"), x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")]

and these errors show an inability to contact the Kubernetes API server:

May 30 09:36:11.947 kubelet: E0530 09:36:11.946614   21985 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:452: Failed to list *v1.Service: Get https://192.168.10.33:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 192.168.10.33:6443: getsockopt: connection refused
May 30 09:36:11.966 kubelet: E0530 09:36:11.965934   21985 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://192.168.10.33:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dgdpr.hostedcloud.me&limit=500&resourceVersion=0: dial tcp 192.168.10.33:6443: getsockopt: connection refused

The Kubernetes cluster is not manageable with kubectl:

[root@osscore ~]# kubectl get pods
The connection to the server 192.168.10.12:6443 was refused - did you specify the right host or port?

The pods and services continue to operate until the kubelet service is restarted (or the server is rebooted).

If the kubelet service is restarted, it fails with the following messages in /var/log/messages:

Jun  1 00:00:53.918 systemd: Started kubelet: The Kubernetes Node Agent.
Jun  1 00:00:53.981 kubelet: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jun  1 00:00:53.983 kubelet: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jun  1 00:00:53.987 systemd: Started Kubernetes systemd probe.
Jun  1 00:00:53.990 kubelet: I0601 00:00:53.988308   10534 server.go:417] Version: v1.14.2
Jun  1 00:00:53.991 kubelet: I0601 00:00:53.988505   10534 plugins.go:103] No cloud provider specified.
Jun  1 00:00:53.991 kubelet: I0601 00:00:53.988530   10534 server.go:754] Client rotation is on, will bootstrap in background
Jun  1 00:00:53.991 kubelet: E0601 00:00:53.990520   10534 bootstrap.go:264] Part of the existing bootstrap client certificate is expired: 2020-05-16 12:44:11 +0000 UTC
Jun  1 00:00:53.991 kubelet: F0601 00:00:53.990550   10534 server.go:265] failed to run Kubelet: unable to load bootstrap kubeconfig: stat /etc/kubernetes/bootstrap-kubelet.conf: no such file or directory
Jun  1 00:00:53.995 systemd: kubelet.service: main process exited, code=exited, status=255/n/a
Jun  1 00:00:53.996 systemd: Unit kubelet.service entered failed state.
Jun  1 00:00:53.996 systemd: kubelet.service failed.

Cause

The Kubernetes service certificates have expired.

Normally the certificates are issued for 1 year and are meant to be renewed during Kubernetes cluster upgrades. It is therefore recommended to upgrade to a more recent version once it is generally available.
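
To confirm that the certificates have in fact expired, they can be inspected directly with openssl (the paths below are the default kubeadm locations and may differ on a particular installation):

    # openssl x509 -noout -enddate -in /etc/kubernetes/pki/apiserver.crt
    # openssl x509 -noout -enddate -in /etc/kubernetes/pki/apiserver-kubelet-client.crt

On kubeadm v1.15 and above, the expiration dates of all kubeadm-managed certificates can also be listed with (on newer releases the alpha prefix is dropped: kubeadm certs check-expiration):

    # kubeadm alpha certs check-expiration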


Resolution

The certificates must be renewed manually. Please follow all the steps below in order, and pay attention to the notes.


NOTE: The steps were tested and verified against Kubernetes v1.10. Higher versions may require corrections, but the general procedure is the same.
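
Since kubectl may not be able to reach the API server while the certificates are expired, the installed component versions can still be checked locally on the master node before starting:

    # kubeadm version
    # kubelet --version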


On the Kubernetes master node:

  1. Back up the old certificates:

    # mkdir -p /root/kube-backup/kubernetes-pki /root/kube-backup/kubernetes-conf /root/kube-backup/kubelet-pki
    # mv /etc/kubernetes/pki/* /root/kube-backup/kubernetes-pki/
    # mv /etc/kubernetes/*.conf /root/kube-backup/kubernetes-conf/
    
  2. Check your kubeadm version:

    # kubeadm version
    
  3. Renew the certificates and kubeconfig files of the core services, based on your kubeadm version and whether your installation is behind a proxy or not:

    • For kubeadm versions below v1.15:

      # K8S_IP=$(kubectl config view -o jsonpath={.clusters[0].cluster.server} | cut -d/ -f3 | cut -d: -f1)
      # kubeadm alpha phase certs all --apiserver-advertise-address $K8S_IP
      # kubeadm alpha phase kubeconfig all --apiserver-advertise-address $K8S_IP
      
    • For kubeadm v1.15 and above, the "certs" and "kubeconfig" phases have been moved from "kubeadm alpha" to "kubeadm init":

      # K8S_IP=$(kubectl config view -o jsonpath={.clusters[0].cluster.server} | cut -d/ -f3 | cut -d: -f1)
      # kubeadm init phase certs all --apiserver-advertise-address $K8S_IP
      # kubeadm init phase kubeconfig all --apiserver-advertise-address $K8S_IP

    • For installations behind a proxy, the proxy must be passed as environment variables before the kubeadm command:

      # http_proxy=http://192.168.10.12:8008 https_proxy=http://192.168.10.12:8008 kubeadm alpha phase certs all --apiserver-advertise-address $K8S_IP

      or, for kubeadm v1.15 and above:

      # K8S_IP=$(kubectl config view -o jsonpath={.clusters[0].cluster.server} | cut -d/ -f3 | cut -d: -f1)
      # http_proxy=http://192.168.10.12:8008 https_proxy=http://192.168.10.12:8008 kubeadm init phase certs all --apiserver-advertise-address $K8S_IP

  4. Renew the config file to manage the cluster with kubectl:

    # \cp -arf /etc/kubernetes/admin.conf $HOME/.kube/config
    # chown $(id -u):$(id -g) $HOME/.kube/config
    # chmod 777 $HOME/.kube/config
    
  5. Renew kubelet certificates:

    # systemctl stop kubelet
    # systemctl stop docker
    # mv /var/lib/kubelet/pki/* /root/kube-backup/kubelet-pki/
    # systemctl start docker
    # systemctl start kubelet
    
  6. For the auxiliary system services, the service accounts must be recreated so that their tokens are updated. Follow the steps below:


    For a k8s cluster that is still using kube-dns:

    # kubectl delete sa -n kube-system kube-dns kube-proxy kube-router tiller
    # kubectl create sa -n kube-system kube-dns
    # kubectl create sa -n kube-system kube-proxy
    # kubectl create sa -n kube-system kube-router
    # kubectl create sa -n kube-system tiller

     

    For a k8s cluster that is using coredns:

    # kubectl delete sa -n kube-system coredns kube-proxy kube-router tiller
    # kubectl create sa -n kube-system coredns
    # kubectl create sa -n kube-system kube-proxy
    # kubectl create sa -n kube-system kube-router
    # kubectl create sa -n kube-system tiller
     
  7. Delete the current pods of these services:

    # kubectl delete pod -n kube-system \
    $(kubectl get pod -n kube-system -l k8s-app=kube-proxy -o jsonpath="{.items[0].metadata.name}") \
    $(kubectl get pod -n kube-system -l k8s-app=kube-dns -o jsonpath="{.items[*].metadata.name}") \
    $(kubectl get pod -n kube-system -l k8s-app=kube-router -o jsonpath="{.items[0].metadata.name}") \
    $(kubectl get pod -n kube-system -l app=helm -o jsonpath="{.items[0].metadata.name}")
    

     

    NOTE: For steps 6 and 7, in some cases, instead of kube-router, a Kubernetes (k8s) cluster may be using another network plugin such as Flannel, Calico, Canal or Weave. This can be confirmed with 'kubectl get pod -n kube-system'. When implementing those steps, replace kube-router with the network plugin installed on the affected k8s cluster.
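
    As an illustration, the DNS service and network plugin in use can be confirmed, and the renewed kubelet certificate from step 5 can be verified, with the commands below (the certificate path assumes the default kubelet PKI directory and client certificate rotation being enabled, as in the log excerpt above):

    # kubectl get pod -n kube-system -o wide
    # kubectl get sa -n kube-system
    # openssl x509 -noout -enddate -in /var/lib/kubelet/pki/kubelet-client-current.pem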



On the OSS Core node:

  1. Remove the old kube cache:

    # rm -rf /root/.kube/*
    
  2. Populate the config file with the admin.conf content from the Kubernetes master node (alternatively, see the copy example after this list):

    # cat > /root/.kube/config <<EOF
    ...paste the content here...
    EOF
    
  3. Replace Kubernetes certificate used by CloudBlue Commerce:

    # grep 'certificate-authority-data' /root/.kube/config | awk '{print $2}' | base64 -d > /usr/local/pem/kubernetes/certs/kubernetesApi.pem
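
    Alternatively, if the OSS Core node has SSH access to the Kubernetes master node, admin.conf can be copied directly (the host name below is an example and must be replaced with the actual master node address):

    # scp root@k8s-master:/etc/kubernetes/admin.conf /root/.kube/config

    The extracted CA certificate can then be checked with openssl to make sure it is the freshly generated one:

    # openssl x509 -noout -subject -enddate -in /usr/local/pem/kubernetes/certs/kubernetesApi.pem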
    

(Skip both steps below for CBC Platform 20.4 & 20.5)

  1. Get the new Tiller API auth bearer token:

    # kubectl get secret -n kube-system $(kubectl get secrets -n kube-system | grep tiller | cut -f1 -d ' ') -o jsonpath={.data.token} | base64 -d
    

    Log in to the CloudBlue Commerce PCP and paste the token into System > Settings > Kubernetes settings > Tiller API Authorization Bearer Token.

  2. Restart the CloudBlue Commerce OSS Core services to propagate the new settings into runtime:

    See: How to restart CloudBlue Commerce services


At this point the Kubernetes cluster should become completely manageable.
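
A quick way to verify this is to run a few read-only kubectl commands from both the Kubernetes master node and the OSS Core node (these are generic checks rather than product-specific commands):

    # kubectl get nodes
    # kubectl get pods -n kube-system
    # kubectl get --raw /healthz

All nodes should report Ready, the system pods should be Running, and the health check should return "ok".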

