Symptoms
When the Kubernetes certificates are outdated, the following is recorded to /var/log/messages on the K8s node:
May 27 19:16:08.017 journal: E0527 19:16:08.017520 1 authentication.go:63] Unable to authenticate the request due to an error: [x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes"), x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")]
and these errors show an inability to contact the Kubernetes apiserver:
May 30 09:36:11.947 kubelet: E0530 09:36:11.946614 21985 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:452: Failed to list *v1.Service: Get https://192.168.10.33:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 192.168.10.33:6443: getsockopt: connection refused
May 30 09:36:11.966 kubelet: E0530 09:36:11.965934 21985 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://192.168.10.33:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dgdpr.hostedcloud.me&limit=500&resourceVersion=0: dial tcp 192.168.10.33:6443: getsockopt: connection refused
Kubernetes is no longer manageable with kubectl:
[root@osscore ~]# kubectl get pods
The connection to the server 192.168.10.12:6443 was refused - did you specify the right host or port?
The pods and services continue to operate until the kubelet service is restarted (or the server is rebooted).
If the kubelet service is restarted, it fails with the following messages in /var/log/messages:
Jun 1 00:00:53.918 systemd: Started kubelet: The Kubernetes Node Agent.
Jun 1 00:00:53.981 kubelet: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jun 1 00:00:53.983 kubelet: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jun 1 00:00:53.987 systemd: Started Kubernetes systemd probe.
Jun 1 00:00:53.990 kubelet: I0601 00:00:53.988308 10534 server.go:417] Version: v1.14.2
Jun 1 00:00:53.991 kubelet: I0601 00:00:53.988505 10534 plugins.go:103] No cloud provider specified.
Jun 1 00:00:53.991 kubelet: I0601 00:00:53.988530 10534 server.go:754] Client rotation is on, will bootstrap in background
Jun 1 00:00:53.991 kubelet: E0601 00:00:53.990520 10534 bootstrap.go:264] Part of the existing bootstrap client certificate is expired: 2020-05-16 12:44:11 +0000 UTC
Jun 1 00:00:53.991 kubelet: F0601 00:00:53.990550 10534 server.go:265] failed to run Kubelet: unable to load bootstrap kubeconfig: stat /etc/kubernetes/bootstrap-kubelet.conf: no such file or directory
Jun 1 00:00:53.995 systemd: kubelet.service: main process exited, code=exited, status=255/n/a
Jun 1 00:00:53.996 systemd: Unit kubelet.service entered failed state.
Jun 1 00:00:53.996 systemd: kubelet.service failed.
Cause
The Kubernetes service certificates have expired.
Normally, the certificates are issued for one year and are meant to be renewed during Kubernetes cluster upgrades. It is recommended to perform the upgrade to a more recent version once it is generally available.
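To confirm that certificate expiry is indeed the cause, the certificates can be inspected directly; a minimal check, assuming the standard kubeadm certificate location (on kubeadm v1.15 and above, "kubeadm alpha certs check-expiration" prints the expiry of all cluster certificates at once):
# openssl x509 -noout -enddate -in /etc/kubernetes/pki/apiserver.crt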
Resolution
The certificates must be renewed manually. PLEASE follow ALL the steps below accordingly, and DO mind the NOTE(s).
NOTE: The steps were tested and verified against Kubernetes v1.10! For higher versions, corrections may be required, but the general algorithm is the same.
On the Kubernetes master node:
Back up the old certificates:
# mkdir -p /root/kube-backup/kubernetes-pki /root/kube-backup/kubernetes-conf /root/kube-backup/kubelet-pki
# mv /etc/kubernetes/pki/* /root/kube-backup/kubernetes-pki/
# mv /etc/kubernetes/*.conf /root/kube-backup/kubernetes-conf/
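Before proceeding, you can optionally verify that the old files were indeed moved into the backup location:
# ls -la /root/kube-backup/kubernetes-pki /root/kube-backup/kubernetes-conf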
Check your kubeadm version:
# kubeadm version
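If only the version string is needed, kubeadm can also print it in short form, e.g.:
# kubeadm version -o short
v1.14.2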
Renew the certificates and kubeconfig files of the core services, depending on your kubeadm version and on whether your installation is behind a proxy:
For kubeadm versions below v1.15.x:
# K8S_IP=$(kubectl config view -o jsonpath={.clusters[0].cluster.server} | cut -d/ -f3 | cut -d: -f1)
# kubeadm alpha phase certs all --apiserver-advertise-address $K8S_IP
# kubeadm alpha phase kubeconfig all --apiserver-advertise-address $K8S_IP
For kubeadm v1.15.x and above, the "certs" and "kubeconfig" phases have been moved from "kubeadm alpha phase" to "kubeadm init phase":
# K8S_IP=$(kubectl config view -o jsonpath={.clusters[0].cluster.server} | cut -d/ -f3 | cut -d: -f1)
# kubeadm init phase certs all --apiserver-advertise-address $K8S_IP
# kubeadm init phase kubeconfig all --apiserver-advertise-address $K8S_IP
- For installations behind a proxy, the proxy variables should be passed on the same line, before the kubeadm command:
# K8S_IP=$(kubectl config view -o jsonpath={.clusters[0].cluster.server} | cut -d/ -f3 | cut -d: -f1)
# http_proxy=http://192.168.10.12:8008 https_proxy=http://192.168.10.12:8008 kubeadm alpha phase certs all --apiserver-advertise-address $K8S_IP
or, for kubeadm v1.15.x and above:
# http_proxy=http://192.168.10.12:8008 https_proxy=http://192.168.10.12:8008 kubeadm init phase certs all --apiserver-advertise-address $K8S_IP
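Whichever variant applies, a quick way to verify that the renewal produced fresh certificates, assuming the standard kubeadm certificate location:
# openssl x509 -noout -dates -in /etc/kubernetes/pki/apiserver.crt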
Renew the config file used to manage the cluster with kubectl:
# \cp -arf /etc/kubernetes/admin.conf $HOME/.kube/config
# chown $(id -u):$(id -g) $HOME/.kube/config
# chmod 777 $HOME/.kube/config
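To confirm the refreshed kubeconfig now carries a valid client certificate, it can be decoded the same way the CA certificate is extracted later in this article (an optional check):
# grep 'client-certificate-data' $HOME/.kube/config | awk '{print $2}' | base64 -d | openssl x509 -noout -enddate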
Renew kubelet certificates:
# systemctl stop kubelet
# systemctl stop docker
# mv /var/lib/kubelet/pki/* /root/kube-backup/kubelet-pki/
# systemctl start docker
# systemctl start kubelet
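After the restart, the kubelet should regenerate its PKI files and the apiserver should respond again; an optional check (the default kubelet PKI path is assumed):
# ls -l /var/lib/kubelet/pki/
# kubectl get nodes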
For the auxiliary system services, the service accounts must be recreated so that their tokens are updated. Please follow the steps below:
For a k8s cluster which is still using kube-dns:
# kubectl delete sa -n kube-system kube-dns kube-proxy kube-router tiller
# kubectl create sa -n kube-system kube-dns
# kubectl create sa -n kube-system kube-proxy
# kubectl create sa -n kube-system kube-router
# kubectl create sa -n kube-system tiller
For a k8s cluster which is using coredns:
# kubectl delete sa -n kube-system coredns kube-proxy kube-router tiller
# kubectl create sa -n kube-system coredns
# kubectl create sa -n kube-system kube-proxy
# kubectl create sa -n kube-system kube-router
# kubectl create sa -n kube-system tiller
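In either case, you can optionally confirm that new token secrets were generated for the recreated accounts (the default token-secret naming is assumed):
# kubectl get secrets -n kube-system | grep -E 'dns|kube-proxy|kube-router|tiller'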
Delete current pods of these services:
# kubectl delete pod -n kube-system \
  $(kubectl get pod -n kube-system -l k8s-app=kube-proxy -o jsonpath="{.items[0].metadata.name}") \
  $(kubectl get pod -n kube-system -l k8s-app=kube-dns -o jsonpath="{.items[*].metadata.name}") \
  $(kubectl get pod -n kube-system -l k8s-app=kube-router -o jsonpath="{.items[0].metadata.name}") \
  $(kubectl get pod -n kube-system -l app=helm -o jsonpath="{.items[0].metadata.name}")
NOTE: For the pod deletion step above, in some cases instead of kube-router a Kubernetes (k8s) cluster can be using (but not limited to) Flannel, Calico, Canal, or Weave. This can be confirmed with 'kubectl get pod -n kube-system'. When implementing the step, simply replace the word kube-router with whatever is installed on the affected k8s cluster.
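Once deleted, the pods are recreated by their controllers with the new service-account tokens; optionally verify that they all reach the Running state:
# kubectl get pod -n kube-system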
On the OSS Core node:
Remove the old kube cache:
# rm -rf /root/.kube/*
Populate the config file with the admin.conf content from the Kubernetes master node:
# cat > /root/.kube/config <<EOF
...paste the content here...
EOF
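Alternatively, assuming direct SSH access from the OSS Core node to the master (the host name below is an example placeholder), the file can be copied instead of pasted:
# scp root@k8s-master:/etc/kubernetes/admin.conf /root/.kube/config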
Replace Kubernetes certificate used by CloudBlue Commerce:
# grep 'certificate-authority-data' /root/.kube/config | awk '{print $2}' | base64 -d > /usr/local/pem/kubernetes/certs/kubernetesApi.pem
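Optionally, inspect the exported certificate to confirm that it is the cluster CA and has a valid lifetime:
# openssl x509 -noout -subject -enddate -in /usr/local/pem/kubernetes/certs/kubernetesApi.pem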
(Skip both steps below for CBC Platform 20.4 & 20.5)
Get the new Tiller API auth bearer token:
# kubectl get secret -n kube-system $(kubectl get secrets -n kube-system | grep tiller | cut -f1 -d ' ') -o jsonpath={.data.token} | base64 -d
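Optionally, the token can be sanity-checked before pasting it; a sketch querying the apiserver /version endpoint with curl (the address 192.168.10.12:6443 is taken from the symptoms above, adjust it to your cluster):
# TOKEN=$(kubectl get secret -n kube-system $(kubectl get secrets -n kube-system | grep tiller | cut -f1 -d ' ') -o jsonpath={.data.token} | base64 -d)
# curl -sk -H "Authorization: Bearer $TOKEN" https://192.168.10.12:6443/version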
Log in to the CloudBlue Commerce PCP and paste the token into System > Settings > Kubernetes settings > Tiller API Authorization Bearer Token.
Restart the CloudBlue Commerce OSS Core services to propagate the new settings into runtime.
At this point, the Kubernetes cluster should be fully manageable again.