
Article ID: 132717, created on Jun 1, 2018, last review on Dec 7, 2018

  • Applies to:
  • Operations Automation

Preface

Kubernetes cluster technology is used in Odin Automation without any platform-specific configuration. Instead, OA relies on the default Kubernetes feature set and installs applications as Docker images using the standard procedures. Therefore, the main knowledge source for Kubernetes internals is the official documentation portal:

Kubernetes: Production-Grade Container Orchestration

Network considerations

A Kubernetes cluster is a complex structure that includes virtual networks, Docker bridges, service ports, etc. A very detailed overview of the concepts can be found on the following pages:

Understanding kubernetes networking: pods

Understanding kubernetes networking: services

The Kubernetes cluster in Odin Automation uses two default virtual networks:

  • 10.96.0.0/12 - for services, external communications
  • 10.244.0.0/16 - for pods, internal cross-cluster communications

(If these networks overlap with the provider's infrastructure networks, they can be defined explicitly through the --service-cidr and --pod-network-cidr options of the uK8s.py script.)
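
For illustration only, a hypothetical invocation overriding both networks might look like the following; the actual command line of uK8s.py is described in the deployment instructions, and the CIDR values here are example placeholders:

python uK8s.py --service-cidr 10.112.0.0/12 --pod-network-cidr 10.245.0.0/16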

To allow network communication between the Odin Automation Management Node and the Kubernetes cluster, a custom route must be added on the OA MN:

10.96.0.0/12 via <backnet_k8s_node_ip>  
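
For example, on a RHEL7-based OA MN the route can be added and made persistent as follows (a minimal sketch; <backnet_k8s_node_ip> and eth0 are placeholders for the actual K8s node backnet IP and the OA MN backnet interface):

ip route add 10.96.0.0/12 via <backnet_k8s_node_ip>
echo "10.96.0.0/12 via <backnet_k8s_node_ip>" >> /etc/sysconfig/network-scripts/route-eth0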

To pull application images, the GDPR server requires access to the internet either via a separate network interface (FrontNet) or via a proxy.
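
If a proxy is used, note that the images are pulled by the Docker daemon, so the proxy typically has to be configured for the docker service as well, for example via a systemd drop-in (a sketch; proxy.example.com:3128 is a placeholder for the actual proxy address):

mkdir -p /etc/systemd/system/docker.service.d
cat > /etc/systemd/system/docker.service.d/http-proxy.conf <<EOF
[Service]
Environment="HTTP_PROXY=http://proxy.example.com:3128" "HTTPS_PROXY=http://proxy.example.com:3128" "NO_PROXY=localhost,127.0.0.1"
EOF
systemctl daemon-reload
systemctl restart docker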

In addition, the Kubernetes cluster in Odin Automation uses the kube-router solution, which manages LVS/IPVS routing and iptables rules:

kube-router

Firewall considerations

Modern RHEL7 installations come with very strict firewall rules, where all network communications are disallowed by default. The following set of rules is suggested to be added to the firewall configuration to make the Kubernetes cluster operable and to allow communication with the Odin Automation Management Node (it also includes port 8081 of the GDPR application service).

iptables -I INPUT -s 10.244.0.0/16 -j ACCEPT
iptables -I INPUT -d 10.244.0.0/16 -j ACCEPT
iptables -I INPUT -p tcp --dport 8081 -j ACCEPT
iptables -I INPUT -p tcp --dport 6443 -j ACCEPT
iptables -I INPUT -p tcp --dport 6308 -j ACCEPT
iptables -I INPUT -p tcp --dport 5432 -j ACCEPT
iptables -I INPUT -p udp --dport 53 -j ACCEPT
iptables -I OUTPUT  -s 10.244.0.0/16 -j ACCEPT
iptables -I OUTPUT  -d 10.244.0.0/16 -j ACCEPT
iptables -I OUTPUT -p tcp --dport 8081 -j ACCEPT
iptables -I OUTPUT -p tcp --dport 6308 -j ACCEPT
iptables -I OUTPUT -p tcp --dport 6443 -j ACCEPT
iptables -I OUTPUT -p tcp --dport 5432 -j ACCEPT
iptables -I OUTPUT -p udp --dport 53 -j ACCEPT
iptables -I FORWARD -s 10.244.0.0/16 -j ACCEPT
iptables -I FORWARD -d 10.244.0.0/16 -j ACCEPT
iptables -I FORWARD -p tcp --dport 8081 -j ACCEPT
iptables -I FORWARD -p tcp --dport 6443 -j ACCEPT
iptables -I FORWARD -p tcp --dport 6308 -j ACCEPT
iptables -I FORWARD -p tcp --dport 5432 -j ACCEPT
iptables -I FORWARD -p udp --dport 53 -j ACCEPT

The following additional ports are not related to the Kubernetes cluster, but must be open for OA MN <-> slave communication, as on any other service node:

iptables -I INPUT -p tcp --dport 22 -j ACCEPT
iptables -I INPUT -p tcp --dport 8352:8500 -j ACCEPT

Note: do not forget to save the rules so that they persist after a node reboot.
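
For example, when the iptables service from the iptables-services package is used (see the next paragraph), the current rules can be saved with:

service iptables save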

Kubernetes services are also better managed through the iptables service rather than firewalld, so it is advised to switch the default firewall service to iptables before deployment. More details here: GDPR deployment fails: gdpr-backend not ready, no route to host.
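
A minimal sketch of switching from firewalld to the iptables service on RHEL7 (assuming the iptables-services package is available from the configured repositories):

systemctl stop firewalld
systemctl disable firewalld
yum install -y iptables-services
systemctl enable iptables
systemctl start iptables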

Note that Kubernetes services add their own set of iptables rules to NAT/RAW/FILTER tables to allow the cluster networking operations.

An example of the full chain of iptables rules for an exposed service, added by the Kubernetes services automatically:

[root@k8s ~]# iptables-save | grep gdpr
-A KUBE-SEP-4RGWSELQTFCUHQFN -s 10.244.0.9/32 -m comment --comment "default/gdpr-backend:wildfly" -j KUBE-MARK-MASQ
-A KUBE-SEP-4RGWSELQTFCUHQFN -p tcp -m comment --comment "default/gdpr-backend:wildfly" -m recent --set --name KUBE-SEP-4RGWSELQTFCUHQFN --mask 255.255.255.255 --rsource -m tcp -j DNAT --to-destination 10.244.0.9:8081
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.105.191.143/32 -p tcp -m comment --comment "default/gdpr-backend:wildfly cluster IP" -m tcp --dport 8081 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.105.191.143/32 -p tcp -m comment --comment "default/gdpr-backend:wildfly cluster IP" -m tcp --dport 8081 -j KUBE-SVC-Q5WZFYA44U3ADWTQ
-A KUBE-SVC-Q5WZFYA44U3ADWTQ -m comment --comment "default/gdpr-backend:wildfly" -m recent --rcheck --seconds 10800 --reap --name KUBE-SEP-4RGWSELQTFCUHQFN --mask 255.255.255.255 --rsource -j KUBE-SEP-4RGWSELQTFCUHQFN
-A KUBE-SVC-Q5WZFYA44U3ADWTQ -m comment --comment "default/gdpr-backend:wildfly" -j KUBE-SEP-4RGWSELQTFCUHQFN

DNS considerations

The Kubernetes cluster exposes its own DNS service, which is always assigned the IP address 10.96.0.10. It must be reachable from and used by the Odin Automation Management Node to resolve the full names of Kubernetes microservices. The DNS server is added to /etc/resolv.conf as the first DNS host. It also acts as a DNS proxy for resolving external domain names. In addition to the nameserver entry, a cluster search domain is appended, resulting in the following configuration:

search default.svc.cluster.local
nameserver 10.96.0.10

If a search domain is already set in the environment, the entry is appended to the list, so it is necessary that the first search domain can actually be resolved with the network settings configured on the OSS Core and GDPR nodes. The kube-dns service deployed as part of the Kubernetes cluster uses the NS settings from the GDPR node itself, so it is important to configure them properly per the backend network requirements.

It is also important that the K8s node is configured with the same DNS server settings as the OSS Core node in order to proxy DNS requests successfully: kube-dns uses the settings configured on the Kubernetes node itself to forward requests not destined for Kubernetes services.
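
A quick way to verify the DNS setup from the OA MN is to resolve a cluster service name and an external name directly against the cluster DNS server (a sketch; the gdpr-backend service is used as an example):

nslookup gdpr-backend.default.svc.cluster.local 10.96.0.10
nslookup google.com 10.96.0.10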

Time synchronization

The Kubernetes cluster and Docker containers may become inoperable if system time is not synchronized, so it is highly advised to set up NTP (or any other time synchronization service) on the K8s node.
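
For example, chrony (the default time synchronization service on RHEL7) can be set up as follows (a sketch):

yum install -y chrony
systemctl enable chronyd
systemctl start chronyd
chronyc tracking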

External configuration managers

Many infrastructure administrators use unified external configuration managers like Puppet to keep the network/DNS/firewall configuration consistent across all nodes. Since the Kubernetes cluster is tightly bound to iptables and routing, it is important to make such configuration managers fully consistent with Kubernetes requirements.

It is advised to review the available solutions for integrating Kubernetes into configuration managers.

An example for Puppet:

Managing Kubernetes Configuration with Puppet

puppetlabs-kubernetes

Supported versions

Currently, Odin Automation supports Kubernetes packages of the 1.10.x branch. Support for higher versions is planned for the future. To avoid installing newer packages, the YUM configuration needs to be edited; the instructions can be found in the following article:

Kubelet service does not start after kube* packages update from 1.10 to 1.11 or higher
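
The linked article contains the exact instructions. As an illustration only, one common way to keep the installed 1.10.x packages from being upgraded by regular yum updates is the versionlock plugin (a sketch):

yum install -y yum-plugin-versionlock
yum versionlock add kubelet kubeadm kubectl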

Deployment

The Kubernetes cluster is deployed automatically with the Python script uK8s.py; the instructions are detailed at:

Setting up Kubernetes cluster

All operations are logged to /var/log/pa/k8s.install.log.

The deployment consists of:

  • disabling swap (docker requirement):

    # swapoff -a
    # sed -i '/ swap / s/^/#/' /etc/fstab
    
  • installation and enabling of the required packages: docker, kubelet, kubeadm, kubectl (an illustrative sketch of this step is shown after this list)

  • initializing the cluster:

    # kubeadm init --pod-network-cidr=10.244.0.0/16
    # mkdir -p /root/.kube
    # cp -i /etc/kubernetes/admin.conf /root/.kube/config
    # chown root:root /root/.kube/config
    
  • initializing kube-router:

    # kubectl apply -f https://raw.githubusercontent.com/cloudnativelabs/kube-router/master/daemonset/kubeadm-kuberouter.yaml
    

    At this point, the installation waits for full cluster availability using the following check:

    # kubectl rollout status deployment/kube-dns -n kube-system
    

    If there are issues bringing up all the system pods, this stage may hang for a long time and eventually fail with:

    deployment "kube-dns" exceeded its progress deadline
    

    To move on, check the state of the system pods and look for any deployment issues:

    [root@k8s ~]# kubectl get pods -n kube-system -o wide
    NAME                                      READY     STATUS    RESTARTS   AGE       IP            NODE
    etcd-k8s.example.com                      1/1       Running   21         13d       10.39.45.45   k8s.example.com
    kube-apiserver-k8s.example.com            1/1       Running   21         13d       10.39.45.45   k8s.example.com
    kube-controller-manager-k8s.example.com   1/1       Running   26         13d       10.39.45.45   k8s.example.com
    kube-dns-86f4d74b45-8c9sk                 0/3       Running   42         13d       10.244.0.56   k8s.example.com
    kube-proxy-bxlx9                          1/1       Running   14         13d       10.39.45.45   k8s.example.com
    kube-router-29mbx                         0/1       Running   19         13d       10.39.45.45   k8s.example.com
    kube-scheduler-k8s.example.com            1/1       Running   24         13d       10.39.45.45   k8s.example.com
    tiller-deploy-f5597467b-m46rt             1/1       Running   14         13d       10.244.0.57   k8s.example.com
    

    Incomplete READY counts in the corresponding column, like 0/3 for kube-dns and 0/1 for kube-router above, are a sign of problems setting up those pods.

    The following commands will help to understand the cause of deployment issues:

        # kubectl describe pod kube-proxy-bxlx9 -n kube-system
        # kubectl logs kube-proxy-bxlx9 -n kube-system
        # kubectl describe pod kube-dns-86f4d74b45-8c9sk -n kube-system
        # kubectl logs kube-dns-86f4d74b45-8c9sk -n kube-system    
    

    Common issues are lack of hardware resources and network/firewall restrictions.

  • adding Kubernetes YUM repository:

    [root@osscore ~]# cat /etc/yum.repos.d/kubernetes.repo
    [kubernetes]
    name=Kubernetes
    baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
    enabled=1
    gpgcheck=1
    repo_gpgcheck=1
    gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
    
  • installing kubectl on OA MN to manage the cluster
  • removing the 'master' node taint from the K8S node:

    # kubectl taint nodes --all node-role.kubernetes.io/master-
    

    Sometimes, due to a failure at this stage, the node has already been untainted and a re-run is not possible. Taint it back to proceed:

    # kubectl taint node k8s.example.com node-role.kubernetes.io/master=true:NoSchedule
    
  • initializing Helm, the Docker image deployment helper for Kubernetes (the required packages are downloaded and unpacked beforehand):

    # /usr/local/bin/helm init 
    # kubectl create serviceaccount --namespace kube-system tiller
    # kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
    # kubectl patch deploy --namespace kube-system tiller-deploy -p '{"spec":{"template":{"spec":{"serviceAccount":"tiller","automountServiceAccountToken":true}}}}'
    

    At this point the installer waits for tiller pod availability:

    # kubectl rollout status deployment/tiller-deploy -n kube-system
    

    In case of any issues, use the same approach as described for kube-router above to troubleshoot deployment.

  • DNS entries and custom routes are added (see considerations sections)
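
For reference, the package installation and enabling step mentioned above roughly corresponds to the following commands (a sketch of what uK8s.py automates, not its exact command set):

# yum install -y docker kubelet kubeadm kubectl
# systemctl enable docker && systemctl start docker
# systemctl enable kubelet && systemctl start kubelet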

Helm: images deployment

The Helm manager is used to deploy/update/remove Docker images in the Kubernetes cluster. An example of deploying the GDPR application as a Docker image can be found at:

Installing GDPR Application

You can check the status of deployment with:

[root@osscore ~]# helm ls
NAME            REVISION        UPDATED                         STATUS          CHART                   NAMESPACE
gdpr-backend    1               Tue May 29 05:30:04 2018        DEPLOYED        gdpr-backend-1.0.441    default

To remove a release in order to re-run the deployment:

[root@osscore ~]# helm delete --purge gdpr-backend

After that, run helm install again with all the required parameters.

GDPR application use case

The Odin Automation GDPR application is deployed as a Docker image into the Kubernetes cluster. It runs as the gdpr-backend pod:

[root@k8s ~]# kubectl get pods
NAME                            READY     STATUS    RESTARTS   AGE
gdpr-backend-646bf9c4fb-qfhng   1/1       Running   3          3d

and exposed as gdpr-backend service available on port 8081:

[root@k8s ~]# kubectl get svc gdpr-backend
NAME           TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
gdpr-backend   ClusterIP   10.96.88.228   <none>        8081/TCP   3d

It is a Java application deployed inside a Wildfly server inside the pod:

  • To access the pod environment:

    [root@k8s ~]# kubectl exec -ti gdpr-backend-646bf9c4fb-qfhng bash
    
  • To check the state of the process:

    [jboss@gdpr-backend-646bf9c4fb-qfhng ~]$ ps auxfwww
    USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    jboss      538  0.0  0.0  11780  1940 ?        Ss   02:35   0:00 bash
    jboss      560  0.0  0.0  47448  1640 ?        R+   02:36   0:00  \_ ps auxfwww
    jboss        1  0.0  0.0  11644  1496 ?        Ss   02:23   0:00 /bin/sh /opt/jboss/wildfly/bin/standalone.sh -c standalone-full.xml -b 0.0.0.0 -bmanagement 0.0.0.0 --debug 8787
    jboss      104 13.0 28.6 3175680 814484 ?      Sl   02:24   1:34 /usr/lib/jvm/java/bin/java -D[Standalone] -server -Xms64m -Xmx512m -XX:MetaspaceSize=96M -XX:MaxMetaspaceSize=256m -Djava.net.preferIPv4Stack=true -Djboss.modules.system.pkgs=org.jboss.byteman -Djava.awt.headless=true -Xms512m -Xmx1024m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/jboss/wildfly/standalone/log -Xloggc:/opt/jboss/wildfly/standalone/log/gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:GCLogFileSize=1M -XX:NumberOfGCLogFiles=5 -agentlib:jdwp=transport=dt_socket,address=8787,server=y,suspend=n -Dorg.jboss.boot.log.file=/opt/jboss/wildfly/standalone/log/server.log -Dlogging.configuration=file:/opt/jboss/wildfly/standalone/configuration/logging.properties -jar /opt/jboss/wildfly/jboss-modules.jar -mp /opt/jboss/wildfly/modules org.jboss.as.standalone -Djboss.home.dir=/opt/jboss/wildfly -Djboss.server.base.dir=/opt/jboss/wildfly/standalone -c standalone-full.xml -b 0.0.0.0 -bmanagement 0.0.0.0
    
  • To check the logs:

    [jboss@gdpr-backend-646bf9c4fb-qfhng ~]$ ls -alh /opt/jboss/wildfly/standalone/log/
    total 168K
    drwxr-xr-x 1 jboss jboss   90 Jun  1 02:24 .
    drwxrwxr-x 1 jboss root    75 Apr  5 14:02 ..
    -rw-r--r-- 1 jboss jboss    0 Apr  5 14:02 audit.log
    -rw-r--r-- 1 jboss jboss 4.9K Jun  1 02:35 gc.log.0.current
    -rw-r--r-- 1 jboss jboss  50K Jun  1 02:24 server.log
    -rw-r--r-- 1 jboss jboss 105K Apr  5 14:05 server.log.2018-04-05
    

Kubernetes redeployment

Kubernetes: how to reinitialize cluster with GDPR application?

Known issues

Issue: Kubernetes services may stop working after deployment, with symptoms such as network connections timing out or failing to be routed.

Resolution: Kubernetes services add their whole set of iptables rules upon startup. If, after the services are started, the iptables rules are flushed or modified by external managing systems, cluster networking may stop operating. Restart the services to bring them back to a working state. See GDPR feature does not work after deployment for an example of this issue in the scope of the GDPR application.

# systemctl stop kubelet
# systemctl stop docker
# systemctl stop iptables
# systemctl start iptables
# systemctl start docker
# systemctl start kubelet

Issue: Kubernetes cluster domain name resolution does not work, even though the DNS server is added to /etc/resolv.conf and is reachable. The Kubernetes cluster is configured inside hypervisor VM(s).

Resolution: Disable IP/MAC filtering for the Virtual Machine. More details in the article:

GDPR deployment fails on generate.py: Network error: Host not found: gdpr-backend.default.svc.cluster.local..

Issue: Kubernetes cluster is not available from OA Management Node. OA MN and K8S servers are located in different internal subnets.

Resolution: Due to Kubernetes networking specifics, direct communication is a requirement. Possible options are to unite the networks or to use a VPN service to connect the two ends through a peer-to-peer link. See the following article for more details:

Deploying GDRP Application failed: ip root add failed. Network is unreachable.

Issue: Kubernetes deployment cannot be completed when there is no default gateway on the K8s server.

Resolution: Kubernetes requires a default gateway to be configured in order to identify the network to bind to. The issue has also been reported to the Kubernetes community.

More details here:

kube-router pod fails to deploy: No default routes
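
If the gateway is simply missing from the network configuration, it can be added as follows (a sketch; 10.39.45.1 and eth0 are placeholders for the actual gateway address and interface):

# ip route add default via 10.39.45.1 dev eth0
# echo "GATEWAY=10.39.45.1" >> /etc/sysconfig/network-scripts/ifcfg-eth0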

Issue: The default Docker virtual network 172.17.0.0/16 overlaps with an existing backnet network in the infrastructure.

Resolution: Docker provides a way to change the default network:

Kubernetes: how to change the default docker0 bridge network

