Monday, December 20, 2021

Kubernetes

Kubernetes is an open-source container orchestration framework.

It was originally developed by Google.

It helps you manage containerized applications across different deployment environments.


What problem does Kubernetes solve?

Containers are a perfect host for small applications such as microservices. As container technology has spread, a single application may now consist of hundreds or even thousands of containers.

Managing that many containers across different environments with scripts or ad hoc tools is difficult.

This scenario created the need for a dedicated technology to manage these containers.


What features do orchestration tools offer?

High availability: no downtime of the application.

Scalability or high performance: scale the application up or down according to the user load. This provides more flexibility.

Disaster recovery: backup and restore. If the infrastructure breaks down, there should be a mechanism to pick up the data and restore it to its latest state.

That way the application does not lose any data, and the containerized application can run from the latest state after the recovery.

Kubernetes offers all of the above.


Kubernetes architecture

A Kubernetes cluster is made up of at least one master node connected to a couple of worker nodes.

A node can be a virtual or a physical machine.

Each node has a kubelet process (the node agent) running on it.

The kubelet is a Kubernetes process that lets the node communicate with the rest of the cluster and execute tasks on that node, such as running application processes.

Each worker node has containers of different applications deployed on it. Depending on how the workload is distributed, you will have a different number of Docker containers running on each worker node.

Worker nodes are where the actual work happens: this is where your applications run.


The master node runs several Kubernetes processes that are absolutely necessary to run and manage the cluster properly.

One such process is the API server, which also runs as a container. The API server is the entry point to the Kubernetes cluster, so it is the process that the different Kubernetes clients talk to.

These clients are a UI if you are using the Kubernetes dashboard, an API if you are using scripts and automation tools, and a command-line tool.

All of these talk to the API server.


Another process running on the master node is the controller manager, which keeps an overview of what is happening in the cluster: whether something needs to be repaired, or whether a container died and needs to be restarted.


Another process is the scheduler, which is responsible for scheduling containers on different nodes based on the workload and the available server resources on each node. It is an intelligent process that decides on which worker node the next container should be scheduled, based on the resources available on the worker nodes and the resources the container needs.
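
The scheduling decision described above can be sketched as a small Python function. This is a simplified illustration, not the actual Kubernetes scheduler: the node names, the resource fields (`cpu_free`, `mem_free`) and the "most free CPU wins" rule are all assumptions made for the example.

```python
from typing import Optional

# Hypothetical view of the free resources on each worker node.
nodes = [
    {"name": "worker-1", "cpu_free": 0.5, "mem_free": 512},
    {"name": "worker-2", "cpu_free": 2.0, "mem_free": 4096},
    {"name": "worker-3", "cpu_free": 1.0, "mem_free": 1024},
]


def pick_node(nodes: list, cpu_needed: float, mem_needed: int) -> Optional[str]:
    """Return the name of a node that can fit the request, or None."""
    # Filter: keep only nodes with enough free CPU and memory.
    candidates = [
        n for n in nodes
        if n["cpu_free"] >= cpu_needed and n["mem_free"] >= mem_needed
    ]
    if not candidates:
        return None  # nothing fits, so the container stays unscheduled
    # Score: prefer the node with the most free CPU (an assumed rule).
    best = max(candidates, key=lambda n: n["cpu_free"])
    return best["name"]


print(pick_node(nodes, cpu_needed=1.0, mem_needed=1024))
```

The two steps, filtering out nodes that cannot fit the container and then ranking the rest, mirror the idea in the text that both the node's available resources and the container's needs matter.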


Another very important component of the whole cluster is etcd, a key-value store that holds the current state of the Kubernetes cluster at any time. It contains all the configuration data and all the status data of each node and of each container on that node. The backup and restore mentioned earlier is made from etcd snapshots, because you can recover the whole cluster state from such a snapshot.


Another very important component, which enables the master and worker nodes to talk to each other, is the virtual network that spans all the nodes in the cluster. In simple words, the virtual network turns all the nodes in the cluster into one powerful machine that has the sum of the resources of the individual nodes.


Worker nodes bear the higher load because they run the applications. They are usually much bigger and have more resources, since they may run hundreds of containers, whereas the master node runs just a handful of master processes, as discussed earlier.


However, a master node is much more important than an individual worker node: if you lose access to the master node, for example, you can no longer access the cluster. That means you should have a backup of your master at all times. In production environments you would therefore usually have at least two masters in your Kubernetes cluster, and often more, so that if one master node goes down the cluster continues to function smoothly because the other masters are still available.


In summary:

API server: Entrypoint to K8s cluster

Controller Manager: Keeps track of what’s happening in the cluster

Scheduler: decides Pod placement

Etcd: Kubernetes backing store


Control plane (master) nodes: a handful of master processes, but more important


Worker nodes: higher workload, much bigger and more resources


Main Kubernetes components:

Consider a web app and a database as an example.


Pod:

The smallest unit in Kubernetes.

A pod is an abstraction over a container. It creates a running environment, or layer, on top of the container. The reason is that Kubernetes wants to abstract away the container runtime or container technology so that you can replace it if you want to. You also do not have to work directly with Docker or whatever container technology you use in Kubernetes; you only interact with the Kubernetes layer.

In our example we have an application pod, which is our own application, and it may use a database pod with its own container. An important concept here is that a pod is usually meant to run one application container. You can run multiple containers inside one pod, but usually only when you have one main application container and a helper container or side service that has to run inside the same pod.

Kubernetes offers a virtual network out of the box, which means that each pod, not each container, gets its own IP address.

Pods can communicate with each other using that IP address, which is an internal IP address, not a public one. So my application container can communicate with the database using its IP address.


Summary:

Smallest unit in Kubernetes

Abstraction over container

Usually 1 application per pod

Each pod gets its own IP address.

New IP address on re-creation
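
As a rough illustration, a pod with one application container can be described as a Kubernetes manifest. The sketch below builds one as a Python dictionary and prints it as JSON, a format Kubernetes accepts alongside YAML. The pod name `my-app`, the label, the image `nginx:1.25` and the port are placeholder assumptions, not values from these notes.

```python
import json

# Minimal Pod manifest: one pod wrapping a single application container.
pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "my-app", "labels": {"app": "my-app"}},
    "spec": {
        "containers": [
            {
                "name": "my-app",          # placeholder container name
                "image": "nginx:1.25",     # placeholder image
                "ports": [{"containerPort": 80}],
            }
        ]
    },
}

print(json.dumps(pod_manifest, indent=2))
```

Note that the manifest contains no IP address: the pod's IP is assigned by the cluster when the pod is created, which is why it changes on re-creation.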


Service and Ingress:

If a pod dies or crashes, a new one is created in its place and is assigned a new IP address. That is obviously inconvenient if you are communicating with the database using its IP address, because you would have to adjust it every time the pod restarts. For that reason another Kubernetes component, called a Service, is used. A Service is basically a static, permanent IP address that can be attached, so to speak, to each pod. My app will have its own Service and the database pod will have its own Service, so even if a pod dies, the Service and its IP address stay and you no longer have to change that endpoint.


To access the web app from outside, we have to create an external Service, which is a Service that opens communication from external sources. You would not want your database to be open to public requests, however, so for it you would create an internal Service. Internal or external is a type that you specify when creating the Service.
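
The internal/external distinction shows up as the `type` field of a Service manifest. The following sketch generates two such manifests in Python. It assumes `ClusterIP` for the internal Service and `LoadBalancer` for the external one (`NodePort` is another external option); the Service names, labels and the web app port 3000 are hypothetical, while 27017 is MongoDB's default port.

```python
import json


def service_manifest(name: str, app_label: str, port: int, service_type: str) -> dict:
    """Build a Service manifest that forwards to pods labelled app=<app_label>."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": service_type,
            "selector": {"app": app_label},  # which pods receive the traffic
            "ports": [{"port": port, "targetPort": port}],
        },
    }


# Internal: reachable only from inside the cluster.
internal_db = service_manifest("mongodb-service", "mongodb", 27017, "ClusterIP")
# External: also reachable from outside the cluster.
external_web = service_manifest("webapp-service", "webapp", 3000, "LoadBalancer")

print(json.dumps([internal_db, external_web], indent=2))
```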

There is another Kubernetes component called Ingress. Instead of going directly to the Service, the request first goes to the Ingress, which forwards it to the Service.


Summary:

Permanent IP address

Lifecycles of the Pod and the Service are not connected
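
The point that a Service outlives its pods can be modelled in a few lines of Python. This is a toy model, not how Kubernetes implements Services: the `Service` class, its method names and the IP addresses are invented for the illustration.

```python
class Service:
    """Toy model: a stable address in front of replaceable pods."""

    def __init__(self, name: str, cluster_ip: str) -> None:
        self.name = name
        self.cluster_ip = cluster_ip  # stays fixed for the Service's lifetime
        self.pod_ips = []             # changes as pods come and go

    def pod_started(self, pod_ip: str) -> None:
        self.pod_ips.append(pod_ip)

    def pod_died(self, pod_ip: str) -> None:
        self.pod_ips.remove(pod_ip)


db = Service("mongodb-service", "10.96.0.10")
db.pod_started("10.244.1.5")   # first database pod
db.pod_died("10.244.1.5")      # the pod crashes...
db.pod_started("10.244.2.9")   # ...and its replacement gets a new IP

# Clients keep using the same Service address throughout.
print(db.cluster_ip, db.pod_ips)
```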


ConfigMap & Secret:

As we said, pods communicate with each other using a Service. My application will have a database endpoint, say a MongoDB service, that it uses to communicate with the database. Where do you usually configure this database URL or endpoint? In an application properties file or in an external environment variable, and usually it ends up inside the built image of the application. The problem is that if the endpoint changes, for example because the Service name changes, you have to adjust that URL in the application.

Usually that means rebuilding the application with a new version, pushing it to the repository, pulling the new image in the pod and restarting everything. That is tedious for a small change such as a database URL.

For this purpose Kubernetes has a component called ConfigMap. It is basically the external configuration of your application, so a ConfigMap usually contains configuration data such as the URLs of a database or of other services that you use.

In Kubernetes you just connect the ConfigMap to the pod so that the pod gets the data the ConfigMap contains.


Kubernetes has another component called Secret, which stores secret data such as credentials. The data is stored not in plain text but in base64-encoded format.

Base64 is an encoding, not encryption, so on its own it does not make a Secret secure. To encrypt Secrets, additional tools are deployed in Kubernetes.


Summary:

External configuration of your application

ConfigMap is for non-confidential data only

Secret is used to store secret data

Reference the Secret in a Deployment/Pod. Use it as environment variables or as a properties file.
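
A short Python sketch can show both halves of this section: what base64 encoding of a Secret value looks like, and how an application might read externally supplied configuration from an environment variable. The password value, the variable name `DB_URL` and the fallback `mongodb-service` are made-up examples.

```python
import base64
import os

# Secret values are stored base64-encoded. Anyone can decode them again,
# which is why base64 alone is not a security measure.
password = "s3cr3t"  # made-up example value
encoded = base64.b64encode(password.encode("utf-8")).decode("ascii")
decoded = base64.b64decode(encoded).decode("utf-8")
print(encoded)
print(decoded == password)


def read_db_url(env=os.environ) -> str:
    """Read the database endpoint from the environment instead of hard-coding it."""
    # DB_URL is a hypothetical variable that a ConfigMap could populate.
    return env.get("DB_URL", "mongodb-service")


print(read_db_url())
```

Because the endpoint comes from the environment, changing it means updating the ConfigMap rather than rebuilding the application image.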


Volume:

If the database container or the pod is restarted, the data is gone, which is a problem.

To persist database data or log data reliably for the long term, we use a component called Volume.

A Volume attaches physical storage on a hard drive to your pod. That storage can be on the local machine, meaning on the same server node where the pod is running, or it can be remote storage outside the Kubernetes cluster, such as cloud storage or on-premise storage, to which the pod just holds a reference.

Kubernetes explicitly does not manage data persistence, which means that you as a Kubernetes user are responsible for backing up, replicating and managing the data, and for making sure it is kept on proper hardware.


Summary:

Storage on local machine

Or remote, outside of the K8s cluster


Deployment and StatefulSet:

What happens if my application pod dies or crashes? To avoid downtime, we replicate everything on multiple servers.

So we would have another node where a replica, or clone, of our application runs, and it is also connected to the Service.

As said earlier, a Service is like a persistent static IP address, but it is also a load balancer that catches each request and forwards it to whichever pod is least busy.


To create a second replica of the application pod we do not create a second pod. Instead we define a blueprint for the application pod and specify how many replicas of that pod we would like to run.

That blueprint is a component called Deployment. So in practice we create Deployments, not pods. A Deployment is an abstraction over pods.
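
To make the "blueprint plus replica count" idea concrete, here is a sketch that builds a Deployment manifest in Python and prints it as JSON. The application name `webapp`, the image `nginx:1.25` and the replica count of 2 are placeholder assumptions.

```python
import json


def deployment_manifest(name: str, image: str, replicas: int) -> dict:
    """Build a Deployment: a pod template plus the number of copies to run."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,                 # how many pods to keep running
            "selector": {"matchLabels": labels},  # which pods belong to this Deployment
            "template": {                         # the blueprint for each pod
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


webapp = deployment_manifest("webapp", "nginx:1.25", replicas=2)
print(json.dumps(webapp, indent=2))
```

Scaling then means changing `replicas` in the Deployment, not creating pods by hand.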


What if the DB pod dies? We cannot replicate a DB pod with a Deployment, because a database has state: the replicas would all access the same shared data storage, and some mechanism is needed to coordinate which pod reads from or writes to it so that the data stays consistent.

This is done using a StatefulSet. Database applications such as MySQL, Elasticsearch and MongoDB should be created using StatefulSets.

A StatefulSet makes sure that database reads and writes are synchronised. Because stateful services are difficult to maintain, databases are often hosted outside the Kubernetes cluster.


Summary:

Deployment is for stateless apps

StatefulSet is for stateful apps or databases


Main Kubernetes Components summarised:

Pod: abstraction of containers

Service: communication

Ingress: route traffic into cluster

ConfigMap and Secret: external configuration

Volume: data persistence

Deployment and StatefulSet: replication


Kubernetes Configuration:

All configuration in Kubernetes goes through the master node, via the process called the API server.

A Kubernetes client can be a UI, an API client such as a script or a curl command, or a command-line tool such as kubectl. These requests have to be in either YAML or JSON format.


Every configuration file has 3 parts.

  • Metadata
  • Specification
  • Status: Automatically generated and added by Kubernetes.

Status works like this: Kubernetes always compares the desired state with the actual state. If they do not match, Kubernetes knows there is something to fix and tries to fix it. This is known as the self-healing feature.
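
The desired-versus-actual comparison can be sketched as a tiny reconciliation function. This is only an illustration of the idea: the function name `reconcile`, the action strings and the pod names are invented, and real Kubernetes controllers are far more involved.

```python
def reconcile(desired_replicas: int, running_pods: list) -> list:
    """Return the actions needed to move the actual state to the desired state."""
    actual = len(running_pods)
    if actual < desired_replicas:
        # Too few pods: start the missing ones.
        return ["start pod"] * (desired_replicas - actual)
    if actual > desired_replicas:
        # Too many pods: stop the surplus (which ones is arbitrary in this sketch).
        return [f"stop {name}" for name in running_pods[desired_replicas:]]
    return []  # states match, nothing to fix


# Desired: 2 replicas. Actual: one pod is running because the other crashed.
print(reconcile(2, ["webapp-1"]))
```

Running such a comparison repeatedly is what makes the system self-healing: whenever the actual state drifts, the next pass produces the actions that bring it back.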


Where does K8s get this status data?

From etcd. Etcd holds the current status of every K8s component.


The format of the configuration file is YAML. YAML is a human-friendly data serialisation standard for all programming languages.

Store the config file with your code. It can be part of the infrastructure code or live in its own Git repository.


Minikube and kubectl:

In a production cluster setup:

  • we will have multiple master nodes and worker nodes
  • separate virtual or physical machines represent each node

Testing something locally on such a setup is very tedious or almost impossible.


Minikube is basically a one-node cluster where the master processes and the worker processes both run on the same node. This node has a Docker container runtime pre-installed, so you can run containers, or pods with containers, on it.


kubectl is a command-line tool for Kubernetes clusters. It is the most powerful of the 3 clients (UI, API, CLI).


kubectl is used to communicate with any type of Kubernetes cluster, not only a Minikube cluster.


Minikube can run either as a container or as a virtual machine on your laptop.

Minikube has Docker pre-installed to run the containers in the cluster.

The driver determines how Minikube is hosted. With the Docker driver, Minikube runs as a container on our local machine.


We have 2 layers of Docker:

  1. Minikube runs as a docker container
  2. Docker inside Minikube to run our application containers.

The kubectl CLI is for configuring the Minikube cluster.


The Minikube CLI is for starting up and deleting the cluster.


To test with a web application and a DB application:


Create 4 K8s config YAML files:

  1. ConfigMap: MongoDB endpoint
  2. Secret: MongoDB user and password
  3. Deployment and Service: MongoDB application with internal Service
  4. Deployment and Service: our own web app with external Service
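
As a starting point for the first two files in the list, the sketch below generates a ConfigMap and a Secret manifest in Python and prints them as JSON, which Kubernetes accepts as well as YAML. All names and values (`mongo-config`, `mongo-secret`, `mongo-service`, the example user and password) are hypothetical placeholders.

```python
import base64
import json


def b64(value: str) -> str:
    """Base64-encode a string, as required for values in a Secret's data field."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


# 1. ConfigMap: non-confidential settings such as the MongoDB endpoint.
config_map = {
    "apiVersion": "v1",
    "kind": "ConfigMap",
    "metadata": {"name": "mongo-config"},
    "data": {"mongo-url": "mongo-service"},
}

# 2. Secret: credentials, stored base64-encoded.
secret = {
    "apiVersion": "v1",
    "kind": "Secret",
    "metadata": {"name": "mongo-secret"},
    "type": "Opaque",
    "data": {
        "mongo-user": b64("example-user"),
        "mongo-password": b64("example-password"),
    },
}

for manifest in (config_map, secret):
    print(json.dumps(manifest, indent=2))
```

The two Deployment-and-Service files would then reference these objects by name to pass the endpoint and credentials into the pods.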
