Blog

Elasticsearch Meets Kubernetes, Part 1 of 2

Intro

At Solinea, our core business is developing systems that help organizations deploy and manage their containerized applications on Kubernetes. We use Elasticsearch as our go-to solution for collecting and analyzing logs, which gives our customers visibility into their vital system operations. With its ability to easily spin up and scale clusters, Kubernetes provides an ideal platform for running Elasticsearch in the cloud.

When I first started experimenting with Elasticsearch on Kubernetes, I noticed that the Docker container image created for Kubernetes (in the pires/docker-elasticsearch-kubernetes GitHub repo) did not expose some key Elasticsearch configuration variables, specifically the number of primary shards, the number of replicas, and the minimum number of master nodes. To address this need, I updated the docker-elasticsearch-kubernetes configuration files for our own installations and then contributed the files back to the project.

I’ve written a two-part blog series that discusses how to set up an Elasticsearch cluster that takes advantage of all the docker-elasticsearch-kubernetes configuration options. In part 1, I’ll show you how to build an Elasticsearch container image and set up the front-end service. In part 2, I’ll continue with building out the Elasticsearch cluster and write some scripts to start, test, and stop the cluster.

Elasticsearch Kubernetes Cluster

The cluster we will build consists of separate client, data, and master nodes, which provides better Elasticsearch performance than clusters that combine these functions in each node. There will be 2 client nodes, 4 data nodes, and 3 master nodes.

[Figure: Elasticsearch on Kubernetes cluster layout]

The master nodes are critical to keeping the Elasticsearch cluster running, so they will be configured for high availability. One master node acts as the elected master at any given time, while the other two can take over the master role if it fails. Split-brain syndrome is avoided when a former master comes back online by configuring the master nodes so that a master can be elected only when a majority of the master-eligible nodes (two of the three) are present. More on this later.

 

Build a Kubernetes Ready Elasticsearch Container Image

Elasticsearch Container Image Repo

The first step is creating a Kubernetes-ready Elasticsearch Docker container image, which you can get from the pires/docker-elasticsearch-kubernetes repo on GitHub. The Dockerfile for this container installs all the necessary components for Elasticsearch, including Elasticsearch itself and the io.fabric8/elasticsearch-cloud-kubernetes plugin, which enables the Elasticsearch nodes to discover each other without having to specify the IP addresses of the nodes in the elasticsearch.yml configuration file. Get a copy of the pires/docker-elasticsearch-kubernetes repo, then build your Elasticsearch image by running this command in the docker-elasticsearch-kubernetes directory:

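A minimal sketch of the clone-and-build steps, assuming the image will end up in the gcr.io registry under a GKE project named my_gke_project (both of these assumptions are explained just below):

    git clone https://github.com/pires/docker-elasticsearch-kubernetes.git
    cd docker-elasticsearch-kubernetes
    docker build -t gcr.io/my_gke_project/elasticsearch:latest .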
 

This example assumes you have a Google Container Engine account that uses the default container registry at gcr.io and a GKE project called my_gke_project. The Elasticsearch image label has been arbitrarily set to elasticsearch:latest. You can, of course, use different settings that better suit your environment.

After building your Elasticsearch image, push it to your GKE account: 

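One way to do this, assuming the gcloud SDK is installed and authenticated against your GKE project, is to let the gcloud docker wrapper handle the registry credentials for gcr.io:

    gcloud docker -- push gcr.io/my_gke_project/elasticsearch:latest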
 

Elasticsearch Container Environment Variables

If you take a look at the elasticsearch.yml configuration file, included below, you will see a number of environment variables that enable you to customize your Elasticsearch cluster.

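The authoritative copy of elasticsearch.yml lives in the pires/docker-elasticsearch-kubernetes repo; the sketch below is an approximation that highlights the variables discussed in this section rather than a verbatim copy, and entries such as ${DISCOVERY_SERVICE} and the path settings are taken from the upstream repo and may differ in your version:

    cluster:
      name: ${CLUSTER_NAME}

    node:
      master: ${NODE_MASTER}
      data: ${NODE_DATA}

    index:
      number_of_shards: ${NUMBER_OF_SHARDS}
      number_of_replicas: ${NUMBER_OF_REPLICAS}

    path:
      data: /data/data
      logs: /data/log
      plugins: /elasticsearch/plugins
      work: /data/work

    bootstrap:
      mlockall: true

    cloud:
      kubernetes:
        service: ${DISCOVERY_SERVICE}
        namespace: ${NAMESPACE}

    discovery:
      type: kubernetes
      zen:
        minimum_master_nodes: ${NUMBER_OF_MASTERS}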

Checking the Dockerfile, you can see that the pires/docker-elasticsearch-kubernetes repo is derived from pires/docker-elasticsearch. The Dockerfile in that repo sets the Elasticsearch defaults to create a cluster consisting of a single node. Our example cluster, however, will have separate client, data, and master nodes, as I pointed out earlier.

  • ${NODE_MASTER} – If set to true, the node is eligible to be elected as a master node. Master nodes control the cluster.
  • ${NODE_DATA} – If set to true, the node will be a data node. Data nodes store data and perform data operations such as CRUD, search, and aggregations.
    • If both ${NODE_MASTER} and ${NODE_DATA} are set to true, the node will act as a data node and is eligible to become a master node.
    • If both ${NODE_MASTER} and ${NODE_DATA} are set to false, the node will be a dedicated client node. Client nodes are essentially Elasticsearch request routers, forwarding cluster-level requests to master nodes and data-related requests, such as search, to data nodes.
  • ${NUMBER_OF_SHARDS} – Set to the desired number of primary shards, usually 1 for every data node.
  • ${NUMBER_OF_REPLICAS} – Set to the desired number of replica sets, that is, copies of each primary shard. The total number of shards in your cluster is determined by this expression (see the worked example after this list):

    Total Shards = ${NUMBER_OF_SHARDS} + ${NUMBER_OF_SHARDS} * ${NUMBER_OF_REPLICAS}

    Note that ${NUMBER_OF_SHARDS} and ${NUMBER_OF_REPLICAS} are relevant only for data nodes, not master-only or client-only nodes, since those nodes do not index data.
  • ${NUMBER_OF_MASTERS} – Sets the minimum number of master-eligible nodes that must be present in the cluster for a master-eligible node to be elected master. Note that this setting is only relevant for master-eligible nodes. Data-only and client-only nodes are not affected by it since they cannot become master nodes.
  • ${ES_HEAP_SIZE} – This variable is not exposed in the elasticsearch.yml file; instead it is baked into the Docker image. Set it to the amount of RAM that should be devoted to the Elasticsearch heap. Ideally, for data-only nodes, this value should be set to half the RAM of the container in which the Elasticsearch node runs, up to 30g.
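As a worked example, with the 4 data nodes in this cluster you might set ${NUMBER_OF_SHARDS} to 4 and ${NUMBER_OF_REPLICAS} to 1, giving 4 + 4*1 = 8 shards: 4 primaries plus 4 replica copies spread across the data nodes.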

The path variables all map to directories that are created in the source Docker container obtained from pires/docker-elasticsearch.

Create Elasticsearch Kubernetes Services

Kubernetes Services and Deployment Files

Now we need to modify the pires/kubernetes-elasticsearch-cluster service files to use the Elasticsearch container image created earlier and to size the Elasticsearch cluster. It’s a good idea to keep the Kubernetes online docs handy to get detailed descriptions of Kubernetes service and deployment files.

Front End Service

The front-end cluster Kubernetes service file is es_svc.yaml. It sets up a load balancer on TCP port 9200 that distributes network traffic to the client nodes.
 
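The authoritative copy of es_svc.yaml lives in the pires/kubernetes-elasticsearch-cluster repo; the sketch below approximates its contents (the role: client label is illustrative, and the bracketed line references that follow refer to the original file, so positions should line up only approximately):

    apiVersion: v1
    kind: Service
    metadata:
      name: elasticsearch
      namespace: es-cluster
      labels:
        component: elasticsearch
        role: client
    spec:
      selector:
        component: elasticsearch
        role: client
      ports:
      - name: http
        port: 9200
        protocol: TCP
      type: LoadBalancer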

 

[Line 5] The namespace is arbitrarily set to es-cluster. You can set this field to some other name according to your needs. Note you must create a namespace with this name before creating the Elasticsearch service and deployments.
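For example, assuming you keep the es-cluster name, the namespace can be created with a single kubectl command before any of the other files are applied:

    kubectl create namespace es-cluster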

[Lines 7-12] The component name is arbitrarily set to elasticsearch. You can set this field to some other name according to your needs.

[Lines 15-16] The service port and protocol define the TCP port on which the service will listen for connections.  In this case they are set to the traditional Elasticsearch values 9200 and TCP respectively.

[Line 17] Creates a LoadBalancer service, thereby exposing your cluster on the public Internet, which is what we need for testing in this article. You may not want to do this if your cluster is going to operate with some other service, like a web service, in front of it.

Client Nodes

The Kubernetes service file for Elasticsearch client nodes is es_client.yaml.

 
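As with the front-end service, the authoritative es_client.yaml is in the pires/kubernetes-elasticsearch-cluster repo. The sketch below shows roughly what the client deployment looks like with the image built earlier; the apiVersion, resource name, and label names are illustrative, and the bracketed line references below refer to the original file, so they may not match this sketch exactly:

    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: es-client
      namespace: es-cluster          # must match the namespace in es_svc.yaml
      labels:
        component: elasticsearch
        role: client
    spec:
      replicas: 2                    # two client nodes, per the cluster layout above
      template:
        metadata:
          labels:
            component: elasticsearch
            role: client
        spec:
          containers:
          - name: es-client
            securityContext:
              privileged: true
              capabilities:
                add:
                  - IPC_LOCK         # lets Elasticsearch lock the heap in memory
            image: gcr.io/my_gke_project/elasticsearch:latest
            env:
            - name: NAMESPACE        # picked up from the pod's own metadata
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: CLUSTER_NAME
              value: my_es_cluster
            - name: NODE_MASTER      # false + false = dedicated client node
              value: "false"
            - name: NODE_DATA
              value: "false"
            - name: ES_HEAP_SIZE
              value: 512m
            ports:
            - containerPort: 9200    # REST traffic from the front-end service
              name: http
              protocol: TCP
            - containerPort: 9300    # inter-node transport traffic
              name: transport
              protocol: TCP
            resources:
              limits:
                memory: 1Gi          # roughly twice ES_HEAP_SIZE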

Note: The following settings are the same for the client, data, and master nodes, so they are described only here.

[Line 5] The namespace is arbitrarily set to es-cluster. You can set this field to some other name according to your needs, but it must match the name set in es_svc.yaml.

[Lines 7-13] The component names in the metadata and spec sections should match the component names in the es_svc.yaml file.

[Lines 18-22] The securityContext must be set to privileged and the IPC_LOCK capability enabled to allow Elasticsearch to lock the heap in memory so it won’t be swapped. The same is true for the data and master nodes.

[Line 23] Substitute the Elasticsearch image path you created in the first section.

[Lines 26-29] This section gets the namespace value from the field of the same name under metadata.

[Lines 30-31] The cluster name is arbitrarily set to my_es_cluster. You can set it to whatever name you find appropriate, but the name chosen must be the same across all deployment files.

Note: These settings are specific to each kind of node.

[Lines 32-35] To create an Elasticsearch client node, set NODE_MASTER to false and NODE_DATA to false.

[Lines 37-38] The heap size is set to 512 MB, which means the memory size of the client container should be 1 GB. These values are too small for most clusters, so you will have to increase them as necessary.

[Lines 41-46] Two ports are exposed for clients. TCP port 9200 is used for RESTful requests and responses with the front-end service. TCP port 9300 is used for inter-node network traffic that handles Elasticsearch internal command communication and is not exposed as a service. The data and master nodes will use only this port, not 9200.

[Line 49] Set the pod memory size to twice the ES_HEAP_SIZE. 

 

So Far So Good

We are off to a good start. I have shown you how to build a Kubernetes-ready Elasticsearch Docker image, created the Elasticsearch front-end service file, and taken a look at the deployment file for the Elasticsearch client nodes. In part 2 of this series, I will finish the job of building out an Elasticsearch cluster on Kubernetes.

Author: Vic Hargrave

About Solinea

Solinea services help enterprises build step-by-step modernization plans to evolve from legacy infrastructure and processes to modern cloud and open source infrastructure driven by DevOps and Agile processes.

Better processes and tools equal better customer (and employee) satisfaction, lower IT costs, and easier recruiting, with fewer legacy headaches.

Solinea specializes in 3 areas: 

  • Containers and Microservices – Now that enterprises are looking for ways to drive even more efficiency, we help organizations with Docker and Kubernetes implementations – containerizing applications and orchestrating the containers in production.
  • DevOps and CI/CD Automation – Once we build the infrastructure, the challenge is to gain agility from the environment, which is the primary reason people adopt cloud. We work at the process level and the toolchain level, meaning that we have engineers who specialize in technologies like Jenkins, Git, Artifactory, and Cliqr, and we build these toolchains and underlying processes so organizations can build and move apps more effectively to the cloud.
  • Cloud Architecture and Infrastructure – We are design and implementation experts, working with a variety of open source and proprietary technologies, and have built numerous private, public, and hybrid cloud platforms for globally recognized enterprises for over three years.