Setup Cortex (Grafana Labs) on GKE

Seubpong Monsar
7 min read · Mar 13, 2022

Cortex is a tool that lets us store Prometheus metrics from many sources in a single place. In this article we will use Google Cloud Storage (GCS) as the storage backend for Cortex and deploy it on Google Kubernetes Engine (GKE).

Introduction

Last week I got a requirement from a customer: he wants a single place to store metrics such as CPU, memory, and disk space from many Google Compute Engine (GCE) instances. He also wants to use those stored metrics to create dashboards so he can view all the instances’ metrics in one place.

After some analysis, this is the tech stack I will use to fulfill the customer’s need:

  • Google Kubernetes Engine (GKE) — the Kubernetes cluster on Google Cloud Platform where we will deploy the workloads.
  • Google Cloud Storage (GCS) — the actual storage that we will configure Cortex to use.
  • Cortex — the data source that stores the Prometheus metrics.
  • Grafana — the web user interface for creating dashboards.
  • Prometheus — the metrics collector; we will install it on the GCE instances and feed the metrics to the Cortex endpoint.
  • Node Exporter — the Prometheus exporter for hardware and OS metrics exposed by Linux/Unix kernels.

Source Code

The source code related to this article is kept publicly in the GitHub repository here — https://github.com/its-knowledge-sharing/setup-cortex-gke. I recommend cloning the code and reading it alongside this article.

Creating the GKE cluster

I’m assuming that we are familiar with Google Cloud Platform (GCP) and already have a GCP account.

gcloud container clusters create gke-cortex --zone us-central1-a

Run the gcloud command above and wait a while until the GKE cluster is created. Then use the command below to create the KUBECONFIG entry used to authenticate to our GKE cluster.

gcloud container clusters get-credentials gke-cortex --zone us-central1-a

Creating the GCS — Cloud storage for Cortex

GCS is the object storage service of GCP, comparable to S3 on Amazon Web Services (AWS). We use GCS to store the metrics from Prometheus instead of a traditional file system because maintenance is easier (no disk-space expansion to manage), and for the same capacity GCS is also cheaper.

To create the GCS bucket, run the command below. In this case I keep the environment variables I need in a .env file and export them in my own bash script. Note that we create the GCS bucket in the same region as the GKE cluster, which is us-central1 in this case.

gsutil mb -b on -l us-central1 gs://${BUCKET_NAME}/
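The .env file itself is not shown in the article; a minimal sketch might look like the following. All names and values here are illustrative placeholders, not values from the repository — adjust them to your own project.

```shell
# Illustrative .env file -- every value below is a placeholder
cat > .env <<'EOF'
PROJECT=my-gcp-project
BUCKET_NAME=my-cortex-metrics
SA_NAME=cortex-gcs-sa
NS=cortex
SECRET=gcp-sa
KEY_FILE=cortex-sa-key.json
EOF

# Export the variables for the gcloud/gsutil commands used in this article
set -a; source .env; set +a
echo "Bucket: gs://${BUCKET_NAME}"
```

Sourcing with `set -a` exports every variable so child processes (gcloud, gsutil, helm) can see them.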

Creating service account for Cortex to access GCS

To allow Cortex to read and write data in the GCS bucket, we need to create an IAM service account and configure Cortex to use it at runtime.

Run the command below to create the IAM service account (SA).

gcloud iam service-accounts create ${SA_NAME} --display-name="Service account for Cortex"

Once the SA is created, we need to assign a role defining what this SA can do. For the sake of simplicity, we will grant it the storage.objectAdmin role. Keep in mind that in real life we should limit the access to the specific bucket and to only the permissions actually needed.

gcloud projects add-iam-policy-binding ${PROJECT} --member="serviceAccount:${SA_NAME}@${PROJECT}.iam.gserviceaccount.com" --role="roles/storage.objectAdmin"
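As a sketch of the tighter, bucket-scoped grant suggested above, the role can be bound on the bucket itself instead of project-wide (variable names as in the .env file; this is an illustration, not part of the repository's scripts):

```shell
# Hypothetical: grant objectAdmin on this one bucket only, not the whole project
gsutil iam ch \
  "serviceAccount:${SA_NAME}@${PROJECT}.iam.gserviceaccount.com:roles/storage.objectAdmin" \
  gs://${BUCKET_NAME}
```

With this binding the SA can touch objects in `${BUCKET_NAME}` but has no access to any other bucket in the project.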

Deploying Cortex to GKE

The easiest way to deploy an application (Cortex in this case) into Kubernetes is with Helm. Cortex provides its own Helm chart here — https://github.com/cortexproject/cortex-helm-chart. All we need to do is create a Helm values file to customize what we need and run a few Helm commands.

To simplify things, I wrote a Cortex deployment script which can be cloned from this GitHub repository — https://github.com/its-knowledge-sharing/setup-cortex-gke/blob/main/04-deploy-cortex.bash.

The code snippet below is excerpted from the script mentioned above. We first download the service account key file (JSON format), then create a Kubernetes Secret resource from the downloaded file.

# Create service account secret
gcloud iam service-accounts keys create ${KEY_FILE} --iam-account=${SA}
kubectl delete secret ${SECRET} -n ${NS}
kubectl create secret generic ${SECRET} --from-file=gcp-sa-file=${KEY_FILE} -n ${NS}

Once the secret is created, we should see the secret gcp-sa by running the “kubectl” command shown below. Later this secret will be mounted into the Cortex pods and used for authentication to GCP.

$ kubectl get secret -n cortex gcp-sa
NAME     TYPE     DATA   AGE
gcp-sa   Opaque   1      20m

Now it’s time to deploy Cortex into the “cortex” namespace using Helm as the template engine. The rendered Kubernetes manifest is written to a temp file “tmp-cortex.yaml”, which is then applied with “kubectl”.

helm repo add cortex-helm https://cortexproject.github.io/cortex-helm-chart
helm template cortex cortex-helm/cortex \
-f cortex/cortex.yaml \
--set config.blocks_storage.gcs.bucket_name=${BUCKET_NAME} \
--set config.ruler_storage.gcs.bucket_name=${BUCKET_NAME} \
--set config.alertmanager_storage.gcs.bucket_name=${BUCKET_NAME} \
--namespace ${NS} > tmp-cortex.yaml
kubectl create ns ${NS}
kubectl apply -n ${NS} -f tmp-cortex.yaml

Please see Cortex Helm values file here — https://github.com/its-knowledge-sharing/setup-cortex-gke/blob/main/cortex/cortex.yaml for better understanding.

If everything is OK, we should see the Cortex pods and services listed by the commands below.

kubectl get pods -n cortex
kubectl get svc -n cortex

To let Prometheus instances outside the GKE cluster send metrics to Cortex, we need to create a Kubernetes Ingress resource. On GKE, the Ingress resource is used internally to create a GCP HTTP Load Balancer.

The corresponding YAML to create the ingress is here — https://github.com/its-knowledge-sharing/setup-cortex-gke/blob/main/cortex/cortex-ing.yaml. Note that the ingress routes the traffic to Cortex via the “cortex-nginx” ClusterIP service.

kubectl apply -n ${NS} -f cortex/cortex-ing.yaml
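For reference, a minimal sketch of such an ingress might look like the following, routing all traffic to the cortex-nginx service. This is an illustration of the shape of the resource; the actual manifest in the repository may differ in name, annotations, and port.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: cortex-ing        # illustrative name
  namespace: cortex
spec:
  defaultBackend:
    service:
      name: cortex-nginx  # ClusterIP service fronting Cortex
      port:
        number: 80
```

On GKE, applying an Ingress like this provisions an external HTTP load balancer whose IP appears in the ADDRESS column a few minutes later.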

Once the ingress is created, we can list it with the command below. Later we will use the IP address in the “ADDRESS” column when we set up Prometheus.

kubectl get ing -n cortex

Deploying Grafana to GKE

Cortex is just the data source that stores the data; the easiest way to view it is with Grafana. As with Cortex, we will deploy Grafana using Helm.

The script to deploy Grafana and its ingress can be found here — https://github.com/its-knowledge-sharing/setup-cortex-gke/blob/main/05-deploy-grafana.bash.

helm repo add grafana-helm https://grafana.github.io/helm-charts
helm template grafana grafana-helm/grafana \
-f grafana/grafana.yaml \
--skip-tests \
--namespace ${NS} > tmp-grafana-cortex.yaml
kubectl apply -n ${NS} -f tmp-grafana-cortex.yaml

The Helm values file used to configure Grafana is also found here — https://github.com/its-knowledge-sharing/setup-cortex-gke/blob/main/grafana/grafana.yaml. If everything is OK, we should see the ingress with the command below.

kubectl get ing -n grafana-cortex

Please note that we should see the “Cortex” data source in Grafana’s “Explore” menu.

Use the command below to get the default Grafana admin password. Keep in mind that “helm template” generates a new random password on every render, so the password may change each time the manifest is re-rendered and re-applied.

kubectl get secret grafana-cortex \
-n grafana-cortex \
-o jsonpath="{.data.admin-password}" | base64 --decode

Create GCE instance

Now this is the time to create the GCE instance and deploy Node Exporter and Prometheus.

source .env
gcloud compute instances create prometheus-001 \
--image=projects/ubuntu-os-cloud/global/images/ubuntu-2004-focal-v20220308 \
--machine-type=e2-medium \
--zone=us-central1-a

Once the GCE instance is running, we need to install docker and docker-compose, since we will run Prometheus and Node Exporter via docker-compose; this makes our life a lot easier. Use the commands below to install the required components.

#!/bin/bash
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt-get -y update
sudo apt-get -y install docker-ce docker-ce-cli containerd.io
sudo systemctl enable docker
sudo curl -L "https://github.com/docker/compose/releases/download/1.25.4/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose

Start docker-compose by using docker-compose.yaml here — https://github.com/its-knowledge-sharing/setup-cortex-gke/blob/main/docker-compose.yaml.

We also created a wrapper script to start docker-compose, shown in the snippet below.

DATA_DIR=$(pwd)
CORTEX_DOMAIN=<Cortex Ingress IP>  # Change this
INSTANCE=$(hostname)
ENV_FILE=.env
PROMETHEUS_CFG=${DATA_DIR}/prometheus-config/prometheus.yaml
cat << EOF > ${ENV_FILE}
DATA_DIR=${DATA_DIR}
INSTANCE=${INSTANCE}
EOF
sed -i "s#__CORTEX_DOMAIN__#${CORTEX_DOMAIN}#g" ${PROMETHEUS_CFG}
sudo mkdir -p ${DATA_DIR}/prometheus
sudo docker-compose up -d --remove-orphans
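The placeholder substitution performed by the wrapper script can be sanity-checked locally without any GCP resources. This sketch uses a dummy config line and a documentation IP address in place of the real ingress ADDRESS:

```shell
# Create a dummy config containing the placeholder used by the wrapper script
mkdir -p prometheus-config
printf 'cortex_host: __CORTEX_DOMAIN__\n' > prometheus-config/prometheus.yaml

# Substitute the placeholder the same way the wrapper script does
CORTEX_DOMAIN=203.0.113.10   # documentation IP; use your real ingress ADDRESS
sed -i "s#__CORTEX_DOMAIN__#${CORTEX_DOMAIN}#g" prometheus-config/prometheus.yaml

cat prometheus-config/prometheus.yaml
```

Using `#` as the sed delimiter avoids having to escape any `/` characters that may appear in the substituted value.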

Once docker-compose is started, the Prometheus metrics should be automatically sent to Cortex that’s running on GKE.
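The piece that makes this work is a remote_write section in the prometheus.yaml, pointing at the Cortex ingress. A minimal sketch of such a config is shown below; the scrape target, labels, and placeholders are illustrative, and `/api/v1/push` is Cortex’s default remote-write endpoint, so check the actual file in the repository for the exact values used.

```yaml
global:
  scrape_interval: 15s
  external_labels:
    instance: __INSTANCE__      # hypothetical placeholder, set per GCE instance

scrape_configs:
  - job_name: node
    static_configs:
      - targets: ['node-exporter:9100']   # Node Exporter container

remote_write:
  - url: http://__CORTEX_DOMAIN__/api/v1/push   # Cortex ingress IP substituted by the wrapper script
```

Every sample Prometheus scrapes locally is then forwarded to Cortex, where it is persisted in the GCS bucket.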

Verifying the result

We should see the metrics in Grafana’s “Explore” menu. Try querying the metric “node_time_seconds” to check that the data is flowing.
