Setup Cortex (Grafana Labs) on GKE


Last week I got a requirement from a customer: they wanted a single place to store metrics such as CPU, memory, and disk space from many Google Compute Engine (GCE) instances, plus dashboards to view all the instances’ metrics in one place. Here are the components we will use to meet that requirement:

  • Google Kubernetes Engine (GKE) — the Kubernetes cluster on Google Cloud Platform where we will deploy the workloads.
  • Google Cloud Storage (GCS) — the actual storage that we will configure Cortex to use.
  • Cortex — the data source that stores the Prometheus metrics.
  • Grafana — the web user interface for creating dashboards.
  • Prometheus — the metrics collector; we will install this on the GCE instances and feed the metrics to the Cortex endpoint.
  • Node Exporter — the Prometheus exporter for hardware and OS metrics exposed by Linux/Unix kernels.

Source Code

The source code for this article is kept publicly in the GitHub repository here — I recommend cloning it and reading the code alongside this article.

Creating the GKE cluster

I’m assuming you are familiar with Google Cloud Platform (GCP) and already have a GCP account.

gcloud container clusters create gke-cortex --zone us-central1-a
gcloud container clusters get-credentials gke-cortex --zone us-central1-a

Creating the GCS — Cloud storage for Cortex

GCS is GCP’s object storage service, comparable to S3 on Amazon Web Services (AWS). We use GCS to store the metrics from Prometheus instead of a traditional file system because maintenance is easier (no disk-space expansion to manage), and for the same capacity GCS is also cheaper.

gsutil mb -b on -l us-central1 gs://${BUCKET_NAME}/

Creating service account for Cortex to access GCS

To allow Cortex to read and write data in the GCS bucket, we need to create an IAM service account and configure Cortex to use it at runtime.

gcloud iam service-accounts create ${SA_NAME} --display-name="Service account for Cortex"
gcloud projects add-iam-policy-binding ${PROJECT} \
  --member="serviceAccount:${SA_NAME}@${PROJECT}.iam.gserviceaccount.com" \
  --role="roles/storage.objectAdmin"
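Note that the `--member` flag needs the service account's full email address, not just its name. A small sketch, with placeholder values, of how that email is formed:

```shell
# Placeholder values for illustration; substitute your own project and SA name.
PROJECT="my-gcp-project"
SA_NAME="cortex-sa"

# GCP service account emails always follow this format:
SA_EMAIL="${SA_NAME}@${PROJECT}.iam.gserviceaccount.com"
echo "${SA_EMAIL}"
```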

Deploying Cortex to GKE

The easiest way to deploy an application (Cortex, in this case) to Kubernetes is with Helm, and Cortex provides its own Helm chart here. All we need to do is create a Helm values file with our customizations and run a few Helm commands.

# Create the service account key and store it as a Kubernetes secret
gcloud iam service-accounts keys create ${KEY_FILE} --iam-account=${SA}
kubectl delete secret ${SECRET} -n ${NS}
kubectl create secret generic ${SECRET} --from-file=gcp-sa-file=${KEY_FILE} -n ${NS}

$ kubectl get secret -n cortex gcp-sa
NAME     TYPE     DATA   AGE
gcp-sa   Opaque   1      20m
helm repo add cortex-helm https://cortexproject.github.io/cortex-helm-chart
helm template cortex cortex-helm/cortex \
  -f cortex/cortex.yaml \
  --set config.blocks_storage.gcs.bucket_name=${BUCKET_NAME} \
  --set config.ruler_storage.gcs.bucket_name=${BUCKET_NAME} \
  --set config.alertmanager_storage.gcs.bucket_name=${BUCKET_NAME} \
  --namespace ${NS} > tmp-cortex.yaml
kubectl create ns ${NS}
kubectl apply -n ${NS} -f tmp-cortex.yaml
kubectl get pods -n cortex
kubectl get svc -n cortex
kubectl apply -n ${NS} -f cortex/cortex-ing.yaml
kubectl get ing -n cortex
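Once the ingress has an address, we can check that Cortex is up: each Cortex service exposes a `/ready` endpoint over HTTP. A small helper sketch (the hostname below is a placeholder; use the address reported by `kubectl get ing -n cortex`):

```shell
# Build the readiness-check URL for a given Cortex host (IP or hostname).
cortex_ready_url() {
  echo "http://$1/ready"
}

# With a real ingress address you would then run, e.g.:
#   curl -fsS "$(cortex_ready_url <your-ingress-ip>)"
cortex_ready_url "cortex.example.com"
```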

Deploying Grafana to GKE

Cortex is just the data source that stores the data; the easiest way to view that data is with Grafana. As with Cortex, we will deploy Grafana using Helm.

helm repo add grafana-helm https://grafana.github.io/helm-charts
helm template grafana grafana-helm/grafana \
  -f grafana/grafana.yaml \
  --skip-tests \
  --namespace ${NS} > tmp-grafana-cortex.yaml
kubectl apply -n ${NS} -f tmp-grafana-cortex.yaml
kubectl get ing -n grafana-cortex
kubectl get secret grafana-cortex \
-n grafana-cortex \
-o jsonpath="{.data.admin-password}" | base64 --decode
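With the admin password in hand, we need to point Grafana at Cortex as a Prometheus-type data source. A sketch of doing that through Grafana's HTTP API — the in-cluster service URL is an assumption based on the Cortex Helm chart's nginx service (adjust the host to your release name and namespace), and `/prometheus` is Cortex's default Prometheus-compatible API prefix:

```shell
# Write the data source definition to a file.
cat > /tmp/cortex-datasource.json << 'EOF'
{
  "name": "Cortex",
  "type": "prometheus",
  "access": "proxy",
  "url": "http://cortex-nginx.cortex.svc.cluster.local/prometheus",
  "isDefault": true
}
EOF

# Then register it with Grafana (using the admin password decoded above):
#   curl -s -u "admin:${GRAFANA_PASSWORD}" -H "Content-Type: application/json" \
#     -X POST -d @/tmp/cortex-datasource.json "http://${GRAFANA_HOST}/api/datasources"
```

Alternatively, the same data source can be defined in the Grafana Helm values file so it is provisioned at deploy time.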

Creating the GCE instance

Now it is time to create the GCE instance and deploy Node Exporter and Prometheus on it.

source .env
gcloud compute instances create prometheus-001 \
  --zone=us-central1-a \
  --image=projects/ubuntu-os-cloud/global/images/ubuntu-2004-focal-v20220308 \
  --machine-type=e2-medium
The startup script installs Docker and Docker Compose on the instance, then brings up Node Exporter and Prometheus with docker-compose:

#!/bin/bash
# Install Docker (the URLs follow Docker's official Ubuntu install instructions)
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt-get -y update
sudo apt-get -y install docker-ce docker-ce-cli
sudo systemctl enable docker

# Install Docker Compose (substitute a release version for <VERSION>)
sudo curl -L "https://github.com/docker/compose/releases/download/<VERSION>/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose

# Prepare the environment and start the stack
DATA_DIR=$(pwd)
CORTEX_DOMAIN=<Cortex Ingress IP>  # Change this
INSTANCE=$(hostname)
sudo cat << EOF > ${ENV_FILE}
...
EOF
sudo mkdir -p ${DATA_DIR}/prometheus
sudo docker-compose up -d --remove-orphans
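For reference, here is a minimal sketch of the kind of prometheus.yml the Compose stack would mount. The `remote_write` URL is the piece that ships metrics to Cortex (`/api/v1/push` is Cortex's push endpoint); the host and labels are placeholders:

```shell
# Placeholders; the startup script above sets these from the environment.
CORTEX_DOMAIN="${CORTEX_DOMAIN:-cortex.example.com}"
INSTANCE="${INSTANCE:-$(hostname)}"

cat << EOF > prometheus.yml
global:
  scrape_interval: 15s
  external_labels:
    instance: ${INSTANCE}
scrape_configs:
  - job_name: node
    static_configs:
      - targets: ["localhost:9100"]   # node_exporter's default port
remote_write:
  - url: http://${CORTEX_DOMAIN}/api/v1/push
EOF
```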

Verifying the result

We should now see the metrics in Grafana’s “Explore” view. Try querying the metric “node_time_seconds” to confirm that data is flowing from the instance into Cortex.
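As an optional sanity check outside Grafana, we can query Cortex's Prometheus-compatible HTTP API directly (the `/prometheus` prefix is Cortex's default; `CORTEX_HOST` is a placeholder for your ingress address):

```shell
CORTEX_HOST="${CORTEX_HOST:-}"   # set this to your Cortex ingress IP/hostname
if [ -n "${CORTEX_HOST}" ]; then
  # Run an instant query for the node_time_seconds metric.
  curl -sG "http://${CORTEX_HOST}/prometheus/api/v1/query" \
    --data-urlencode 'query=node_time_seconds'
else
  echo "CORTEX_HOST not set; skipping query"
fi
```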


