Rancher Kubernetes Engine (RKE) on GCP using Terraform and Terragrunt

Seubpong Monsar
7 min read · Apr 15, 2022

This article demonstrates how to use Terraform and Terragrunt to provision a Rancher Kubernetes Engine (RKE) cluster on GCP infrastructure. The purpose of running RKE on GCP is not to replace Google Kubernetes Engine (GKE), but to study how RKE works, and it is a good way to learn Kubernetes as well.

I’m personally a fan of RKE and always use it for on-premise projects that require Kubernetes, for many reasons I won’t go into here. When I first studied RKE, I didn’t actually have an on-premise environment to install it on, so I used GCE instances on GCP and installed RKE there instead. I also used Terraform and Terragrunt to provision the GCE instances and the RKE cluster on top of them. Together, Terraform, Terragrunt, and RKE greatly simplify the task of provisioning Kubernetes clusters.

In this article we will also see how to structure the Terraform code into several folders based on functionality, and how to use Terragrunt to provision them all with just a few commands.

This article uses the RKE Terraform provider to create the RKE cluster. At the time of writing, an RKE2 Terraform provider is not available yet; I’m still waiting for it.

Source Code

The source code related to this article is publicly available in the GitHub repository here — https://github.com/its-knowledge-sharing/gcp-rke-terraform. I recommend cloning the code and reading it alongside this article.

In this article we also need a GCE instance that will be used as a workstation to run the commands that provision the RKE cluster. For the sake of simplicity, I will reuse the one from the previous article here — GCP Infrastructure as Code with Terraform and Terragrunt. I strongly recommend reading it first before going further.

Assumptions

We assume that you already have a GCE instance to be used as a workstation. The easiest way to get one is to follow my previous article — GCP Infrastructure as Code with Terraform and Terragrunt — which results in an instance like the one shown in the picture below.

Later we will SSH to this rke-manager-00 instance, clone the Terraform code from https://github.com/its-knowledge-sharing/gcp-rke-terraform, and then perform a few steps to create the RKE cluster.

How to run the code?

To SSH to the workstation instance, click the SSH button as shown in the picture below.

After clicking the button, a browser-based terminal shows up as in the picture below; this is where we will run our commands.

Clone the code from https://github.com/its-knowledge-sharing/gcp-rke-terraform.git; the Terraform code will be in the gcp-rke-terraform folder.

Note that we need to authenticate to GCP first by running the two commands below in the browser-based terminal.

gcloud auth login
gcloud auth application-default login

Once we have authenticated to GCP, we are ready to run the commands below to provision the GCE instances and deploy the RKE cluster on them. The “init” operation is needed only the first time.

./rke-cluster.bash init
./rke-cluster.bash apply

If everything is OK, after the “apply” operation runs for a while, we will see the GCE instances created and the RKE cluster ready to use.

Please note that we use a customized GCE OS image here; you will need to create your own in your GCP project. The required components are installed by the setup.bash script.

We will also find the kubeconfig file in the current directory where we ran the rke-cluster.bash script, as shown below. Use the command below to export the KUBECONFIG environment variable, and then we will be ready to use the kubectl command.

export KUBECONFIG=$(pwd)/kubeconfig

Wrapper script

For simplicity, I created a wrapper script, rke-cluster.bash, that internally creates the public/private key files (needed for provisioning the RKE cluster), invokes the Terragrunt command, and finally exports the kubeconfig file from a Terraform variable.

Files Structure

  • 00-1-sa, the code to create the service account and role for the GCE instances.
  • 00-2-firewall, the code to configure the firewall rules so that the workstation GCE, the master nodes, and the worker nodes can reach one another. In real life, we should limit the ports to only what we actually need.
  • 01-1-gce-nodes, the code to create the GCE instances that use the custom Ubuntu OS image. These GCE instances will later serve as the master and worker nodes of the RKE cluster.
  • 02-1-rke-cluster, the code to create the RKE cluster from the GCE instances created by 01-1-gce-nodes.
  • modules/gce, the internal module for creating a GCE instance.
  • terragrunt.hcl, the shared configuration file used to configure the Terraform code in the folders mentioned above.

Please note that to run the example code in your environment, you will need to change the project name in terragrunt.hcl to match yours.
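As a rough sketch only (the attribute names and values below are illustrative, not the exact ones from the repository), the project-related settings in terragrunt.hcl look something like this:

locals {
  # Illustrative values only; replace them with your own GCP project and region.
  project = "my-gcp-project-id"
  region  = "asia-southeast1"
  zone    = "asia-southeast1-a"
}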

Program interface

The idea of the configuration is to put all configurable items in a single file, terragrunt.hcl. Later we will demonstrate simple operations such as adding new worker nodes to, and removing worker nodes from, the RKE cluster just by modifying terragrunt.hcl.

In terragrunt.hcl, the master and worker nodes are represented as arrays of objects. Each object in an array represents a GCE instance that will be used to form the RKE cluster, and each object has a profile attribute that maps to an entry in the profiles object defined further down in the same file.
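A minimal sketch of this layout is shown below; the attribute names and node names are illustrative, so check the repository’s terragrunt.hcl for the exact schema:

inputs = {
  # Each object represents one GCE instance that will form the RKE cluster.
  master_nodes = [
    { name = "rke-master-00", profile = "master", mode = "registered" },
  ]

  worker_nodes = [
    { name = "rke-worker-04", profile = "worker", mode = "registered" },
    { name = "rke-worker-05", profile = "worker", mode = "registered" },
  ]

  # Each node's profile attribute maps to an entry in this profiles object.
  profiles = {
    master = { machine_type = "e2-standard-2" }
    worker = { machine_type = "e2-standard-2" }
  }
}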

Remove a worker node

Assume that we want to remove the worker node rke-worker-05 from the cluster. We can do this by modifying its object in the worker_nodes array in terragrunt.hcl.

Changing the mode attribute from “registered” to “unregistered” instructs Terraform to only remove the node from the RKE cluster without actually deleting the GCE instance.
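Under the illustrative schema sketched earlier, the change to the rke-worker-05 entry would look roughly like this:

worker_nodes = [
  { name = "rke-worker-04", profile = "worker", mode = "registered" },
  # mode changed from "registered" to "unregistered": the node is removed
  # from the RKE cluster, but its GCE instance is kept.
  { name = "rke-worker-05", profile = "worker", mode = "unregistered" },
]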

After modifying terragrunt.hcl, run the command below to instruct Terragrunt to apply the change.

./rke-cluster.bash apply

Wait for a while until we see output similar to the picture below.

Now there is no rke-worker-05 node in the cluster, but the GCE instance is still there.

To actually remove the now-unused GCE instance, delete the rke-worker-05 object from the worker_nodes array in terragrunt.hcl and run the apply operation again.

./rke-cluster.bash apply

We should unregister the node from the cluster first before actually destroying the GCE instance.

In real life, before unregistering the node from the cluster, we should cordon that node first and drain all its pods to the other nodes.

Add a worker node

Assume that we want to add the worker node rke-worker-05 back to the cluster (we previously removed it). We can do this by adding its object back to the worker_nodes array, or by changing its mode from “unregistered” to “registered” if the object still exists. As before, we need to run the command below for the changes to be applied.

./rke-cluster.bash apply

To create a brand new GCE instance and automatically register it as a node in the RKE cluster, add an object with mode “registered” to the worker_nodes array.
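Continuing with the illustrative schema from earlier, adding a brand new worker (the node name here is hypothetical) would look roughly like this:

worker_nodes = [
  { name = "rke-worker-04", profile = "worker", mode = "registered" },
  { name = "rke-worker-05", profile = "worker", mode = "registered" },
  # Hypothetical new node: a single apply creates the GCE instance and
  # registers it with the RKE cluster.
  { name = "rke-worker-06", profile = "worker", mode = "registered" },
]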

We may see an error if the newly created GCE instance is not yet ready to join the RKE cluster; simply re-run “./rke-cluster.bash apply” and this usually solves the issue.

We can also perform this in two separate steps: first add the object with mode “unregistered” to create the GCE instance, then change “unregistered” to “registered” to add the created GCE instance to the RKE cluster.

Cleanup

When everything is done and we want to remove everything we created earlier, we can simply run the “./rke-cluster.bash destroy” command.

Supports

Congratulations if you’ve read the entire article and it helped you solve your issues! You can support me by:

  • Following me.
  • Sharing my articles.
  • Buying me a coffee via the ADA address below if you want.
