Deploy Redis Cluster using OpenShift Operator

Introduction

Relational databases, such as MySQL and PostgreSQL, store diverse data types for web applications and analytics. You may also be familiar with Redis, an in-memory key-value database used as a cache and message broker. Unless your applications have specific needs for fast data storage and retrieval, such as caching, chances are you will never have to install Redis; otherwise, an application may list Redis as an optional dependency for better performance under heavy usage. My work gave me the chance to set up a Redis cluster on OpenShift, so I’ll be sharing what I’ve learned.

Redis Architecture

Redis can be used in three ways: standalone, replication, and sentinel.

  • Standalone – A single database or pod acting as master, with no health check, HA, or replicas/slaves. If the master is unavailable, your application will be impacted. Reads and writes both go to the master.
  • Replication – Two databases or pods, one master and one slave. The application reads and writes to the master but can only read from the slave. Data is automatically synced asynchronously in one direction, from master to slave.
  • Sentinel – Three databases, one master and two slaves; think of it as an upgrade over replication. Sentinel is a self-healing monitoring agent normally included when you select this architecture, and it will promote one of the two slaves to master if the master goes down.

The Redis cluster is an entirely different beast, and the cluster architecture is the best option for production and heavy utilization of Redis. It is made up of three replication pairs (master and slave), totalling six pods in OpenShift, with each follower syncing from its own leader:

  1. redis-cluster-leader-0 → async → redis-cluster-follower-0
  2. redis-cluster-leader-1 → async → redis-cluster-follower-1
  3. redis-cluster-leader-2 → async → redis-cluster-follower-2

Below is a sample pod log showing that replica-0 (redis-cluster-follower-0) has successfully synchronized with master-0 (redis-cluster-leader-0):

1:S 11 Dec 2024 04:27:49.312 * Connecting to MASTER 181.xx.xx.244:6379
1:S 11 Dec 2024 04:27:49.312 * MASTER <-> REPLICA sync started
1:S 11 Dec 2024 04:27:49.312 # Cluster state changed: ok
1:S 11 Dec 2024 04:27:49.314 * Trying a partial resynchronization (request d326xxxxxxxx:1).
1:S 11 Dec 2024 04:27:54.140 * Full resync from master: cca0xxxxxxxx:0
1:S 11 Dec 2024 04:27:54.141 * MASTER <-> REPLICA sync: receiving streamed RDB from master with EOF to disk
1:S 11 Dec 2024 04:27:54.141 * MASTER <-> REPLICA sync: Finished with success

Redis sharding splits the keyspace into 16,384 hash slots, which are distributed across the three masters:

  • slots:[0-5460] (5461 slots) master-0
  • slots:[5461-10922] (5462 slots) master-1
  • slots:[10923-16383] (5461 slots) master-2
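
To see which slot a given key maps to (and therefore which master serves it), ask any cluster node with the built-in CLUSTER KEYSLOT command; the key name below is only an example:

$ redis-cli -c -h redis-cluster-leader.redis-prod.svc.cluster.local -p 6379 -a password
> CLUSTER KEYSLOT user:1001
(integer) ...  # a slot between 0 and 16383, computed as CRC16(key) mod 16384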

Use redis-cli --cluster check redis-cluster-leader.<namespace>.svc.cluster.local:6379 -a password to ensure all 16,384 slots are covered.

$ redis-cli --cluster check redis-cluster-leader.redis-prod.svc.cluster.local:6379 -a password
>>> Performing Cluster Check (using node redis-cluster-leader.redis-prod.svc.cluster.local:6379)
M: e19ccxxxxxxxx redis-cluster-leader.redis-prod.svc.cluster.local:6379
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
S: 54f1xxxxxxxx 181.18.27.193:6379
   slots: (0 slots) slave
   replicates 957dxxxxxxxx
M: 487bxxxxxxxx 181.18.33.195:6379
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
M: 957dxxxxxxxx 181.18.29.227:6379
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
S: 157axxxxxxxx 181.18.45.139:6379
   slots: (0 slots) slave
   replicates 487bxxxxxxxx
S: 3ce1xxxxxxxx 181.18.47.237:6379
   slots: (0 slots) slave
   replicates e19cxxxxxxxx
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

Install Redis Cluster

There are two ways to install a Redis cluster: deploying Bitnami's Redis Cluster Helm chart, or using the Redis Operator from the OpenShift OperatorHub.

Helm charts to deploy Redis Cluster from Bitnami

https://github.com/bitnami/charts/tree/main/bitnami/redis-cluster

The Helm chart method puts additional customization at your fingertips: all the configs can be modified in the values.yaml file, for example networkPolicy, service, podSecurityContext, tls, persistence, etc. I actually like this method, but a stubborn SSL routines::wrong version number error made me try the Redis Operator from OperatorHub instead. There is online documentation (README.md) for the Helm chart setup, as well as tutorials on how to deploy Redis Cluster on Kubernetes using the Helm chart; you simply need to replace kubectl with oc in an OpenShift environment. Since Helm chart tutorials already exist, I’ll talk about the Redis Operator instead.
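
For reference, a minimal Helm install looks like the sketch below; the release name and namespace are placeholders, and values.yaml holds your customizations:

$ helm repo add bitnami https://charts.bitnami.com/bitnami
$ helm repo update
$ helm install redis-cluster bitnami/redis-cluster -n redis-prod -f values.yaml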

Redis Operator from OperatorHub

This is broken down into two phases. First, we install the Redis Operator and prepare the necessary cluster role binding and secrets. Second, we create a Redis cluster from the Operator, with some adjustable options presented in a form.

Redis Operator from OpenShift OperatorHub

Phase 1:

  1. Install the Redis Operator by OpsTree Solutions from OperatorHub (select the stable version, manual approval, and a single namespace)
  2. Because the ClusterRoleBinding's subject (the operator's service account) references a single namespace, update one kind: ClusterRoleBinding section per environment in cluster-role-binding-redisOperator.yaml
  3. oc apply -f cluster-role-redisOperator-v2.yaml && oc apply -f cluster-role-binding-redisOperator.yaml
  4. Create a key/value secret (name: redis-secret, key: password) as shown below
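
For step 4, the secret can be created from the CLI (the namespace is assumed to be where the cluster will run):

$ oc create secret generic redis-secret --from-literal=password='<your-password>' -n redis-prod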

Phase 2:

  1. Open Redis Operator and create RedisCluster (ref: https://github.com/OT-CONTAINER-KIT/helm-charts/blob/main/charts/redis-cluster/values.yaml)
  2. Update image from v7.0.5 to v7.2.6 (latest as of 5-Nov-24 from https://quay.io/repository/opstree/redis?tab=tags)
  3. Set the redisSecret key and name from Phase 1, step 4
  4. serviceType=ClusterIP (NodePort if you need)
  5. clusterSize=3
  6. Uncheck persistenceEnabled (unless you need persistence; Redis is primarily an in-memory data store)
  7. serviceAccountName=redis-operator (elevated privileges from Phase 1, step 3)
  8. redisLeader.replicas=3
  9. redisLeader.pdb.enabled=true, redisLeader.pdb.maxUnavailable=(blank), redisLeader.pdb.minAvailable=1 (minAvailable and maxUnavailable cannot both be set)
  10. runAsGroup=1000
  11. runAsNonRoot=true
  12. redisFollower.replicas=3
  13. redisFollower.pdb.enabled=true, redisFollower.pdb.maxUnavailable=(blank), redisFollower.pdb.minAvailable=1
  14. storage.volumeClaimTemplate.spec.storageClassName=ocs-storagecluster-ceph-rbd (depends on your storage class specification)
  15. Update RedisExporter.image=quay.io/opstree/redis-exporter:v1.48.0 (latest as of 5-Nov-24)
  16. Manually update any resource limits and requests.
Fill in the form to create a Redis cluster from the operator by OpsTree Solutions.
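
For reference, the form fields above map roughly to a RedisCluster custom resource like the sketch below. Field names follow the operator's v1beta2 CRD and the values.yaml referenced in step 1, but treat this as illustrative rather than an exact manifest:

apiVersion: redis.redis.opstreelabs.in/v1beta2
kind: RedisCluster
metadata:
  name: redis-cluster
  namespace: redis-prod
spec:
  clusterSize: 3                        # 3 leaders + 3 followers = 6 pods
  persistenceEnabled: false             # unchecked in step 6
  kubernetesConfig:
    image: quay.io/opstree/redis:v7.2.6
    imagePullPolicy: IfNotPresent
    redisSecret:
      name: redis-secret                # from Phase 1, step 4
      key: password
  redisExporter:
    enabled: true
    image: quay.io/opstree/redis-exporter:v1.48.0
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: ocs-storagecluster-ceph-rbd  # per step 14
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi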

RedisInsight

Having a graphical user interface is advantageous for inspecting the various key types that Redis can store. A database administrator can debug and troubleshoot when the Redis database has issues setting or getting key/value pairs. While a developer does not normally have access to the database, the GUI is helpful for confirming that their code is using Redis to cache data and consuming the keys correctly.

Connect to multiple Redis databases and add different key types using the RedisInsight GUI.
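
If RedisInsight runs outside the cluster, one simple way to reach a database is to tunnel the leader service to your workstation; the service and namespace names below are assumed from the operator's defaults:

$ oc port-forward svc/redis-cluster-leader 6379:6379 -n redis-prod
# Then add a database in RedisInsight pointing at localhost:6379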

Redis Client as Debugging Workstation

A pod acting as a Redis client connected to the Redis cluster is useful for debugging and troubleshooting. Many redis-cli commands can be used to get information about the cluster state and the health of the nodes, and the client can be used to test SET and GET on key/value pairs. Although any one of the six cluster pods can act as a client, that becomes troublesome when a degraded cluster requires certain pods to be forgotten, deleted, and recreated to rejoin the cluster. Below is the manifest used to deploy a dedicated client pod in the Redis namespace in OpenShift.

# Redis client pod manifest (drafted with ChatGPT)
apiVersion: v1
kind: Pod
metadata:
  name: redis-client
  namespace: redis-prod  # Change to your Redis cluster namespace
spec:
  securityContext:
    runAsNonRoot: true   # Ensure the pod runs as a non-root user
    runAsUser: 1000      # Assign a non-root user, 1000 is commonly used
    seccompProfile:
      type: RuntimeDefault  # Use the RuntimeDefault seccomp profile
  containers:
    - name: redis-client
      image: redis:6.2.6  # Redis client image or any Redis CLI compatible image
      command: [ "sleep", "infinity" ]  # Keep the container running for interactive use
      securityContext:
        allowPrivilegeEscalation: false  # Disable privilege escalation
        capabilities:
          drop:
            - "ALL"  # Drop all Linux capabilities
        runAsNonRoot: true  # Ensure the container runs as a non-root user
        runAsUser: 1000     # Use a non-root user with UID 1000
        seccompProfile:
          type: RuntimeDefault  # Use the RuntimeDefault seccomp profile
      resources:
        limits:
          memory: "128Mi"
          cpu: "500m"
      imagePullPolicy: IfNotPresent
  restartPolicy: Always
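
Apply the manifest and open an interactive session against the cluster (the filename redis-client.yaml is assumed):

$ oc apply -f redis-client.yaml
$ oc exec -it redis-client -n redis-prod -- redis-cli -c -h redis-cluster-leader.redis-prod.svc.cluster.local -p 6379 -a password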

Prometheus Redis Exporter for Cluster Metrics

After successfully establishing a Redis cluster in OpenShift and enabling the Redis exporter for Redis-related metrics, we may need to add a kind: ServiceMonitor to include the Redis cluster as a monitoring target in OpenShift Prometheus. To accomplish this, we apply the servicemonitor.yaml file provided below.

# servicemonitor.yaml (drafted with ChatGPT)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: redis-servicemonitor
  namespace: openshift-user-monitoring  # Use the namespace where your monitoring stack is deployed
  labels:
    app.kubernetes.io/instance: k8s  # This matches your Prometheus instance label
spec:
  selector:
    matchLabels:
      openshift.io/cluster-monitoring: "true"  # Match the common label to monitor Redis services
  namespaceSelector:
    matchNames:
      - redis-dev  # Namespace where your Redis cluster services are deployed
      - redis-prod  # Second namespace
  endpoints:
    - port: redis-exporter  # This should match the port name in the Redis services (e.g., 9121)
      interval: 15s  # Scrape interval
  targetLabels:
    - app
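
Apply it like any other manifest (filename assumed):

$ oc apply -f servicemonitor.yaml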

The six Redis cluster pod IP addresses should be listed in Observe > Targets under Endpoint, along with Status, Last Scrape, Scrape Duration, etc. After that, we can navigate to Observe > Metrics and use the Prometheus Query Language (PromQL) to view Redis-related metrics.

redis_uptime_in_seconds
redis_memory_used_bytes
sum(redis_db_keys)
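
For example, a degraded cluster can be spotted at a glance with a simple comparison; redis_cluster_state reports 1 while the cluster state is ok, so this expression returns series only when something is wrong:

redis_cluster_state == 0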

Troubleshoot Redis

During a recent OpenShift outage, the Redis cluster in the production namespace recovered as normal, but the cluster in the development namespace was degraded. In general, the nodes of a degraded Redis cluster will not cover all 16,384 slots, indicating that the nodes are operating in isolation, although key/value pairs can still be written to them. The redis-cli --cluster fix and redis-cli --cluster rebalance commands can be used to repair the cluster. In more serious cases, it is still possible to recover by deleting the old cluster node details from the nodes.conf file, so that only healthy nodes are added back when the pods are redeployed to rebuild the cluster. Since there is no one solution for every situation, I will list a few useful steps I took to fix my recent cluster outage.

# Use redis-client terminal to connect to Redis cluster
$ redis-cli -c -h redis-cluster-leader.redis-dev.svc.cluster.local -p 6379 -a password

# These redis-cli commands check for Redis cluster health
> INFO REPLICATION
> CLUSTER NODES
> CLUSTER SLOTS

# Check if all 16,384 slots are covered by cluster
# [ERR] if degraded
$ redis-cli --cluster check redis-cluster-leader.redis-dev.svc.cluster.local:6379 -a password
...
>>> Check for open slots... 
>>> Check slots coverage... 
[ERR] Not all 16384 slots are covered by nodes.

# Try these 3 commands for a Moderate severity case
$ redis-cli --cluster check redis-cluster-leader.redis-dev.svc.cluster.local:6379 -a password
$ redis-cli --cluster fix redis-cluster-leader.redis-dev.svc.cluster.local:6379 -a password
$ redis-cli --cluster rebalance redis-cluster-leader.redis-dev.svc.cluster.local:6379 -a password

# Critical severity case when above 'fix' and 'rebalance' do not execute successfully
# From any connected master node, remove failed nodes listed in CLUSTER NODES
$ redis-cli -h 181.xx.xx.217 -p 6379 cluster forget <node-id>

# Re-add healthy nodes by removing nodes.conf
# nodes.conf stores (degraded) master/slave node-id and slot range
$ oc exec redis-cluster-leader-0 -- rm /node-conf/nodes.conf
$ oc exec redis-cluster-leader-1 -- rm /node-conf/nodes.conf
$ oc exec redis-cluster-leader-2 -- rm /node-conf/nodes.conf
$ oc exec redis-cluster-follower-0 -- rm /node-conf/nodes.conf
$ oc exec redis-cluster-follower-1 -- rm /node-conf/nodes.conf
$ oc exec redis-cluster-follower-2 -- rm /node-conf/nodes.conf

# From OpenShift, delete pods from StatefulSets to reconstruct Redis cluster
# Then, check leader and follower pods' log for cluster state

#  Re-check the health and configuration of the Redis cluster
$ redis-cli --cluster check redis-cluster-leader.redis-dev.svc.cluster.local:6379 -a password
...
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

Conclusion

You can choose a Redis cluster for heavy use, or a replication/sentinel architecture for light to moderate use. If you install Redis cluster the Bitnami way, you get more configuration options in values.yaml, but go with the OpenShift OperatorHub for a quick, convenient deployment using the Redis operator. You can set up one of the free Redis GUI tools, like Redis Commander, to manage your Redis databases visually. It is recommended to enable Prometheus monitoring via the Redis exporter; this will help you keep track of metrics such as redis_cluster_state, redis_master_link_up, and redis_memory_used_bytes. Go explore the different redis-cli commands available to check and remedy a degraded cluster, though this requires a good understanding of your Redis deployment.

The last post for the year 2024 concludes with me wishing you a Merry Christmas and a New Year filled with great vibes, good people, and good times.

TechSch.com webmaster