Preface

I was actually asked in an interview whether I knew Redis clustering. Apart from master-slave, I had never set up any of the other topologies, so in this post I'll set up Redis in master-slave mode, Sentinel mode, and Cluster mode.
Master-Slave Mode

This mode is dead simple: read-write splitting. Here is a quick set of Kubernetes manifests for one master and one slave.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-master-deployment
  labels:
    app: redis-master-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis-master
  template:
    metadata:
      name: redis-master
      labels:
        app: redis-master
    spec:
      containers:
        - name: redis-master
          image: redis:7.0
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 6379
          args: ["--requirepass", "test"]
      restartPolicy: Always
---
apiVersion: v1
kind: Service
metadata:
  name: redis-master-service-nodeport
spec:
  selector:
    app: redis-master
  type: NodePort
  ports:
    - port: 6379
      targetPort: 6379
---
apiVersion: v1
kind: Service
metadata:
  name: redis-master-service
spec:
  selector:
    app: redis-master
  type: ClusterIP
  ports:
    - port: 6379
      targetPort: 6379
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-slave-deployment
  labels:
    app: redis-slave-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis-slave
  template:
    metadata:
      name: redis-slave
      labels:
        app: redis-slave
    spec:
      containers:
        - name: redis-slave
          image: redis:7.0
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 6379
          args: ["--requirepass", "test", "--replicaof", "redis-master-service", "6379", "--masterauth", "test"]
      restartPolicy: Always
---
apiVersion: v1
kind: Service
metadata:
  name: redis-slave-service
spec:
  selector:
    app: redis-slave
  type: NodePort
  ports:
    - port: 6379
      targetPort: 6379
```
One improvement here would be to move the configuration into a ConfigMap. Adding more nodes works the same way: either scale up the slave replicas, or add another Deployment plus Service.
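As a rough sketch of that ConfigMap approach (the name `redis-master-config` and the mount path are my own placeholders, not part of the manifests above):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-master-config   # hypothetical name
data:
  redis.conf: |
    requirepass test
---
# Deployment pod spec (fragment): start redis from the mounted file
# instead of passing --requirepass on the command line
#
#   containers:
#     - name: redis-master
#       image: redis:7.0
#       args: ["/etc/redis/redis.conf"]
#       volumeMounts:
#         - name: config
#           mountPath: /etc/redis/
#   volumes:
#     - name: config
#       configMap:
#         name: redis-master-config
```

One caveat: a ConfigMap mount is read-only, which is fine for a plain master-slave setup but will not work once the server needs to rewrite its own config file.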
Sentinel Mode

Deploying this mode on Kubernetes has quite a few pitfalls. I referenced some articles online, but there were still problems in actual use.
Pitfalls:

1. Connecting to the Sentinel cluster from outside Kubernetes is difficult, though this can be worked around with ktconnect.
2. The Sentinel config file cannot be mounted directly from a ConfigMap, because Sentinel checks at startup that the config file is writable. Looking at the logs, Sentinel rewrites its own config file automatically.
3. Because of point 2, unless you want to build a custom image, you have to write the config file into a volume dynamically.
Following the online tutorials, I put together my own version.
First, a Secret to store the Redis and Sentinel passwords (here the same password is used for both).
```yaml
kind: Secret
apiVersion: v1
metadata:
  name: redis-secret
type: Opaque
stringData:
  REDIS_PASSWORD: "test"
```
Next, the Redis configuration file and the init scripts.
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-config-map
data:
  REDIS_NODES: "redis-0.redis,redis-1.redis,redis-2.redis"
  redis.conf: |
    bind 0.0.0.0
    protected-mode no
    port 6379
    tcp-backlog 511
    timeout 10
    tcp-keepalive 30
    daemonize no
    supervised no
    pidfile "/var/run/redis_6379.pid"
    loglevel notice
    logfile ""
    databases 16
    always-show-logo yes
    save ""
    stop-writes-on-bgsave-error yes
    rdbcompression yes
    rdbchecksum yes
    rdb-del-sync-files no
    dir "/data"
    replica-serve-stale-data yes
    replica-read-only yes
    repl-diskless-sync no
    repl-diskless-sync-delay 5
    repl-diskless-load disabled
    appendonly no
    repl-disable-tcp-nodelay no
    replica-priority 100
    acllog-max-len 128
---
```
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-init-script-config
data:
  sentinel_init.sh: |
    #!/bin/bash
    for i in ${REDIS_NODES//,/ }
    do
        echo "find master at $i"
        MASTER=$(redis-cli --no-auth-warning --raw -h $i -a ${REDIS_PASSWORD} info replication | awk '{print $1}' | grep master_host: | cut -d ":" -f2)
        if [ "${MASTER}" == "" ]; then
            echo "no master found"
            MASTER=
        else
            echo "found ${MASTER}"
            break
        fi
    done
    echo "sentinel resolve-hostnames yes" >> /etc/redis/sentinel.conf
    echo "sentinel announce-hostnames yes" >> /etc/redis/sentinel.conf
    echo "sentinel monitor mymaster ${MASTER} 6379 2" >> /etc/redis/sentinel.conf
    echo "sentinel auth-pass mymaster ${REDIS_PASSWORD}" >> /etc/redis/sentinel.conf
    echo "sentinel down-after-milliseconds mymaster 5000" >> /etc/redis/sentinel.conf
    echo "sentinel sentinel-pass ${REDIS_PASSWORD}" >> /etc/redis/sentinel.conf
    echo "sentinel parallel-syncs mymaster 1" >> /etc/redis/sentinel.conf
    echo "sentinel failover-timeout mymaster 10000" >> /etc/redis/sentinel.conf
    echo "requirepass ${REDIS_PASSWORD}" >> /etc/redis/sentinel.conf
    echo "sentinel announce-ip ${HOSTNAME}.sentinel" >> /etc/redis/sentinel.conf
    cat /etc/redis/sentinel.conf
  redis_init.sh: |
    #!/bin/bash
    cp /tmp/redis/redis.conf /etc/redis/redis.conf
    echo "requirepass ${REDIS_PASSWORD}" >> /etc/redis/redis.conf
    echo "masterauth ${REDIS_PASSWORD}" >> /etc/redis/redis.conf
    echo "replica-announce-ip ${HOSTNAME}.redis" >> /etc/redis/redis.conf
    echo "replica-announce-port 6379" >> /etc/redis/redis.conf
    echo "finding master..."
    if [ "$(timeout 5 redis-cli -h sentinel -p 26379 -a ${REDIS_PASSWORD} ping)" != "PONG" ]; then
        echo "sentinel not found, defaulting to redis-0"
        if [ "${HOSTNAME}" == "redis-0" ]; then
            echo "this is redis-0, not updating config..."
        else
            echo "updating redis.conf..."
            echo "repl-ping-replica-period 3" >> /etc/redis/redis.conf
            echo "slave-read-only no" >> /etc/redis/redis.conf
            echo "slaveof redis-0.redis 6379" >> /etc/redis/redis.conf
        fi
    else
        echo "sentinel found, finding master"
        MASTER="$(redis-cli -h sentinel -p 26379 -a ${REDIS_PASSWORD} sentinel get-master-addr-by-name mymaster | grep -E '(^redis-*)|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})')"
        if [ "${HOSTNAME}.redis" == "${MASTER}" ]; then
            echo "this is master, not updating config..."
        else
            echo "master found : ${MASTER}, updating redis.conf"
            echo "slave-read-only no" >> /etc/redis/redis.conf
            echo "slaveof ${MASTER} 6379" >> /etc/redis/redis.conf
            echo "repl-ping-replica-period 3" >> /etc/redis/redis.conf
        fi
    fi
---
```
Let me briefly explain these init scripts. Starting with redis_init.sh: we mount redis-config-map at /tmp/redis and copy the config file to /etc/redis/redis.conf, which becomes the config the server actually uses. The script then checks whether a Sentinel is reachable and whether the current node is the master. `redis-cli -h sentinel -p 26379 -a ${REDIS_PASSWORD} ping` checks whether Sentinel is available, and the MASTER lookup below it determines whether the current node is the master. If it is not, the script appends a slaveof directive pointing at the master. One open question is whether slave-read-only should be no. I tested this: if a non-master node is writable, its writes are not replicated back to the master anyway.
As for sentinel_init.sh: it iterates over all Redis nodes, finds the master, and writes the Sentinel configuration to /etc/redis/sentinel.conf for later use.
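Two bits of the script worth unpacking are the `${REDIS_NODES//,/ }` expansion, which turns the comma-separated node list into an iterable word list, and the pipeline that pulls `master_host` out of `INFO replication`. Both can be tried standalone; the reply below is a mock, not output from a live server:

```shell
#!/bin/bash
# ${VAR//,/ } replaces every comma with a space, giving an iterable word list
REDIS_NODES="redis-0.redis,redis-1.redis,redis-2.redis"
for i in ${REDIS_NODES//,/ }; do
    echo "checking $i"
done

# mock of `redis-cli --raw info replication` output on a replica
reply="role:slave
master_host:redis-0.redis
master_port:6379"

# same pipeline as sentinel_init.sh: keep the master_host line, take the value
MASTER=$(echo "$reply" | awk '{print $1}' | grep master_host: | cut -d ":" -f2)
echo "master is ${MASTER}"
```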
Next, the Redis nodes themselves, deployed as a StatefulSet.
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
  labels:
    app: redis
spec:
  serviceName: redis
  replicas: 3
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      name: redis
      labels:
        app: redis
    spec:
      initContainers:
        - name: config
          image: redis:7.0
          env:
            - name: REDIS_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: redis-secret
                  key: REDIS_PASSWORD
          command: ["sh", "-c", "/scripts/redis_init.sh"]
          volumeMounts:
            - name: redis-config
              mountPath: /etc/redis/
            - name: init-scripts
              mountPath: /scripts/
            - name: origin-config
              mountPath: /tmp/redis
      containers:
        - name: redis
          image: redis:7.0
          imagePullPolicy: IfNotPresent
          args: ["/etc/redis/redis.conf"]
          ports:
            - containerPort: 6379
          volumeMounts:
            - name: redis-config
              mountPath: /etc/redis/
            - name: data
              mountPath: /data
      volumes:
        - name: data
          emptyDir: {}
        - name: redis-config
          emptyDir: {}
        - name: init-scripts
          configMap:
            name: redis-init-script-config
            defaultMode: 0777
            items:
              - key: redis_init.sh
                path: redis_init.sh
        - name: origin-config
          configMap:
            name: redis-config-map
            items:
              - key: redis.conf
                path: redis.conf
      restartPolicy: Always
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - redis
              topologyKey: kubernetes.io/hostname
---
apiVersion: v1
kind: Service
metadata:
  name: redis
spec:
  selector:
    app: redis
  clusterIP: None
  ports:
    - port: 6379
      name: redis
---
apiVersion: v1
kind: Service
metadata:
  name: redis-0-node-port
spec:
  selector:
    statefulset.kubernetes.io/pod-name: redis-0
  type: NodePort
  ports:
    - port: 6379
      targetPort: 6379
---
apiVersion: v1
kind: Service
metadata:
  name: redis-1-node-port
spec:
  selector:
    statefulset.kubernetes.io/pod-name: redis-1
  type: NodePort
  ports:
    - port: 6379
      targetPort: 6379
---
apiVersion: v1
kind: Service
metadata:
  name: redis-2-node-port
spec:
  selector:
    statefulset.kubernetes.io/pod-name: redis-2
  type: NodePort
  ports:
    - port: 6379
      targetPort: 6379
---
```
A quick walkthrough. The init container mounts three volumes: redis-config, init-scripts, and origin-config. redis-config is an ephemeral (emptyDir) volume; as mentioned earlier, Redis and Sentinel require a writable config file at startup, hence the ephemeral volume. init-scripts holds the init scripts, and origin-config holds the base Redis configuration. The init container's job is to copy the config file from origin-config, run redis_init.sh from init-scripts, and write the generated config into redis-config for the main container to use. The main container then only needs to mount redis-config and the data directory. data is Redis's data directory; in production it should be backed by a PV/PVC for persistence, but here an ephemeral volume serves as a stopgap.
Then a few Services are defined: nothing special, each of the three Redis pods is exposed via its own NodePort so every node can be reached individually.
Next, the Sentinel service.
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: sentinel
spec:
  serviceName: sentinel
  replicas: 3
  selector:
    matchLabels:
      app: sentinel
  template:
    metadata:
      labels:
        app: sentinel
    spec:
      initContainers:
        - name: config
          image: redis:7.0
          env:
            - name: REDIS_NODES
              valueFrom:
                configMapKeyRef:
                  name: redis-config-map
                  key: REDIS_NODES
            - name: REDIS_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: redis-secret
                  key: REDIS_PASSWORD
          command: ["sh", "-c", "/scripts/sentinel_init.sh"]
          volumeMounts:
            - name: redis-config
              mountPath: /etc/redis/
            - name: init-script
              mountPath: /scripts/
      containers:
        - image: redis:7.0
          name: sentinel
          command: ["redis-sentinel"]
          args: ["/etc/redis/sentinel.conf"]
          ports:
            - name: sentinel
              containerPort: 26379
          volumeMounts:
            - name: redis-config
              mountPath: /etc/redis/
            - name: data
              mountPath: /data
      volumes:
        - name: init-script
          configMap:
            name: redis-init-script-config
            defaultMode: 0777
            items:
              - key: sentinel_init.sh
                path: sentinel_init.sh
        - name: redis-config
          emptyDir: {}
        - name: data
          emptyDir: {}
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - sentinel
              topologyKey: kubernetes.io/hostname
---
apiVersion: v1
kind: Service
metadata:
  name: sentinel
spec:
  selector:
    app: sentinel
  clusterIP: None
  ports:
    - port: 26379
      name: redis
---
apiVersion: v1
kind: Service
metadata:
  name: sentinel-node-port
spec:
  selector:
    app: sentinel
  type: NodePort
  ports:
    - port: 26379
      targetPort: 26379
---
```
The logic is much the same as for the Redis nodes, so I won't repeat it. The one problematic part: after connecting to Sentinel you can get the master's host, which here would be redis-0.redis, but since we are not inside the cluster, DNS resolution fails; for external connections you may need to set up ktconnect. I tested failover: when the master pod is deleted, Sentinel automatically elects a new master. There is a subtle problem, though. For a short window after the master goes down, Sentinel still believes the now-offline node is the master, so when the deleted pod is recreated, its init container configures it as a master. If Sentinel then switches the master before the main container comes up, Sentinel rewrites the restarted pod's config file with the lines below, turning that node into a read-only replica. That contradicts the original author's intent. One improvement is to add slave-read-only no to every node's config, regardless of whether it is the master.
```
# Generated by CONFIG REWRITE
replicaof redis-1.redis 6379
latency-tracking-info-percentiles 50 99 99.9
```
The other issue: with every node writable, if a client gets a replica's address from Sentinel, it can still write data to that replica, which greatly undermines data reliability.
So in the end I reworked the config slightly: replicas simply should not accept writes. The change is just removing the `echo "slave-read-only no" >> /etc/redis/redis.conf` line. Now only the master is writable, which is what master-slave replication actually calls for.
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-init-script-config
data:
  sentinel_init.sh: |
    #!/bin/bash
    for i in ${REDIS_NODES//,/ }
    do
        echo "find master at $i"
        MASTER=$(redis-cli --no-auth-warning --raw -h $i -a ${REDIS_PASSWORD} info replication | awk '{print $1}' | grep master_host: | cut -d ":" -f2)
        if [ "${MASTER}" == "" ]; then
            echo "no master found"
            MASTER=
        else
            echo "found ${MASTER}"
            break
        fi
    done
    echo "sentinel resolve-hostnames yes" >> /etc/redis/sentinel.conf
    echo "sentinel announce-hostnames yes" >> /etc/redis/sentinel.conf
    echo "sentinel monitor mymaster ${MASTER} 6379 2" >> /etc/redis/sentinel.conf
    echo "sentinel auth-pass mymaster ${REDIS_PASSWORD}" >> /etc/redis/sentinel.conf
    echo "sentinel down-after-milliseconds mymaster 5000" >> /etc/redis/sentinel.conf
    echo "sentinel sentinel-pass ${REDIS_PASSWORD}" >> /etc/redis/sentinel.conf
    echo "sentinel parallel-syncs mymaster 1" >> /etc/redis/sentinel.conf
    echo "sentinel failover-timeout mymaster 10000" >> /etc/redis/sentinel.conf
    echo "requirepass ${REDIS_PASSWORD}" >> /etc/redis/sentinel.conf
    echo "sentinel announce-ip ${HOSTNAME}.sentinel" >> /etc/redis/sentinel.conf
    cat /etc/redis/sentinel.conf
  redis_init.sh: |
    #!/bin/bash
    cp /tmp/redis/redis.conf /etc/redis/redis.conf
    echo "requirepass ${REDIS_PASSWORD}" >> /etc/redis/redis.conf
    echo "masterauth ${REDIS_PASSWORD}" >> /etc/redis/redis.conf
    echo "replica-announce-ip ${HOSTNAME}.redis" >> /etc/redis/redis.conf
    echo "replica-announce-port 6379" >> /etc/redis/redis.conf
    echo "finding master..."
    if [ "$(timeout 5 redis-cli -h sentinel -p 26379 -a ${REDIS_PASSWORD} ping)" != "PONG" ]; then
        echo "sentinel not found, defaulting to redis-0"
        if [ "${HOSTNAME}" == "redis-0" ]; then
            echo "this is redis-0, not updating config..."
        else
            echo "updating redis.conf..."
            echo "repl-ping-replica-period 3" >> /etc/redis/redis.conf
            echo "slaveof redis-0.redis 6379" >> /etc/redis/redis.conf
        fi
    else
        echo "sentinel found, finding master"
        MASTER="$(redis-cli -h sentinel -p 26379 -a ${REDIS_PASSWORD} sentinel get-master-addr-by-name mymaster | grep -E '(^redis-*)|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})')"
        if [ "${HOSTNAME}.redis" == "${MASTER}" ]; then
            echo "this is master, not updating config..."
        else
            echo "master found : ${MASTER}, updating redis.conf"
            echo "slaveof ${MASTER} 6379" >> /etc/redis/redis.conf
            echo "repl-ping-replica-period 3" >> /etc/redis/redis.conf
        fi
    fi
---
```
So the final usage pattern is: query Sentinel for the master's (or a replica's) address, then write to the master and read from the replicas.
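A minimal sketch of that client flow; the reply here is mocked, but against the live deployment it would come from `redis-cli -h <sentinel> -p 26379 -a test sentinel get-master-addr-by-name mymaster`, which returns the master's host and port on two lines:

```shell
#!/bin/bash
# mocked two-line reply from `sentinel get-master-addr-by-name mymaster`
reply="redis-1.redis
6379"

MASTER_HOST=$(echo "$reply" | head -n 1)
MASTER_PORT=$(echo "$reply" | tail -n 1)
echo "write to ${MASTER_HOST}:${MASTER_PORT}"
# a client would then run, e.g.:
#   redis-cli -h "$MASTER_HOST" -p "$MASTER_PORT" -a test set somekey somevalue
```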
Overall, Sentinel mode is fairly complex to set up on Kubernetes: it involves defining stateful services, so you have to rely on init containers and bash scripts to control the container startup logic.
Cluster Mode

Fully automating this mode seems difficult; it requires the redis-trib tool. A cluster can only be initialized after all nodes are up, and pushing the initialization logic into an init.sh the way the Sentinel setup does would be both convoluted and inefficient.
First, the config.
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-config-map
data:
  redis.conf: |
    bind 0.0.0.0
    protected-mode no
    port 6379
    cluster-enabled yes
    cluster-config-file /data/redis.conf
    cluster-node-timeout 5000
    dir "/data"
---
```
Then create the Redis nodes along with a headless Service. (A headless Service gets no clusterIP; it is generally used for direct pod-to-pod communication rather than load balancing like a normal Service.)
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
spec:
  serviceName: redis
  replicas: 6
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      terminationGracePeriodSeconds: 20
      containers:
        - name: redis
          image: redis:7.0
          args:
            - "/conf/redis.conf"
          ports:
            - name: redis
              containerPort: 6379
              protocol: "TCP"
            - name: cluster
              containerPort: 16379
              protocol: "TCP"
          volumeMounts:
            - name: redis-conf
              mountPath: /conf/
            - name: redis-data
              mountPath: /data
      volumes:
        - name: redis-conf
          configMap:
            name: redis-config-map
            items:
              - key: redis.conf
                path: redis.conf
        - name: redis-data
          emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: redis
spec:
  selector:
    app: redis
  clusterIP: None
  ports:
    - port: 6379
      name: redis
---
```
Note that an ephemeral volume is used here: in production, data absolutely must be a persistent volume. In fact, skipping persistence causes a real problem, which I'll get to later; see the updated YAML there.
Also note there is no pod anti-affinity configured here, because we are creating 3 masters and 3 replicas and don't have enough nodes to spread them out evenly.
With a headless Service, the StatefulSet's pods get DNS names of the form <pod name>.<service name>.<namespace>.svc.cluster.local.
Let's test DNS resolution for these services.
```bash
kubectl run --rm -i --tty busybox --image=busybox:1.28 /bin/sh
nslookup redis-0.redis
Server:    10.43.0.10
Address 1: 10.43.0.10 kube-dns.kube-system.svc.cluster.local

Name:      redis-0.redis
Address 1: 10.42.1.60 redis-0.redis.default.svc.cluster.local
```
Use a separate container to initialize the Redis cluster.
```bash
kubectl run -it ubuntu --image=ubuntu:20.04 --restart=Never /bin/bash
kubectl exec -it ubuntu /bin/bash

cat > /etc/apt/sources.list << EOF
deb http://mirrors.aliyun.com/ubuntu/ focal main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ focal main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ focal-security main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ focal-security main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ focal-updates main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ focal-updates main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ focal-proposed main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ focal-proposed main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ focal-backports main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ focal-backports main restricted universe multiverse
EOF

apt-get update
apt-get install -y vim wget python3 python3-pip redis-tools dnsutils
pip install redis-trib

# create a cluster with only the master nodes
redis-trib.py create \
  `dig +short redis-0.redis.default.svc.cluster.local`:6379 \
  `dig +short redis-1.redis.default.svc.cluster.local`:6379 \
  `dig +short redis-2.redis.default.svc.cluster.local`:6379

# add a slave for each master
redis-trib.py replicate \
  --master-addr `dig +short redis-0.redis.default.svc.cluster.local`:6379 \
  --slave-addr `dig +short redis-3.redis.default.svc.cluster.local`:6379
redis-trib.py replicate \
  --master-addr `dig +short redis-1.redis.default.svc.cluster.local`:6379 \
  --slave-addr `dig +short redis-4.redis.default.svc.cluster.local`:6379
redis-trib.py replicate \
  --master-addr `dig +short redis-2.redis.default.svc.cluster.local`:6379 \
  --slave-addr `dig +short redis-5.redis.default.svc.cluster.local`:6379
```
At this point the cluster is initialized.
Let's exec into one of the Redis nodes and take a look.
```bash
kubectl exec -it redis-0 /bin/bash
root@redis-0:/data# redis-cli
127.0.0.1:6379> cluster nodes
c0309e16b8d0727a4ad2cbe939ec59caac46e37d 10.42.0.93:6379@16379 slave 06809780e7800808a217eafb35f8cee395f51820 0 1721305716101 2 connected
8a63ab6b6e6b1db63855afdf604646e7f0145348 10.42.1.60:6379@16379 myself,master - 0 1721305715000 1 connected 10923-16383
956ebaec5cd6ac4b0970f823808bee6c076dcbe8 10.42.2.60:6379@16379 master - 0 1721305714591 4 connected 0-5461
06809780e7800808a217eafb35f8cee395f51820 10.42.0.92:6379@16379 master - 0 1721305714592 2 connected 5462-10922
f8effa58385f8941a193dfadbf2e90d018ca1c19 10.42.1.59:6379@16379 slave 956ebaec5cd6ac4b0970f823808bee6c076dcbe8 0 1721305715094 4 connected
2937328a1fb97f0c21d203d24039e3f4f4e49da3 10.42.2.61:6379@16379 slave 8a63ab6b6e6b1db63855afdf604646e7f0145348 0 1721305716504 1 connected
127.0.0.1:6379> cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:4
cluster_my_epoch:1
cluster_stats_messages_ping_sent:555
cluster_stats_messages_pong_sent:549
cluster_stats_messages_meet_sent:2
cluster_stats_messages_sent:1106
cluster_stats_messages_ping_received:549
cluster_stats_messages_pong_received:557
cluster_stats_messages_received:1106
total_cluster_links_buffer_limit_exceeded:0
```
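The slot ranges in the output above (0-5461, 5462-10922, 10923-16383) partition the 16384 hash slots that Redis Cluster maps every key onto via CRC16(key) mod 16384. Here is a standalone sketch of that hashing, using the CRC16-XMODEM variant Redis uses (hash tags ignored for simplicity):

```shell
#!/bin/bash
# CRC16-XMODEM (poly 0x1021, init 0), the checksum Redis Cluster uses for key slots
crc16() {
    local str=$1 crc=0 i j c
    for ((i = 0; i < ${#str}; i++)); do
        printf -v c '%d' "'${str:i:1}"   # char -> ASCII code
        crc=$(( (crc ^ (c << 8)) & 0xFFFF ))
        for ((j = 0; j < 8; j++)); do
            if (( crc & 0x8000 )); then
                crc=$(( ((crc << 1) ^ 0x1021) & 0xFFFF ))
            else
                crc=$(( (crc << 1) & 0xFFFF ))
            fi
        done
    done
    echo "$crc"
}

# standard XMODEM check value: crc16("123456789") == 0x31C3 == 12739
key="123456789"
slot=$(( $(crc16 "$key") % 16384 ))
echo "key '$key' maps to slot $slot"
```

Inside any cluster node, `redis-cli cluster keyslot <key>` returns the same slot computed server-side.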
Create a NodePort Service for external access and give it a try.
```yaml
apiVersion: v1
kind: Service
metadata:
  name: redis-access-service
  labels:
    app: redis
spec:
  type: NodePort
  ports:
    - name: redis-port
      port: 6379
      targetPort: 6379
  selector:
    app: redis
```
Note that in a client like Another Redis Desktop Manager you must not enable the cluster option, because all the node addresses it discovers are internal IPs. Just connect as a normal standalone instance.
Now let's delete redis-0 and watch the logs of redis-3.
```
1:S 18 Jul 2024 12:38:37.534 * Marking node 8a63ab6b6e6b1db63855afdf604646e7f0145348 as failing (quorum reached).
1:S 18 Jul 2024 12:38:37.534 # Cluster state changed: fail
1:S 18 Jul 2024 12:38:37.563 # Start of election delayed for 919 milliseconds (rank #0, offset 1171).
1:S 18 Jul 2024 12:38:38.569 # Starting a failover election for epoch 5.
1:S 18 Jul 2024 12:38:38.576 # Failover election won: I'm the new master.
1:S 18 Jul 2024 12:38:38.576 # configEpoch set to 5 after successful failover
1:M 18 Jul 2024 12:38:38.576 * Discarding previously cached master state.
1:M 18 Jul 2024 12:38:38.576 # Setting secondary replication ID to 793c7bf0d23ad2480d10ebddeef4de92500b7f41, valid up to offset: 1172. New replication ID is d4170f01b37dbd48ab2898ff8cd2404a7025a2b7
1:M 18 Jul 2024 12:38:38.577 # Cluster state changed: ok
```
You can see that redis-3 was promoted to master; inside its console, the role command confirms it.
```bash
root@redis-3:/data# redis-cli
127.0.0.1:6379> role
1) "master"
2) (integer) 1171
3) (empty array)
127.0.0.1:6379>
```
But now there's a new problem: after redis-0 is recreated, it can't rejoin the cluster. Running cluster nodes on redis-3:
```bash
root@redis-3:/data# redis-cli
127.0.0.1:6379> cluster nodes
956ebaec5cd6ac4b0970f823808bee6c076dcbe8 10.42.2.60:6379@16379 master - 0 1721306662601 4 connected 0-5461
c0309e16b8d0727a4ad2cbe939ec59caac46e37d 10.42.0.93:6379@16379 slave 06809780e7800808a217eafb35f8cee395f51820 0 1721306663005 2 connected
f8effa58385f8941a193dfadbf2e90d018ca1c19 10.42.1.59:6379@16379 slave 956ebaec5cd6ac4b0970f823808bee6c076dcbe8 0 1721306662000 4 connected
2937328a1fb97f0c21d203d24039e3f4f4e49da3 10.42.2.61:6379@16379 myself,master - 0 1721306661000 5 connected 10923-16383
8a63ab6b6e6b1db63855afdf604646e7f0145348 10.42.1.60:6379@16379 master,fail - 1721306312230 1721306309711 1 connected
06809780e7800808a217eafb35f8cee395f51820 10.42.0.92:6379@16379 master - 0 1721306662000 2 connected 5462-10922
```
One node is marked fail, while running cluster nodes inside redis-0 shows only itself.
Stopping other master nodes gives the same result. Is the online tutorial unreliable again? After some thought I figured out why: the problem is the storage. In the original article every node's /data/redis.conf lived on shared storage (the author used an NFS volume with ReadWriteMany), whereas I used an ephemeral volume. So after a restart, the node can't find its previous /data/redis.conf, creates a fresh one of its own, and therefore can't rejoin the cluster.
So what now? Let's create PVCs using the ceph-rbd setup from earlier. Here the storage doesn't actually need to be shared: a dedicated volume per node works fine too.
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
spec:
  serviceName: redis
  replicas: 6
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      terminationGracePeriodSeconds: 20
      containers:
        - name: redis
          image: redis:7.0
          args:
            - "/conf/redis.conf"
          ports:
            - name: redis
              containerPort: 6379
              protocol: "TCP"
            - name: cluster
              containerPort: 16379
              protocol: "TCP"
          volumeMounts:
            - name: redis-conf
              mountPath: /conf/
            - name: redis-data
              mountPath: /data
      volumes:
        - name: redis-conf
          configMap:
            name: redis-config-map
            items:
              - key: redis.conf
                path: redis.conf
  volumeClaimTemplates:
    - metadata:
        name: redis-data
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: "csi-rbd-sc"
        resources:
          requests:
            storage: 1Gi
---
apiVersion: v1
kind: Service
metadata:
  name: redis
spec:
  selector:
    app: redis
  clusterIP: None
  ports:
    - port: 6379
      name: redis
---
```
Then rebuild the cluster the same way as before; I won't repeat the steps.
Delete redis-0 and check whether the cluster recovers on its own.
```bash
root@redis-0:/data# redis-cli
127.0.0.1:6379> cluster nodes
4cf82b83963d3a5178d4d73db2080934e9d5f0a1 10.42.1.65:6379@16379 slave 33207c0188005e553000a456fb2d41f9d341eb05 0 1721308687541 3 connected
1c978c4a36bb5f954411b2a965e2258915b4dfdc 10.42.0.95:6379@16379 slave 31e0da6c88f0d4954123b6d7df2de37967938696 0 1721308686531 2 connected
3181437aa3fd71108a6c1616de15fa8e42d6add7 10.42.1.64:6379@16379 myself,slave cf9eb6aa26822f32decf9f3047c7cb606f677a1a 0 1721308685000 4 connected
31e0da6c88f0d4954123b6d7df2de37967938696 10.42.0.94:6379@16379 master - 0 1721308686000 2 connected 0-5461
33207c0188005e553000a456fb2d41f9d341eb05 10.42.2.66:6379@16379 master - 0 1721308686000 3 connected 10923-16383
cf9eb6aa26822f32decf9f3047c7cb606f677a1a 10.42.2.67:6379@16379 master - 0 1721308686000 4 connected 5462-10922
127.0.0.1:6379> role
1) "slave"
2) "10.42.2.67"
3) (integer) 6379
4) "connected"
5) (integer) 224
127.0.0.1:6379>
```
And indeed, redis-0 has successfully rejoined as a slave.
With that, the Cluster setup is complete.
Summary

This post took several days overall. I had assumed Sentinel mode would also be simple to set up, but it turned out to be somewhat involved, and with interviews and review sessions in between, the work was on and off. Still, I've now got a decent grasp of how to build highly available Redis clusters.