Zer0e's Blog

[Road to Architecture 5] Setting Up a Redis Cluster

Word count: 4.9k · Reading time: 25 min
2024/07/16

Preface

I was actually asked in an interview whether I knew Redis clustering; apart from master-slave replication I had never set anything up, so in this post I'll build Redis in master-slave mode, Sentinel mode, and Cluster mode.

Main Content

Master-Slave Mode

This mode couldn't be simpler: read-write splitting. Here are the k8s manifests, one master and one slave.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-master-deployment
  labels:
    app: redis-master-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis-master
  template:
    metadata:
      name: redis-master
      labels:
        app: redis-master
    spec:
      containers:
      - name: redis-master
        image: redis:7.0
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 6379
        args: ["--requirepass", "test"]
      restartPolicy: Always

---
apiVersion: v1
kind: Service
metadata:
  name: redis-master-service-nodeport
spec:
  selector:
    app: redis-master
  type: NodePort
  ports:
  - port: 6379
    targetPort: 6379

---
apiVersion: v1
kind: Service
metadata:
  name: redis-master-service
spec:
  selector:
    app: redis-master
  type: ClusterIP
  ports:
  - port: 6379
    targetPort: 6379

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-slave-deployment
  labels:
    app: redis-slave-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis-slave
  template:
    metadata:
      name: redis-slave
      labels:
        app: redis-slave
    spec:
      containers:
      - name: redis-slave
        image: redis:7.0
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 6379
        args: ["--requirepass", "test", "--replicaof", "redis-master-service", "6379", "--masterauth", "test"]
      restartPolicy: Always

---
apiVersion: v1
kind: Service
metadata:
  name: redis-slave-service
spec:
  selector:
    app: redis-slave
  type: NodePort
  ports:
  - port: 6379
    targetPort: 6379

One improvement would be to move the configuration into a ConfigMap. Adding more nodes works the same way: scale up the slave replicas, or add another Deployment plus Service.
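As a sketch of that ConfigMap refactor (the ConfigMap name and mount layout here are my own assumptions, not from the original manifests):

```yaml
# Hypothetical ConfigMap holding the master's redis.conf instead of CLI args
apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-master-config   # assumed name
data:
  redis.conf: |
    requirepass test
# In the Deployment, mount it and start Redis from the file instead:
#   args: ["/etc/redis/redis.conf"]
#   volumeMounts:
#   - name: config
#     mountPath: /etc/redis/
```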

Sentinel Mode

Deploying this mode on k8s has quite a few pitfalls. I referenced several articles online, but there were still problems in practice.

Pitfalls:

  1. Connecting to the Sentinel cluster from outside is difficult, though this can be worked around with ktconnect.
  2. Sentinel's config file cannot be mounted directly from a ConfigMap, because Sentinel checks at startup that the file is writable; the logs show it rewrites its own config file.
  3. Following from 2, if you don't want to modify the image, you have to write the config file into a volume dynamically.

Following the tutorials online, I put together something similar.

First, a Secret storing the Redis and Sentinel password (the same password for both here).

kind: Secret
apiVersion: v1
metadata:
  name: redis-secret
type: Opaque
stringData:
  REDIS_PASSWORD: "test"

Next, the Redis configuration file and the init scripts.

apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-config-map
data:
  REDIS_NODES: "redis-0.redis,redis-1.redis,redis-2.redis"

  redis.conf: |
    bind 0.0.0.0
    protected-mode no
    port 6379
    tcp-backlog 511
    timeout 10
    tcp-keepalive 30
    daemonize no
    supervised no
    pidfile "/var/run/redis_6379.pid"
    loglevel notice
    logfile ""
    databases 16
    always-show-logo yes
    save ""
    stop-writes-on-bgsave-error yes
    rdbcompression yes
    rdbchecksum yes
    rdb-del-sync-files no
    dir "/data"
    replica-serve-stale-data yes
    replica-read-only yes
    repl-diskless-sync no
    repl-diskless-sync-delay 5
    repl-diskless-load disabled
    appendonly no
    repl-disable-tcp-nodelay no
    replica-priority 100
    acllog-max-len 128

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-init-script-config
data:
  sentinel_init.sh: |
    #!/bin/bash

    for i in ${REDIS_NODES//,/ }
    do
      echo "find master at $i"
      MASTER=$(redis-cli --no-auth-warning --raw -h $i -a ${REDIS_PASSWORD} info replication | awk '{print $1}' | grep master_host: | cut -d ":" -f2)
      if [ "${MASTER}" == "" ]; then
        echo "no master found"
        MASTER=
      else
        echo "found ${MASTER}"
        break
      fi
    done

    echo "sentinel resolve-hostnames yes" >> /etc/redis/sentinel.conf
    echo "sentinel announce-hostnames yes" >> /etc/redis/sentinel.conf
    echo "sentinel monitor mymaster ${MASTER} 6379 2" >> /etc/redis/sentinel.conf
    echo "sentinel auth-pass mymaster ${REDIS_PASSWORD}" >> /etc/redis/sentinel.conf
    echo "sentinel down-after-milliseconds mymaster 5000" >> /etc/redis/sentinel.conf
    echo "sentinel sentinel-pass ${REDIS_PASSWORD}" >> /etc/redis/sentinel.conf
    echo "sentinel parallel-syncs mymaster 1" >> /etc/redis/sentinel.conf
    echo "sentinel failover-timeout mymaster 10000" >> /etc/redis/sentinel.conf
    echo "requirepass ${REDIS_PASSWORD}" >> /etc/redis/sentinel.conf
    echo "sentinel announce-ip ${HOSTNAME}.sentinel" >> /etc/redis/sentinel.conf

    cat /etc/redis/sentinel.conf

  redis_init.sh: |
    #!/bin/bash

    cp /tmp/redis/redis.conf /etc/redis/redis.conf
    echo "requirepass ${REDIS_PASSWORD}" >> /etc/redis/redis.conf
    echo "masterauth ${REDIS_PASSWORD}" >> /etc/redis/redis.conf
    echo "replica-announce-ip ${HOSTNAME}.redis" >> /etc/redis/redis.conf
    echo "replica-announce-port 6379" >> /etc/redis/redis.conf
    echo "finding master..."

    if [ "$(timeout 5 redis-cli -h sentinel -p 26379 -a ${REDIS_PASSWORD} ping)" != "PONG" ]; then
      echo "sentinel not found, defaulting to redis-0"
      if [ "${HOSTNAME}" == "redis-0" ]; then
        echo "this is redis-0, not updating config..."
      else
        echo "updating redis.conf..."
        echo "repl-ping-replica-period 3" >> /etc/redis/redis.conf
        echo "slave-read-only no" >> /etc/redis/redis.conf
        echo "slaveof redis-0.redis 6379" >> /etc/redis/redis.conf
      fi
    else
      echo "sentinel found, finding master"
      MASTER="$(redis-cli -h sentinel -p 26379 -a ${REDIS_PASSWORD} sentinel get-master-addr-by-name mymaster | grep -E '(^redis-*)|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})')"
      if [ "${HOSTNAME}.redis" == "${MASTER}" ]; then
        echo "this is master, not updating config..."
      else
        echo "master found : ${MASTER}, updating redis.conf"
        echo "slave-read-only no" >> /etc/redis/redis.conf
        echo "slaveof ${MASTER} 6379" >> /etc/redis/redis.conf
        echo "repl-ping-replica-period 3" >> /etc/redis/redis.conf
      fi
    fi
---

Let me briefly explain these init scripts, starting with redis_init.sh. First, redis-config-map is mounted at /tmp/redis, and the config file is copied to /etc/redis/redis.conf, which becomes the real configuration. Then the script determines whether a Sentinel exists and whether the current node is the master. redis-cli -h sentinel -p 26379 -a ${REDIS_PASSWORD} ping checks whether Sentinel is reachable, and the MASTER lookup below decides whether the current node is the master. If it is not, a slaveof directive pointing at the master is appended. One open question was whether slave-read-only should be no. I tried it: writes made to a non-master node are in fact never propagated back to the master.
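To see what that MASTER extraction pipeline does, here is the same awk | grep | cut chain from sentinel_init.sh run against a canned INFO replication reply instead of a live redis-cli (the sample output below is illustrative):

```shell
# Sample of what `redis-cli ... info replication` prints on a replica
sample='# Replication
role:slave
master_host:redis-0.redis
master_port:6379'

# Same pipeline as in the init script: keep the master_host line, take the value
MASTER=$(printf '%s\n' "$sample" | awk '{print $1}' | grep master_host: | cut -d ":" -f2)
echo "$MASTER"   # prints redis-0.redis
```

On a master the INFO output has no master_host line, so MASTER comes back empty and the loop moves on to the next node.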

As for sentinel_init.sh: it iterates over all Redis nodes to find the master, then writes the Sentinel configuration to /etc/redis/sentinel.conf for later use.
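The node iteration relies on bash's pattern substitution ${REDIS_NODES//,/ }, which turns the comma-separated list from the ConfigMap into space-separated words for the for loop (this is a bash-only expansion, which is why the script uses #!/bin/bash):

```shell
# Same expansion sentinel_init.sh uses to split REDIS_NODES
REDIS_NODES="redis-0.redis,redis-1.redis,redis-2.redis"
for i in ${REDIS_NODES//,/ }   # replace every comma with a space
do
  echo "$i"                    # one hostname per line
done
```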

Next, deploy the Redis nodes, using a StatefulSet.


apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
  labels:
    app: redis
spec:
  serviceName: redis
  replicas: 3
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      name: redis
      labels:
        app: redis
    spec:
      initContainers:
      - name: config
        image: redis:7.0
        env:
        - name: REDIS_PASSWORD
          valueFrom:
            secretKeyRef:
              name: redis-secret
              key: REDIS_PASSWORD
        command: ["sh", "-c", "/scripts/redis_init.sh"]
        volumeMounts:
        - name: redis-config
          mountPath: /etc/redis/
        - name: init-scripts
          mountPath: /scripts/
        - name: origin-config
          mountPath: /tmp/redis
      containers:
      - name: redis
        image: redis:7.0
        imagePullPolicy: IfNotPresent
        args: ["/etc/redis/redis.conf"]
        ports:
        - containerPort: 6379
        volumeMounts:
        - name: redis-config
          mountPath: /etc/redis/
        - name: data
          mountPath: /data
      volumes:
      - name: data
        emptyDir: {}
      - name: redis-config
        emptyDir: {}
      - name: init-scripts
        configMap:
          name: redis-init-script-config
          defaultMode: 0777
          items:
          - key: redis_init.sh
            path: redis_init.sh
      - name: origin-config
        configMap:
          name: redis-config-map
          items:
          - key: redis.conf
            path: redis.conf
      restartPolicy: Always
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - redis
            topologyKey: kubernetes.io/hostname
---
apiVersion: v1
kind: Service
metadata:
  name: redis
spec:
  selector:
    app: redis
  clusterIP: None
  ports:
  - port: 6379
    name: redis
---
apiVersion: v1
kind: Service
metadata:
  name: redis-0-node-port
spec:
  selector:
    statefulset.kubernetes.io/pod-name: redis-0
  type: NodePort
  ports:
  - port: 6379
    targetPort: 6379

---
apiVersion: v1
kind: Service
metadata:
  name: redis-1-node-port
spec:
  selector:
    statefulset.kubernetes.io/pod-name: redis-1
  type: NodePort
  ports:
  - port: 6379
    targetPort: 6379

---
apiVersion: v1
kind: Service
metadata:
  name: redis-2-node-port
spec:
  selector:
    statefulset.kubernetes.io/pod-name: redis-2
  type: NodePort
  ports:
  - port: 6379
    targetPort: 6379

---

To walk through it: the init container mounts three volumes, redis-config, init-scripts, and origin-config. redis-config is an emptyDir; as noted above, Redis and Sentinel require a writable config file at startup, hence the ephemeral volume. init-scripts holds the init scripts, and origin-config holds the base Redis config. The init container's job is to copy the config from origin-config into redis-config and extend it via redis_init.sh, producing the config used by the real container. The real container only mounts redis-config and data; data is the Redis data directory, which in production should be backed by a PV and PVC, but an ephemeral volume will do here.
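For that production setup, the data emptyDir could be swapped for a volumeClaimTemplates entry on the StatefulSet; this is only a sketch, and the storageClassName is a placeholder for whatever the cluster actually provides:

```yaml
# Replaces the `data` emptyDir in the StatefulSet above (sketch)
volumeClaimTemplates:
- metadata:
    name: data
  spec:
    accessModes: [ "ReadWriteOnce" ]
    storageClassName: "your-storage-class"   # placeholder
    resources:
      requests:
        storage: 1Gi
```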

Then a few Services, nothing special: the three Redis nodes are each exposed via their own NodePort, so every node can be reached.

Next, the Sentinel service.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: sentinel
spec:
  serviceName: sentinel
  replicas: 3
  selector:
    matchLabels:
      app: sentinel
  template:
    metadata:
      labels:
        app: sentinel
    spec:
      initContainers:
      - name: config
        image: redis:7.0
        env:
        - name: REDIS_NODES
          valueFrom:
            configMapKeyRef:
              name: redis-config-map
              key: REDIS_NODES
        - name: REDIS_PASSWORD
          valueFrom:
            secretKeyRef:
              name: redis-secret
              key: REDIS_PASSWORD
        command: ["sh", "-c", "/scripts/sentinel_init.sh"]
        volumeMounts:
        - name: redis-config
          mountPath: /etc/redis/
        - name: init-script
          mountPath: /scripts/
      containers:
      - name: sentinel
        image: redis:7.0
        command: ["redis-sentinel"]
        args: ["/etc/redis/sentinel.conf"]
        ports:
        - name: sentinel
          containerPort: 26379
        volumeMounts:
        - name: redis-config
          mountPath: /etc/redis/
        - name: data
          mountPath: /data
      volumes:
      - name: init-script
        configMap:
          name: redis-init-script-config
          defaultMode: 0777
          items:
          - key: sentinel_init.sh
            path: sentinel_init.sh
      - name: redis-config
        emptyDir: {}
      - name: data
        emptyDir: {}
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - sentinel
            topologyKey: kubernetes.io/hostname

---
apiVersion: v1
kind: Service
metadata:
  name: sentinel
spec:
  selector:
    app: sentinel
  clusterIP: None
  ports:
  - port: 26379
    name: redis
---
apiVersion: v1
kind: Service
metadata:
  name: sentinel-node-port
spec:
  selector:
    app: sentinel
  type: NodePort
  ports:
  - port: 26379
    targetPort: 26379
---

The logic is much the same as for the Redis nodes, so I won't repeat it. The one problem: after connecting to Sentinel you can learn the master's host, here redis-0.redis, but since we are outside the cluster, DNS resolution fails; external clients will probably need ktconnect. I tested failover: after the master pod is deleted, Sentinel automatically elects a new master. There is a race here, though. For a very short window after the master goes offline, Sentinel still treats the offline node as the master, so when that pod is recreated it starts with a master's config. But if Sentinel has already switched masters before the main container comes up, Sentinel appends the lines below to the restarted container's config file, turning the node read-only. This differs from the original author's intent. One possible improvement is to add slave-read-only no to every node's config file, regardless of whether it is the master.

# Generated by CONFIG REWRITE
replicaof redis-1.redis 6379
latency-tracking-info-percentiles 50 99 99.9

The downside is that with a master plus other writable nodes, a client that obtains a replica's address from Sentinel can also write data to it, which greatly increases the risk of unreliable data.

So in the end I reworked it: replicas simply should not accept writes. The change is just deleting the echo "slave-read-only no" >> /etc/redis/redis.conf line, so that only the master is writable, which is what master-slave replication is supposed to guarantee.

apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-init-script-config
data:
  sentinel_init.sh: |
    #!/bin/bash

    for i in ${REDIS_NODES//,/ }
    do
      echo "find master at $i"
      MASTER=$(redis-cli --no-auth-warning --raw -h $i -a ${REDIS_PASSWORD} info replication | awk '{print $1}' | grep master_host: | cut -d ":" -f2)
      if [ "${MASTER}" == "" ]; then
        echo "no master found"
        MASTER=
      else
        echo "found ${MASTER}"
        break
      fi
    done

    echo "sentinel resolve-hostnames yes" >> /etc/redis/sentinel.conf
    echo "sentinel announce-hostnames yes" >> /etc/redis/sentinel.conf
    echo "sentinel monitor mymaster ${MASTER} 6379 2" >> /etc/redis/sentinel.conf
    echo "sentinel auth-pass mymaster ${REDIS_PASSWORD}" >> /etc/redis/sentinel.conf
    echo "sentinel down-after-milliseconds mymaster 5000" >> /etc/redis/sentinel.conf
    echo "sentinel sentinel-pass ${REDIS_PASSWORD}" >> /etc/redis/sentinel.conf
    echo "sentinel parallel-syncs mymaster 1" >> /etc/redis/sentinel.conf
    echo "sentinel failover-timeout mymaster 10000" >> /etc/redis/sentinel.conf
    echo "requirepass ${REDIS_PASSWORD}" >> /etc/redis/sentinel.conf
    echo "sentinel announce-ip ${HOSTNAME}.sentinel" >> /etc/redis/sentinel.conf

    cat /etc/redis/sentinel.conf

  redis_init.sh: |
    #!/bin/bash

    cp /tmp/redis/redis.conf /etc/redis/redis.conf
    echo "requirepass ${REDIS_PASSWORD}" >> /etc/redis/redis.conf
    echo "masterauth ${REDIS_PASSWORD}" >> /etc/redis/redis.conf
    echo "replica-announce-ip ${HOSTNAME}.redis" >> /etc/redis/redis.conf
    echo "replica-announce-port 6379" >> /etc/redis/redis.conf
    echo "finding master..."

    if [ "$(timeout 5 redis-cli -h sentinel -p 26379 -a ${REDIS_PASSWORD} ping)" != "PONG" ]; then
      echo "sentinel not found, defaulting to redis-0"
      if [ "${HOSTNAME}" == "redis-0" ]; then
        echo "this is redis-0, not updating config..."
      else
        echo "updating redis.conf..."
        echo "repl-ping-replica-period 3" >> /etc/redis/redis.conf
        echo "slaveof redis-0.redis 6379" >> /etc/redis/redis.conf
      fi
    else
      echo "sentinel found, finding master"
      MASTER="$(redis-cli -h sentinel -p 26379 -a ${REDIS_PASSWORD} sentinel get-master-addr-by-name mymaster | grep -E '(^redis-*)|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})')"
      if [ "${HOSTNAME}.redis" == "${MASTER}" ]; then
        echo "this is master, not updating config..."
      else
        echo "master found : ${MASTER}, updating redis.conf"
        echo "slaveof ${MASTER} 6379" >> /etc/redis/redis.conf
        echo "repl-ping-replica-period 3" >> /etc/redis/redis.conf
      fi
    fi
---

So the final usage pattern is: first query Sentinel for the master's or a replica's address, then write to the master and read from the replicas.
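For reference, SENTINEL get-master-addr-by-name returns the host and port as two separate lines; a client-side helper might join them like this (the reply here is canned rather than fetched from a live Sentinel):

```shell
# Canned reply of: redis-cli -p 26379 sentinel get-master-addr-by-name mymaster
reply='redis-0.redis
6379'

host=$(printf '%s\n' "$reply" | sed -n 1p)   # first line: host
port=$(printf '%s\n' "$reply" | sed -n 2p)   # second line: port
echo "master is at ${host}:${port}"          # master is at redis-0.redis:6379
```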

Overall, setting up Sentinel mode on k8s is fairly involved: it means defining a stateful service, so init containers and bash scripts are needed to control the container startup logic.

Cluster Mode

Fully automating this mode seems difficult; it requires the redis-trib tool. A Cluster can only be initialized after all nodes are up, and writing that initialization logic into an init.sh the way we did for Sentinel mode would be complicated and inefficient.
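As an aside, newer redis-cli versions bundle cluster management under redis-cli --cluster, which can stand in for redis-trib for the create step. This snippet only assembles and prints the command rather than executing it; the node addresses are the pod DNS names used below, and --cluster-replicas 1 pairs each master with one replica:

```shell
# Assemble the equivalent `redis-cli --cluster create` invocation
# (printed, not executed here)
nodes=""
for i in 0 1 2 3 4 5; do
  nodes="$nodes redis-$i.redis.default.svc.cluster.local:6379"
done
echo "redis-cli --cluster create$nodes --cluster-replicas 1"
```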

First, the config.

apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-config-map
data:
  redis.conf: |
    bind 0.0.0.0
    protected-mode no
    port 6379
    cluster-enabled yes
    cluster-config-file /data/redis.conf
    cluster-node-timeout 5000
    dir "/data"
---

Then create the Redis nodes along with a headless Service. (A headless Service gets no clusterIP; it is generally used for direct pod-to-pod communication rather than load balancing like a normal Service.)

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
spec:
  serviceName: redis
  replicas: 6
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      terminationGracePeriodSeconds: 20
      containers:
      - name: redis
        image: redis:7.0
        args:
        - "/conf/redis.conf"
        ports:
        - name: redis
          containerPort: 6379
          protocol: "TCP"
        - name: cluster
          containerPort: 16379
          protocol: "TCP"
        volumeMounts:
        - name: redis-conf
          mountPath: /conf/
        - name: redis-data
          mountPath: /data
      volumes:
      - name: redis-conf
        configMap:
          name: redis-config-map
          items:
          - key: redis.conf
            path: redis.conf
      - name: redis-data
        emptyDir: {}

---
apiVersion: v1
kind: Service
metadata:
  name: redis
spec:
  selector:
    app: redis
  clusterIP: None
  ports:
  - port: 6379
    name: redis
---

Note that ephemeral volumes are used here; in production, data must be switched to a persistent volume.

In fact, not switching to persistent volumes causes a real problem, discussed later; see the updated YAML further down.

Also note there is no node anti-affinity configured here, because with 3 masters and 3 replicas we don't have enough nodes to spread them out evenly.

A StatefulSet behind a headless Service gets DNS names of the form <pod name>.<service name>.<namespace>.svc.cluster.local. Let's try resolving them:

kubectl run --rm -i --tty busybox --image=busybox:1.28 /bin/sh
nslookup redis-0.redis

Server: 10.43.0.10
Address 1: 10.43.0.10 kube-dns.kube-system.svc.cluster.local

Name: redis-0.redis
Address 1: 10.42.1.60 redis-0.redis.default.svc.cluster.local

Use an extra container to initialize our Redis cluster.

kubectl run -it ubuntu --image=ubuntu:20.04 --restart=Never /bin/bash
kubectl exec -it ubuntu /bin/bash

cat > /etc/apt/sources.list << EOF
deb http://mirrors.aliyun.com/ubuntu/ focal main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ focal main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ focal-security main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ focal-security main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ focal-updates main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ focal-updates main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ focal-proposed main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ focal-proposed main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ focal-backports main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ focal-backports main restricted universe multiverse
EOF


apt-get update
apt-get install -y vim wget python3 python3-pip redis-tools dnsutils



pip install redis-trib

# Create a cluster containing only the master nodes
redis-trib.py create \
`dig +short redis-0.redis.default.svc.cluster.local`:6379 \
`dig +short redis-1.redis.default.svc.cluster.local`:6379 \
`dig +short redis-2.redis.default.svc.cluster.local`:6379


# Add a slave to each master
redis-trib.py replicate \
--master-addr `dig +short redis-0.redis.default.svc.cluster.local`:6379 \
--slave-addr `dig +short redis-3.redis.default.svc.cluster.local`:6379

redis-trib.py replicate \
--master-addr `dig +short redis-1.redis.default.svc.cluster.local`:6379 \
--slave-addr `dig +short redis-4.redis.default.svc.cluster.local`:6379

redis-trib.py replicate \
--master-addr `dig +short redis-2.redis.default.svc.cluster.local`:6379 \
--slave-addr `dig +short redis-5.redis.default.svc.cluster.local`:6379

At this point the cluster is initialized.

We can go into one of the Redis nodes to take a look.

kubectl exec -it redis-0 /bin/bash

root@redis-0:/data# redis-cli -c
127.0.0.1:6379> cluster nodes
c0309e16b8d0727a4ad2cbe939ec59caac46e37d 10.42.0.93:6379@16379 slave 06809780e7800808a217eafb35f8cee395f51820 0 1721305716101 2 connected
8a63ab6b6e6b1db63855afdf604646e7f0145348 10.42.1.60:6379@16379 myself,master - 0 1721305715000 1 connected 10923-16383
956ebaec5cd6ac4b0970f823808bee6c076dcbe8 10.42.2.60:6379@16379 master - 0 1721305714591 4 connected 0-5461
06809780e7800808a217eafb35f8cee395f51820 10.42.0.92:6379@16379 master - 0 1721305714592 2 connected 5462-10922
f8effa58385f8941a193dfadbf2e90d018ca1c19 10.42.1.59:6379@16379 slave 956ebaec5cd6ac4b0970f823808bee6c076dcbe8 0 1721305715094 4 connected
2937328a1fb97f0c21d203d24039e3f4f4e49da3 10.42.2.61:6379@16379 slave 8a63ab6b6e6b1db63855afdf604646e7f0145348 0 1721305716504 1 connected

127.0.0.1:6379> cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:4
cluster_my_epoch:1
cluster_stats_messages_ping_sent:555
cluster_stats_messages_pong_sent:549
cluster_stats_messages_meet_sent:2
cluster_stats_messages_sent:1106
cluster_stats_messages_ping_received:549
cluster_stats_messages_pong_received:557
cluster_stats_messages_received:1106
total_cluster_links_buffer_limit_exceeded:0

Create a NodePort Service for external access.

apiVersion: v1
kind: Service
metadata:
  name: redis-access-service
  labels:
    app: redis
spec:
  type: NodePort
  ports:
  - name: redis-port
    port: 6379
    targetPort: 6379
  selector:
    app: redis

Note that in tools like Another Redis Desktop Manager you should not tick the cluster option here, because all the nodes it discovers have internal IPs. Just connect as a normal standalone instance.

Now delete redis-0 directly and watch the redis-3 logs.

1:S 18 Jul 2024 12:38:37.534 * Marking node 8a63ab6b6e6b1db63855afdf604646e7f0145348 as failing (quorum reached).
1:S 18 Jul 2024 12:38:37.534 # Cluster state changed: fail
1:S 18 Jul 2024 12:38:37.563 # Start of election delayed for 919 milliseconds (rank #0, offset 1171).
1:S 18 Jul 2024 12:38:38.569 # Starting a failover election for epoch 5.
1:S 18 Jul 2024 12:38:38.576 # Failover election won: I'm the new master.
1:S 18 Jul 2024 12:38:38.576 # configEpoch set to 5 after successful failover
1:M 18 Jul 2024 12:38:38.576 * Discarding previously cached master state.
1:M 18 Jul 2024 12:38:38.576 # Setting secondary replication ID to 793c7bf0d23ad2480d10ebddeef4de92500b7f41, valid up to offset: 1172. New replication ID is d4170f01b37dbd48ab2898ff8cd2404a7025a2b7
1:M 18 Jul 2024 12:38:38.577 # Cluster state changed: ok

We can see redis-3 has been promoted to master; inside its console the role command confirms it.

root@redis-3:/data# redis-cli
127.0.0.1:6379> role
1) "master"
2) (integer) 1171
3) (empty array)
127.0.0.1:6379>

But now a new problem appears: after redis-0 is redeployed, it cannot rejoin the cluster. Run cluster nodes on redis-3:

root@redis-3:/data# redis-cli
127.0.0.1:6379> cluster nodes
956ebaec5cd6ac4b0970f823808bee6c076dcbe8 10.42.2.60:6379@16379 master - 0 1721306662601 4 connected 0-5461
c0309e16b8d0727a4ad2cbe939ec59caac46e37d 10.42.0.93:6379@16379 slave 06809780e7800808a217eafb35f8cee395f51820 0 1721306663005 2 connected
f8effa58385f8941a193dfadbf2e90d018ca1c19 10.42.1.59:6379@16379 slave 956ebaec5cd6ac4b0970f823808bee6c076dcbe8 0 1721306662000 4 connected
2937328a1fb97f0c21d203d24039e3f4f4e49da3 10.42.2.61:6379@16379 myself,master - 0 1721306661000 5 connected 10923-16383
8a63ab6b6e6b1db63855afdf604646e7f0145348 10.42.1.60:6379@16379 master,fail - 1721306312230 1721306309711 1 connected
06809780e7800808a217eafb35f8cee395f51820 10.42.0.92:6379@16379 master - 0 1721306662000 2 connected 5462-10922

One node shows as fail, and running the same command on redis-0 shows only itself.

Stopping other master nodes gives the same result. Is the online tutorial unreliable again? After some thought I roughly figured out why: it's the storage. In the original article, every node's /data/redis.conf was shared; the author used an NFS volume with ReadWriteMany. I used ephemeral volumes instead, so after a restart a node cannot read its previous /data/redis.conf, creates a fresh one, and therefore cannot rejoin the cluster.

So what to do? Let's create PVCs with the ceph-rbd setup from earlier. The storage doesn't actually have to be shared; a separate volume per node works fine too.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
spec:
  serviceName: redis
  replicas: 6
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      terminationGracePeriodSeconds: 20
      containers:
      - name: redis
        image: redis:7.0
        args:
        - "/conf/redis.conf"
        ports:
        - name: redis
          containerPort: 6379
          protocol: "TCP"
        - name: cluster
          containerPort: 16379
          protocol: "TCP"
        volumeMounts:
        - name: redis-conf
          mountPath: /conf/
        - name: redis-data
          mountPath: /data
      volumes:
      - name: redis-conf
        configMap:
          name: redis-config-map
          items:
          - key: redis.conf
            path: redis.conf
  volumeClaimTemplates:
  - metadata:
      name: redis-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "csi-rbd-sc"
      resources:
        requests:
          storage: 1Gi
---
apiVersion: v1
kind: Service
metadata:
  name: redis
spec:
  selector:
    app: redis
  clusterIP: None
  ports:
  - port: 6379
    name: redis
---

Then rebuild the cluster the same way as before; I won't repeat the steps.

Delete redis-0 and check whether the cluster recovers on its own.

root@redis-0:/data# redis-cli
127.0.0.1:6379> cluster nodes
4cf82b83963d3a5178d4d73db2080934e9d5f0a1 10.42.1.65:6379@16379 slave 33207c0188005e553000a456fb2d41f9d341eb05 0 1721308687541 3 connected
1c978c4a36bb5f954411b2a965e2258915b4dfdc 10.42.0.95:6379@16379 slave 31e0da6c88f0d4954123b6d7df2de37967938696 0 1721308686531 2 connected
3181437aa3fd71108a6c1616de15fa8e42d6add7 10.42.1.64:6379@16379 myself,slave cf9eb6aa26822f32decf9f3047c7cb606f677a1a 0 1721308685000 4 connected
31e0da6c88f0d4954123b6d7df2de37967938696 10.42.0.94:6379@16379 master - 0 1721308686000 2 connected 0-5461
33207c0188005e553000a456fb2d41f9d341eb05 10.42.2.66:6379@16379 master - 0 1721308686000 3 connected 10923-16383
cf9eb6aa26822f32decf9f3047c7cb606f677a1a 10.42.2.67:6379@16379 master - 0 1721308686000 4 connected 5462-10922
127.0.0.1:6379> role
1) "slave"
2) "10.42.2.67"
3) (integer) 6379
4) "connected"
5) (integer) 224
127.0.0.1:6379>

And indeed, redis-0 successfully came back as a slave.

That completes the Cluster setup.

Summary

This article took several days from start to finish. I assumed Sentinel mode would also be simple to set up, but it turned out to be rather involved, and with interviews and retrospectives in between, progress was stop-and-go. Still, I now have a working understanding of how to build highly available Redis clusters.

Author: Zer0e

Original link: https://re0.top/2024/07/16/devops5/

Published: July 16th 2024, 9:30:00 pm

Updated: July 18th 2024, 9:29:13 pm

License: This article is licensed under the Creative Commons Attribution-NonCommercial 4.0 International license.

CATALOG
  1. Preface
  2. Main Content
    1. Master-Slave Mode
    2. Sentinel Mode
    3. Cluster Mode
  3. Summary