Real-Time Data Migration Between Heterogeneous Redis Clusters

Background

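For historical reasons, our company's caching solution is Codis, and an entire large department shares a single cluster. We plan to retire Codis in favor of native Redis cluster architectures, for two main reasons. 1. Codis has not been updated or maintained for a long time: official Redis has reached 5.x.x while codis-server is still on 3.x.x, so newer Redis features are unavailable. 2. Our current setup violates the principle of spreading risk and not putting all your eggs in one basket: if that one cluster fails, every service in the department is affected. While surveying the business teams early on, we found that they use Codis not only as a cache but, in some scenarios, as storage, for example for counters. We therefore need a real-time data migration solution so that services can move from Codis to Redis without noticing.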


Choosing a Solution

Requirements

1. Migrate data from Codis to Redis Cluster
2. Migrate data from Codis to a Sentinel cluster
3. Migrate only a subset of keys
4. Show migration progress

Survey

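1. redis-migrate-tool
redis-migrate-tool is an open-source tool from Vipshop for real-time data migration between heterogeneous Redis clusters. At the time of writing it had not been updated for two years, but in my opinion it is a fairly complete tool, especially its data verification. See GitHub for a detailed feature list:
https://github.com/vipshop/redis-migrate-tool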
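2. RedisShake
RedisShake is a tool from Alibaba Cloud for real-time synchronization between heterogeneous Redis clusters, built on Wandoujia's open-source redis-port. Compared with redis-migrate-tool, its advantages in my view are that it can sync only keys with given prefixes and that it supports multiple DBs. redis-migrate-tool can only migrate the whole keyspace, and if the source is split across several DBs, everything ends up in db0 of the target, which rules it out for businesses that rely on multiple DBs. See GitHub for RedisShake's detailed features:
https://github.com/alibaba/RedisShake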

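3. redis-port
redis-port is a data migration tool that Wandoujia open-sourced to make it easy to migrate from Redis to Codis. It, too, has not been updated for a long time. See GitHub for its features and usage:
https://github.com/CodisLabs/redis-port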

Practice

Environment

Codis → Sentinel

Codis shard master | Codis password | Codis version | Sentinel address | Master address | Master password | Redis version
192.168.46.150:10379 | xxx | 3.2.4 | 192.168.9.87:6385 | 192.168.9.87:6384 | 123456 | 5.0.2
192.168.47.150:10379 | xxx | 3.2.4 | 192.168.9.88:6385 | 192.168.9.87:6384 | 123456 | 5.0.2
 | xxx | 3.2.4 | 192.168.9.89:6385 | 192.168.9.87:6384 | 123456 | 5.0.2

Codis → Redis Cluster

Codis shard master | Codis password | Codis version | Redis Cluster master node | Master password | Redis Cluster version
192.168.46.150:10379 | xxx | 3.2.4 | 192.168.9.87:6383 | 123456 | 5.0.2
192.168.47.150:10379 | xxx | 3.2.4 | 192.168.9.89:6382 | 123456 | 5.0.2
 | xxx | 3.2.4 | 192.168.9.88:6381 | 123456 | 5.0.2
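In a Redis Cluster the keyspace is divided into 16384 hash slots, each owned by one master (here 192.168.9.87:6383, 192.168.9.89:6382 and 192.168.9.88:6381), so every migrated key has to be written to the master that owns its slot. Per the Redis Cluster specification, the slot is the CRC16 (XMODEM variant) of the key modulo 16384, and if the key contains a non-empty `{...}` hash tag, only the tag is hashed. The sketch below is an illustration of that rule for checking where a key will land; the function names are hypothetical and it is not code taken from either migration tool.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM (polynomial 0x1021, initial value 0), the variant Redis Cluster uses."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: bytes) -> int:
    """Return the Redis Cluster hash slot (0..16383) for a key."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Only a non-empty tag such as {user1000} replaces the whole key for hashing.
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key) % 16384


if __name__ == "__main__":
    for k in (b"mms:counter", b"{user1000}.following", b"{user1000}.followers"):
        print(k.decode(), key_slot(k))
```

Keys that share a hash tag always map to the same slot, and therefore to the same master, which is what multi-key commands need after the switch.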

Migrating data with redis-migrate-tool

Installing the tool

Compile and install it as described in the official documentation.

Writing the configuration files

Configuration for migrating to the Sentinel cluster

vim /chj/app/redis-migrate-tool/rmt_sentinel.conf
[source]
type: single
servers:
- 192.168.46.150:10379
- 192.168.47.150:10379
redis_auth: xxx

[target]
type: single
servers:
- 192.168.9.87:6384
redis_auth: 123456

[common]
listen: 0.0.0.0:8888

Configuration for migrating to Redis Cluster

vim /chj/app/redis-migrate-tool/rmt_cluster.conf

[source]
type: single
servers:
- 192.168.46.150:10379
- 192.168.47.150:10379
redis_auth: xxx

[target]
type: redis cluster
servers:
- 192.168.9.87:6383
- 192.168.9.89:6382
- 192.168.9.88:6381
redis_auth: 123456

[common]
listen: 0.0.0.0:8889
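The two redis-migrate-tool configuration files differ only in the target section and the listen port. When several Codis clusters have to be moved, generating these files from one description keeps them consistent. The sketch below uses a hypothetical helper, `render_rmt_conf`, that writes the same `[source]`/`[target]`/`[common]` layout and keys used above; it only produces text and does not validate anything against redis-migrate-tool itself.

```python
def render_rmt_conf(source_servers, source_auth, target_type, target_servers,
                    target_auth, listen):
    """Render a redis-migrate-tool config in the layout used above (hypothetical helper)."""
    lines = ["[source]", "type: single", "servers:"]
    lines += [f"- {s}" for s in source_servers]
    lines += [f"redis_auth: {source_auth}", "", "[target]", f"type: {target_type}", "servers:"]
    lines += [f"- {s}" for s in target_servers]
    lines += [f"redis_auth: {target_auth}", "", "[common]", f"listen: {listen}", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    codis = ["192.168.46.150:10379", "192.168.47.150:10379"]
    cluster = ["192.168.9.87:6383", "192.168.9.89:6382", "192.168.9.88:6381"]
    # Same content as rmt_cluster.conf above.
    print(render_rmt_conf(codis, "xxx", "redis cluster", cluster, "123456", "0.0.0.0:8889"))
```

Passing `"single"` with `["192.168.9.87:6384"]` and `"0.0.0.0:8888"` reproduces rmt_sentinel.conf.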

Starting the sync

cd /chj/app/redis-migrate-tool
# Migrate data from Codis to the Sentinel cluster
./src/redis-migrate-tool -c rmt_sentinel.conf -o rmt.log -d 
# Migrate data from Codis to Redis Cluster
./src/redis-migrate-tool -c rmt_cluster.conf -o rmt_cluster.log -d

Data verification

cd /chj/app/redis-migrate-tool
[root@devops-template-test redis-migrate-tool]# ./src/redis-migrate-tool -c rmt_sentinel.conf -C "redis_check 60000"
Check job is running...
[2019-06-25 11:12:09.414] rmt_check.c:848 ERROR: key checked failed: check key's value error, value is inconsistent. key(len:17, type:hash): BigData-IpParse:4

Checked keys: 60000
Inconsistent value keys: 1
Inconsistent expire keys : 0
Other check error keys: 0
Checked OK keys: 59999

Check job finished, used 16.622s
Notes
1. -C "redis_check 60000" runs a data check; 60000 is the number of keys sampled (the default is 1000).
2. If the check reports inconsistent keys, inspect those keys individually.
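When the check is run repeatedly (for example before each switch-over), it helps to turn its text output into numbers. The sketch below parses the summary lines and the "key checked failed" error lines exactly as they appear in the sample output above; the line formats are taken from that one sample, so treat the regular expressions as an assumption that may need adjusting for other versions of the tool. The function names are hypothetical.

```python
import re

# Summary lines as printed in the sample output of `-C "redis_check N"`.
SUMMARY_FIELDS = {
    "checked": r"Checked keys:\s*(\d+)",
    "inconsistent_value": r"Inconsistent value keys:\s*(\d+)",
    "inconsistent_expire": r"Inconsistent expire keys\s*:\s*(\d+)",
    "other_error": r"Other check error keys:\s*(\d+)",
    "ok": r"Checked OK keys:\s*(\d+)",
}


def parse_check_summary(output: str) -> dict:
    """Extract the counters printed at the end of a redis_check run."""
    result = {}
    for name, pattern in SUMMARY_FIELDS.items():
        m = re.search(pattern, output)
        if m is None:
            raise ValueError(f"missing field in check output: {name}")
        result[name] = int(m.group(1))
    return result


def failed_keys(output: str) -> list:
    """Return the key names at the end of 'key checked failed' error lines."""
    return re.findall(r"key checked failed:.*\): (\S+)$", output, flags=re.MULTILINE)


def is_clean(summary: dict) -> bool:
    """True only if no key failed any kind of check."""
    return summary["inconsistent_value"] == summary["inconsistent_expire"] == summary["other_error"] == 0
```

Applied to the sample above, this reports 60000 checked keys, one inconsistent value, and `BigData-IpParse:4` as the key to inspect.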

Confirming sync status

total_msgs_outqueue shows whether oplog entries are still waiting in the queue to be processed. If total_msgs_outqueue > 0, keep waiting; switch over only once total_msgs_outqueue = 0.

[root@devops-template-test redis-migrate-tool]# redis-cli -h 127.0.0.1 -p 8889 info
# Server
version:0.1.0
os:Linux 3.10.0-693.5.2.el7.x86_64 x86_64
multiplexing_api:epoll
gcc_version:4.8.5
process_id:10137
tcp_port:8889
uptime_in_seconds:1201
uptime_in_days:0
config_file:/chj/app/redis-migrate-tool/rmt_cluster.conf

# Clients
connected_clients:1
max_clients_limit:100
total_connections_received:1

# Memory
mem_allocator:jemalloc-0.0.0

# Group
source_nodes_count:2
target_nodes_count:4

# Stats
all_rdb_received:1
all_rdb_parsed:1
all_aof_loaded:0
rdb_received_count:2
rdb_parsed_count:2
aof_loaded_count:0
total_msgs_recv:357666
total_msgs_sent:357666
total_net_input_bytes:78804395
total_net_output_bytes:1688068278
total_net_input_bytes_human:75.15M
total_net_output_bytes_human:1.57G
total_mbufs_inqueue:0
total_msgs_outqueue:0
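The switch-over rule above (wait until total_msgs_outqueue is 0) is easy to script. The sketch below turns the `info` text from the rmt listen port into a dict and checks it. Requiring `all_rdb_parsed:1` as well is an extra condition we assume is sensible (the full RDB phase should be finished before cutting over); it is not stated in the tool's documentation. The helper names are hypothetical, and `fetch_info` assumes `redis-cli` is on the PATH.

```python
import subprocess


def fetch_info(host: str = "127.0.0.1", port: int = 8889) -> str:
    """Run `redis-cli -h HOST -p PORT info` against the rmt listen port."""
    return subprocess.run(
        ["redis-cli", "-h", host, "-p", str(port), "info"],
        capture_output=True, text=True, check=True,
    ).stdout


def parse_info(text: str) -> dict:
    """Parse info-style 'key:value' lines into a dict, skipping section headers."""
    info = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")  # split on the first colon only
        info[key] = value
    return info


def ready_to_switch(info: dict) -> bool:
    """Nothing left in the out-queue, and (assumed extra check) all RDBs parsed."""
    return (info.get("all_rdb_parsed") == "1"
            and int(info.get("total_msgs_outqueue", "-1")) == 0)
```

A polling loop would call `ready_to_switch(parse_info(fetch_info()))` every few seconds and stop writes to Codis only once it returns True.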

Migrating data with RedisShake

Installing the tool

mkdir /chj/app/redis-shake
cd /chj/app/redis-shake
wget https://github.com/alibaba/RedisShake/releases/download/release-v1.6.9-20190624/redis-shake.tar.gz
tar -zxvf redis-shake.tar.gz

Writing the configuration file

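Start from the configuration file shipped with the release and change only the commented items below; leave everything else unchanged.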

id = redis-shake
log.file = ./redis-shake.log
log.level = info
pid_path =
system_profile = 9310
http_profile = 9320
ncpu = 0
parallel = 32
# Source type: cluster
source.type = cluster
# Addresses of the Codis shard masters
source.address = 192.168.46.150:10379;192.168.47.150:10379
# Codis password
source.password_raw = xxx
source.auth_type = auth
source.tls_enable = false
# Target type: Sentinel
target.type = sentinel
# For a Redis Cluster target, use this instead:
#target.type = cluster
# Target Sentinel cluster, as master name@Sentinel addresses
target.address = sentinel-zhj2-redis-sentinel-dev-6384@192.168.9.87:6385;192.168.9.88:6385;192.168.9.89:6385
# For a Redis Cluster target, use the cluster node addresses instead:
#target.address = 192.168.9.87:6383;192.168.9.89:6382;192.168.9.88:6381
# Password of the target Redis
target.password_raw = 123456
target.auth_type = auth
target.db = -1
target.tls_enable = false
rdb.input = local
rdb.output = local_dump
rdb.parallel = 0
rdb.special_cloud =
fake_time =
rewrite = true
# Sync only db0
filter.db = 0
# Sync only keys starting with mms or vcc
filter.key = mms;vcc
filter.slot =
filter.lua = false
big_key_threshold = 524288000
psync = false
metric = true
metric.print_log = false
heartbeat.url =
heartbeat.interval = 3
heartbeat.external = test external
heartbeat.network_interface =
sender.size = 104857600
sender.count = 5000
sender.delay_channel_size = 65535
keep_alive = 0
scan.key_number = 50
scan.special_cloud =
scan.key_file =
qps = 200000
replace_hash_tag = false
extra = false
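`filter.db` and `filter.key` decide which keys RedisShake replays: with the values above, only keys in db0 whose names start with `mms` or `vcc` are synced. The sketch below models that intended effect so you can, for example, run a key listing through it and estimate what will be copied. It is a model of the configured behavior under the prefix-match reading of `filter.key`, not RedisShake's actual filtering code, and the function names are hypothetical.

```python
def parse_prefixes(value: str) -> tuple:
    """Split a filter.key value such as 'mms;vcc' into prefixes, ignoring blanks."""
    return tuple(p.strip() for p in value.split(";") if p.strip())


def should_sync(db: int, key: str, allowed_db: int, prefixes: tuple) -> bool:
    """Model of filter.db + filter.key: same db and key starts with an allowed prefix."""
    if db != allowed_db:
        return False
    # An empty prefix list means no key filter is configured.
    return not prefixes or key.startswith(prefixes)


if __name__ == "__main__":
    prefixes = parse_prefixes("mms;vcc")
    for db, key in [(0, "mms:counter:1"), (0, "BigData-IpParse:4"), (1, "vcc:total")]:
        print(db, key, should_sync(db, key, 0, prefixes))
```

Because this is a plain prefix match, `mms` also matches a key such as `mmsx:1`; if the business keys use a separator, including it in the prefix (for example `mms:`) makes the filter stricter.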

Starting the sync

/chj/app/redis-shake/start.sh /chj/app/redis-shake/redis-shake.conf sync

Checking sync status

Sync is complete when PullCmdCountTotal - BypassCmdCountTotal == PushCmdCountTotal holds for every source.

curl http://192.168.47.253:9320/metric | python -m json.tool
[
    {
        "AvgDelay": "0.43 ms",
        "BypassCmdCount": 0,
        "BypassCmdCountTotal": 0,
        "Delay": "null ms",
        "Details": null,
        "FailCmdCount": 0,
        "FailCmdCountTotal": 0,
        "FullSyncProgress": 100,
        "NetworkFlowTotal": 42006,
        "NetworkSpeed": 0,
        "ProcessingCmdCount": 0,
        "PullCmdCount": 0,
        "PullCmdCountTotal": 897,
        "PushCmdCount": 0,
        "PushCmdCountTotal": 839,
        "SenderBufCount": 0,
        "SourceAddress": "192.168.46.150:10379",
        "SourceDBOffset": 0,
        "StartTime": "2019-06-25T17:45:23Z",
        "Status": "incr",
        "SuccessCmdCount": 0,
        "SuccessCmdCountTotal": 839,
        "TargetAddress": [
            "192.168.9.87:6384"
        ],
        "TargetDBOffset": 0
    },
    {
        "AvgDelay": "0.60 ms",
        "BypassCmdCount": 1,
        "BypassCmdCountTotal": 4067,
        "Delay": "null ms",
        "Details": null,
        "FailCmdCount": 0,
        "FailCmdCountTotal": 0,
        "FullSyncProgress": 100,
        "NetworkFlowTotal": 37629,
        "NetworkSpeed": 0,
        "ProcessingCmdCount": 0,
        "PullCmdCount": 1,
        "PullCmdCountTotal": 5106,
        "PushCmdCount": 0,
        "PushCmdCountTotal": 333,
        "SenderBufCount": 0,
        "SourceAddress": "192.168.47.150:10379",
        "SourceDBOffset": 0,
        "StartTime": "2019-06-25T17:45:23Z",
        "Status": "incr",
        "SuccessCmdCount": 0,
        "SuccessCmdCountTotal": 333,
        "TargetAddress": [
            "192.168.9.87:6384"
        ],
        "TargetDBOffset": 0
    }
]
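The completion rule (PullCmdCountTotal - BypassCmdCountTotal == PushCmdCountTotal) can be checked per source from the /metric JSON. In the sample output above it does not yet hold for either source (897 - 0 = 897 vs 839 pushed; 5106 - 4067 = 1039 vs 333 pushed), so by this criterion that snapshot was not yet complete. The sketch below computes the comparison for every entry. Also requiring `Status == "incr"` and `FullSyncProgress == 100` is our assumption (the full-sync phase should be over); the helper names are hypothetical.

```python
import json
import urllib.request


def fetch_metrics(url: str) -> list:
    """Download RedisShake's /metric JSON: a list with one entry per source."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def sync_report(metrics: list) -> list:
    """Return (source, pull - bypass, push, done) for each source entry."""
    report = []
    for m in metrics:
        expected = m["PullCmdCountTotal"] - m["BypassCmdCountTotal"]
        pushed = m["PushCmdCountTotal"]
        # Assumed extra conditions: incremental phase reached and full sync at 100%.
        done = (m["Status"] == "incr"
                and m["FullSyncProgress"] == 100
                and expected == pushed)
        report.append((m["SourceAddress"], expected, pushed, done))
    return report
```

For the setup above, `sync_report(fetch_metrics("http://192.168.47.253:9320/metric"))` yields one tuple per Codis shard; switch over only when every tuple ends in True.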
