[[email protected] ~]# ceph health detail
HEALTH_ERR 2 scrub errors; Possible data damage: 2 pgs inconsistent
OSD_SCRUB_ERRORS 2 scrub errors
PG_DAMAGED Possible data damage: 2 pgs inconsistent
pg 3.3e is active+clean+inconsistent, acting [11,17,4]
pg 3.42 is active+clean+inconsistent, acting [17,6,0]
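The PG IDs and their primary OSDs can be pulled out of this output with a small filter. The sketch below is only an illustration: the function name `list_inconsistent_pgs` is hypothetical, and it assumes the line layout shown above (`pg <id> is <state>, acting [<osds>]`), where the first OSD in the acting set is the primary.

```shell
# Hypothetical helper: reads `ceph health detail` output on stdin and prints
# "<pgid> <primary-osd>" for every PG line whose state contains "inconsistent".
# Assumes the layout "pg <id> is <state>, acting [a,b,c]".
list_inconsistent_pgs() {
    awk '$1 == "pg" && $4 ~ /inconsistent/ {
        acting = $NF                  # e.g. "[11,17,4]"
        gsub(/\[|\]/, "", acting)     # strip the surrounding brackets
        split(acting, osds, ",")      # first entry is the primary OSD
        print $2, osds[1]
    }'
}
```

Usage: `ceph health detail | list_inconsistent_pgs`. For the output above this would list 3.3e with primary 11 and 3.42 with primary 17.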
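Official troubleshooting guide:
https://ceph.com/geen-categorie/ceph-manually-repair-object/
The steps are as follows:
(1) Identify the inconsistent PGs, find the OSDs that hold them, and run the repair on the corresponding hosts.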
[[email protected] /]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 8.71826 root default
-2 3.26935 host node140
0 hdd 0.54489 osd.0 up 1.00000 1.00000
1 hdd 0.54489 osd.1 up 1.00000 1.00000
2 hdd 0.54489 osd.2 up 1.00000 1.00000
3 hdd 0.54489 osd.3 up 1.00000 1.00000
4 hdd 0.54489 osd.4 up 1.00000 1.00000
5 hdd 0.54489 osd.5 up 1.00000 1.00000
-3 3.26935 host node141
12 hdd 0.54489 osd.12 up 1.00000 1.00000
13 hdd 0.54489 osd.13 up 1.00000 1.00000
14 hdd 0.54489 osd.14 up 1.00000 1.00000
15 hdd 0.54489 osd.15 down 1.00000 1.00000
16 hdd 0.54489 osd.16 up 1.00000 1.00000
17 hdd 0.54489 osd.17 up 1.00000 1.00000
-4 2.17957 host node142
6 hdd 0.54489 osd.6 up 1.00000 1.00000
9 hdd 0.54489 osd.9 up 1.00000 1.00000
10 hdd 0.54489 osd.10 up 1.00000 1.00000
11 hdd 0.54489 osd.11 up 1.00000 1.00000
## This command also works
[[email protected] /]# ceph osd find 11
{
"osd": 11,
"addrs": {
"addrvec": [
{
"type": "v2",
"addr": "10.10.202.142:6820",
"nonce": 24423
},
{
"type": "v1",
"addr": "10.10.202.142:6821",
"nonce": 24423
}
]
},
"osd_fsid": "1e977e5f-f514-4eef-bd88-c3632d03b2c3",
"host": "node142",
"crush_location": {
"host": "node142",
"root": "default"
}
}
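The host name is the only field needed from this JSON for the next step. A minimal sketch for extracting it, assuming the pretty-printed layout shown above with one key per line; the function name `osd_host` is hypothetical, and a JSON-aware tool would be more robust than this text filter.

```shell
# Hypothetical helper: reads `ceph osd find <id>` JSON on stdin and prints the
# value of the first "host" key it encounters (the top-level one in the
# output shown above). Assumes one key per line.
osd_host() {
    sed -n 's/^[[:space:]]*"host":[[:space:]]*"\([^"]*\)".*$/\1/p' | head -n 1
}
```

Usage: `ceph osd find 11 | osd_host`.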
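(2) The affected OSDs are 11 and 17. Log in to the host that carries the OSD and stop it.
[[email protected] ~]# systemctl stop ceph-osd@11.service
(3) Flush the journal to disk (this applies to FileStore OSDs)
[[email protected] ~]# ceph-osd -i 11 --flush-journal
(4) Start the OSD
[[email protected] ~]# systemctl start ceph-osd@11.service
(5) Repair the PG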
[[email protected] ~]# ceph pg repair 3.3e
### Repeat the same steps for osd 17 (pg 3.42) ###
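Steps (2) through (5) follow the same pattern for each affected OSD and PG, so they can be written out as a dry-run plan. A minimal sketch, with its assumptions stated: the function name `print_repair_plan` is hypothetical, the unit name `ceph-osd@<id>` assumes a package-based systemd deployment, and `--flush-journal` only applies to FileStore OSDs. The function only prints commands; nothing is executed.

```shell
# Hypothetical helper: prints (does not run) the stop / flush / start / repair
# sequence for one OSD id and one PG id.
# Assumption: the OSD is managed by the systemd unit ceph-osd@<id>.
print_repair_plan() {
    osd_id=$1
    pgid=$2
    echo "systemctl stop ceph-osd@${osd_id}"
    echo "ceph-osd -i ${osd_id} --flush-journal"   # FileStore only
    echo "systemctl start ceph-osd@${osd_id}"
    echo "ceph pg repair ${pgid}"
}
```

Usage: `print_repair_plan 11 3.3e` and `print_repair_plan 17 3.42`. Review the printed commands, then run them on the host that carries each OSD.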
(6) Check the status
[[email protected] ~]# ceph health detail
HEALTH_OK
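A repair does not finish instantly, so the status check may need to be repeated before it reports HEALTH_OK. The sketch below polls a status command a limited number of times. The function name `wait_for_health_ok` and its arguments (attempt count, delay in seconds, then the command to run) are hypothetical, and it assumes the first line of the command's output starts with the health state, as in the transcripts above.

```shell
# Hypothetical helper: runs the given status command up to <tries> times,
# sleeping <delay> seconds between attempts. Returns 0 as soon as the first
# line of the output starts with HEALTH_OK, and 1 if it never does.
wait_for_health_ok() {
    tries=$1
    delay=$2
    shift 2
    while [ "$tries" -gt 0 ]; do
        status=$("$@" 2>/dev/null | head -n 1)
        case "$status" in
            HEALTH_OK*) return 0 ;;
        esac
        tries=$((tries - 1))
        if [ "$tries" -gt 0 ]; then
            sleep "$delay"
        fi
    done
    return 1
}
```

Usage: `wait_for_health_ok 30 10 ceph health detail` would check up to 30 times at 10-second intervals.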
Original article: https://blog.51cto.com/7603402/2434815