Fixing a Ceph "2 pgs inconsistent" failure

[root@node140 ~]# ceph health detail
HEALTH_ERR 2 scrub errors; Possible data damage: 2 pgs inconsistent
OSD_SCRUB_ERRORS 2 scrub errors
PG_DAMAGED Possible data damage: 2 pgs inconsistent
pg 3.3e is active+clean+inconsistent, acting [11,17,4]
pg 3.42 is active+clean+inconsistent, acting [17,6,0]
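Before issuing a repair, it is worth checking what is actually inconsistent. This is not part of the original write-up, but rados ships a standard helper for it; the PG id below is taken from the output above:

[root@node140 ~]# rados list-inconsistent-obj 3.3e --format=json-pretty

This prints the inconsistent objects and, for each replica, which checks failed (e.g. data_digest_mismatch or read_error), which helps judge whether a plain repair is safe.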

Official troubleshooting guide:
https://ceph.com/geen-categorie/ceph-manually-repair-object/

The steps are as follows:
(1) Find the abnormal PGs, identify the OSDs they map to, and perform the repair on the corresponding host
[root@node140 /]# ceph osd tree
ID CLASS WEIGHT  TYPE NAME        STATUS REWEIGHT PRI-AFF
-1       8.71826 root default
-2       3.26935     host node140
 0   hdd 0.54489         osd.0        up  1.00000 1.00000
 1   hdd 0.54489         osd.1        up  1.00000 1.00000
 2   hdd 0.54489         osd.2        up  1.00000 1.00000
 3   hdd 0.54489         osd.3        up  1.00000 1.00000
 4   hdd 0.54489         osd.4        up  1.00000 1.00000
 5   hdd 0.54489         osd.5        up  1.00000 1.00000
-3       3.26935     host node141
12   hdd 0.54489         osd.12       up  1.00000 1.00000
13   hdd 0.54489         osd.13       up  1.00000 1.00000
14   hdd 0.54489         osd.14       up  1.00000 1.00000
15   hdd 0.54489         osd.15     down  1.00000 1.00000
16   hdd 0.54489         osd.16       up  1.00000 1.00000
17   hdd 0.54489         osd.17       up  1.00000 1.00000
-4       2.17957     host node142
 6   hdd 0.54489         osd.6        up  1.00000 1.00000
 9   hdd 0.54489         osd.9        up  1.00000 1.00000
10   hdd 0.54489         osd.10       up  1.00000 1.00000
11   hdd 0.54489         osd.11       up  1.00000 1.00000

## This command also works for locating the host that owns a given OSD
[root@node140 /]# ceph osd find 11
{
    "osd": 11,
    "addrs": {
        "addrvec": [
            {
                "type": "v2",
                "addr": "10.10.202.142:6820",
                "nonce": 24423
            },
            {
                "type": "v1",
                "addr": "10.10.202.142:6821",
                "nonce": 24423
            }
        ]
    },
    "osd_fsid": "1e977e5f-f514-4eef-bd88-c3632d03b2c3",
    "host": "node142",
    "crush_location": {
        "host": "node142",
        "root": "default"
    }
}
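To confirm which OSD is the primary for an inconsistent PG (the repair is driven by the primary, the first entry in the acting set), ceph pg map can be used. The output below is illustrative, reconstructed from the acting set shown earlier:

[root@node140 /]# ceph pg map 3.3e
osdmap e254 pg 3.3e (3.3e) -> up [11,17,4] acting [11,17,4]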

(2) The problem OSDs are 11 and 17. Switch to the host that owns each of them (node142 for osd.11, node141 for osd.17) and stop the OSD

[root@node142 ~]# systemctl stop ceph-osd@11
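As an optional precaution not mentioned in the original post, the noout flag can be set before stopping the OSD so the cluster does not start rebalancing while it is down, and cleared again once the OSD is back up:

[root@node142 ~]# ceph osd set noout
## ... stop, flush, and restart the OSD ...
[root@node142 ~]# ceph osd unset noout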

(3) Flush the journal to disk (this applies to FileStore OSDs; BlueStore OSDs have no separate journal to flush)
[root@node142 ~]# ceph-osd -i 11 --flush-journal

(4) Start the OSD again
[root@node142 ~]# systemctl start ceph-osd@11

(5) Repair the PG
[root@node142 ~]# ceph pg repair 3.3e
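If the command is accepted, it replies with a line like the one below (illustrative; the OSD id is the primary of the acting set), and the repair can then be followed with ceph -w:

instructing pg 3.3e on osd.11 to repair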

### Repair osd.17 (pg 3.42) with the same procedure ###
(6) Check the status
[root@node140 ~]# ceph health detail
HEALTH_OK
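To re-verify a repaired PG without waiting for the next scheduled scrub, a deep scrub can be triggered manually (a standard command, not part of the original steps):

[root@node140 ~]# ceph pg deep-scrub 3.3e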

Original article: https://blog.51cto.com/7603402/2434815
