DRBD Overview
Distributed Replicated Block Device (DRBD) is a software-based, shared-nothing, replicated storage solution that mirrors block devices (hard disks, partitions, logical volumes, and so on) between servers.
DRBD runs inside the kernel, much like a driver module. It sits between the file system's buffer cache and the disk scheduler: data written locally is sent over TCP/IP to the other host, received by the peer's TCP/IP stack, handed to the peer's DRBD, and finally written to the peer's corresponding local disk. The effect is similar to a network RAID-1.
Used in a high-availability (HA) setup, DRBD can take the place of a shared disk array. Data on the local host (primary node) and the remote host (secondary node) is kept synchronized in real time, so when the local system fails, an identical copy of the data is still held on the remote host and service can continue from there.
The DRBD architecture is shown in the figure below:
Installing DRBD
Prerequisites:
1) This configuration uses two test nodes, node1.samlee.com and node2.samlee.com, with IP addresses 172.16.100.6 and 172.16.100.7 respectively.
2) node1 and node2 each provide a partition of identical size to be used as the DRBD device; here it is /dev/sda3 on both nodes, 5G in size.
3) The operating system is CentOS 6.5 on the x86_64 platform.
1. Preparation
1) Hostname resolution must work for every node, and each node's hostname must match the output of the "uname -n" command. The /etc/hosts file on both nodes therefore needs to contain the following:
# vim /etc/hosts
172.16.100.6 node1.samlee.com node1
172.16.100.7 node2.samlee.com node2
To keep these hostnames across reboots, also run commands similar to the following on each node:
Node1 configuration:
# sed -i 's@\(HOSTNAME=\).*@\1node1.samlee.com@g' /etc/sysconfig/network
# hostname node1.samlee.com
Node2 configuration:
# sed -i 's@\(HOSTNAME=\).*@\1node2.samlee.com@g' /etc/sysconfig/network
# hostname node2.samlee.com
2) Set up key-based SSH communication between the two nodes, which can be done with the following commands:
Node1 configuration:
# ssh-keygen -t rsa -P ''
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node2.samlee.com
# ssh node2.samlee.com 'date'; date
Node2 configuration:
# ssh-keygen -t rsa -P ''
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node1.samlee.com
# ssh node1.samlee.com 'date'; date
3) Set up a cron job to synchronize the time every 5 minutes (configure on both node1 and node2):
# crontab -e
*/5 * * * * /sbin/ntpdate 172.16.100.10 &> /dev/null
4) Disable SELinux (on both node1 and node2):
# setenforce 0
# vim /etc/selinux/config
SELINUX=disabled
2. Package overview
DRBD consists of two parts: a kernel module and user-space management tools. The DRBD kernel module has been merged into the mainline Linux kernel since version 2.6.33, so if your kernel is newer than that you only need to install the management tools; otherwise you must install both the kernel module package and the management tools, and the versions of the two must correspond.
The DRBD versions available for CentOS 5 are 8.0, 8.2 and 8.3; the corresponding RPM packages are named drbd, drbd82 and drbd83, and the matching kernel-module packages are kmod-drbd, kmod-drbd82 and kmod-drbd83. The version for CentOS 6 is 8.4, packaged as drbd and drbd-kmdl. When selecting packages, keep two things in mind: the drbd and drbd-kmdl versions must match each other, and the drbd-kmdl version must match the kernel version of the running system. Features and configuration differ slightly between versions. Our test platform is x86_64 running CentOS 6.5, so both the kernel module and the management tools need to be installed. We use the latest 8.4 release here (drbd-8.4.3-33.el6.x86_64.rpm and drbd-kmdl-2.6.32-431.el6-8.4.3-33.el6.x86_64.rpm), which can be downloaded from ftp://rpmfind.net/linux/atrpms/ as needed.
In practice you should download the package versions that match your own platform; download links for every version are not listed here.
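Before downloading, it can help to confirm which kernel is running and whether a drbd module is already available, so that the drbd-kmdl package you pick matches it. The following is only a small illustrative check (the sample output assumes the CentOS 6.5 kernel that matches the package above; yours may differ):
# uname -r
2.6.32-431.el6.x86_64
# modinfo drbd &> /dev/null && echo "drbd module already available" || echo "drbd module not installed"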
3. Package installation
Once the download is complete, the packages can be installed directly (on both node1 and node2):
# rpm -ivh drbd-8.4.3-33.el6.x86_64.rpm drbd-kmdl-2.6.32-431.el6-8.4.3-33.el6.x86_64.rpm
4. Configuring DRBD
The main DRBD configuration file is /etc/drbd.conf. For ease of management this file is usually split into several parts, all stored under the /etc/drbd.d/ directory, and the main file simply pulls these fragments in with "include" directives. The /etc/drbd.d/ directory normally contains global_common.conf plus any number of files ending in .res. global_common.conf defines the global and common sections, and each .res file defines one resource.
The global section may appear only once, and if the whole configuration is kept in a single file rather than split into several, the global section must come at the very beginning of that file. At present only the parameters minor-count, dialog-refresh, disable-ip-verification and usage-count can be set in the global section.
The common section defines parameters that are inherited by every resource by default; any parameter that can be used in a resource definition can also be set in the common section. The common section is not mandatory, but it is recommended to place parameters shared by multiple resources there to reduce the complexity of the configuration.
A resource section defines a DRBD resource; each resource is usually defined in its own .res file under /etc/drbd.d/. A resource must be given a name, which may consist of any non-whitespace ASCII characters. Each resource definition must contain at least two host (on) sub-sections specifying the nodes the resource is associated with; all other parameters can be inherited from the common section or from DRBD's defaults and do not have to be specified explicitly.
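Taken together, the main configuration file is little more than a pair of include directives; the sketch below shows what /etc/drbd.conf typically looks like after the package is installed (shown for orientation only):
# cat /etc/drbd.conf
include "drbd.d/global_common.conf";
include "drbd.d/*.res";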
The following operations are performed on node1.samlee.com.
(1) Configure /etc/drbd.d/global_common.conf
# vim /etc/drbd.d/global_common.conf
global {
        usage-count no;
        # minor-count dialog-refresh disable-ip-verification
}
common {
        protocol C;
        handlers {
                pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
                # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
                # split-brain "/usr/lib/drbd/notify-split-brain.sh root";
                # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
                # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
                # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
        }
        startup {
                #wfc-timeout 120;
                #degr-wfc-timeout 120;
        }
        disk {
                on-io-error detach;
                #fencing resource-only;
        }
        net {
                cram-hmac-alg "sha1";
                shared-secret "mydrbdlab";
        }
        syncer {
                rate 1000M;
        }
}
(2) Create the 5G partition that will serve as the replicated storage (the partition must be created on both node1 and node2).
# fdisk /dev/sda
WARNING: DOS-compatible mode is deprecated. It's strongly recommended to
         switch off the mode (command 'c') and change display units to
         sectors (command 'u').

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 3
First cylinder (7675-15665, default 7675):
Using default value 7675
Last cylinder, +cylinders or +size{K,M,G} (7675-15665, default 15665): +5G
Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.

WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table. The new table will be used at
the next reboot or after you run partprobe(8) or kpartx(8)
Syncing disks.

# Make the kernel re-read the partition table
# partx -a /dev/sda
# cat /proc/partitions
major minor  #blocks  name
   8        0  125829120 sda
   8        1     204800 sda1
   8        2   61440000 sda2
   8        3    5248836 sda3
(3) Define the resource file /etc/drbd.d/mystore.res with the following content:
# vim /etc/drbd.d/mystore.res
resource mystore {
        on node1.samlee.com {
                device    /dev/drbd0;
                disk      /dev/sda3;
                address   172.16.100.6:7789;
                meta-disk internal;
        }
        on node2.samlee.com {
                device    /dev/drbd0;
                disk      /dev/sda3;
                address   172.16.100.7:7789;
                meta-disk internal;
        }
}
The files above must be identical on both nodes, so the configuration just created can be synchronized to the other node over SSH:
# scp /etc/drbd.d/* node2:/etc/drbd.d/
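As an optional sanity check, drbdadm can parse and print the resource configuration on each node before anything is initialized; this is only a suggested verification step and does not touch the devices:
# drbdadm dump mystore
# ssh node2 'drbdadm dump mystore'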
(4) Initialize the defined resource on both nodes and start the service:
1) Initialize the resource; run on both Node1 and Node2:
# drbdadm create-md mystore
2) Start the service; run on both Node1 and Node2:
# service drbd start
3) Check the startup status:
# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by [email protected], 2013-11-29 12:28:00
 0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:5248636
The drbd-overview command can also be used:
# drbd-overview
  0:mystore/0  Connected Secondary/Secondary Inconsistent/Inconsistent C r-----
The output above shows that both nodes are currently in the Secondary state, so the next step is to promote one of them to Primary. On the node that is to become Primary, run:
# drbdadm primary --force mystore
Note: the following command can also be used on the node that is to become Primary to set it as the Primary node:
# drbdadm -- --overwrite-data-of-peer primary mystore
Checking the status again shows that data synchronization has started:
# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by [email protected], 2013-11-29 12:28:00
 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r---n-
    ns:3689532 nr:0 dw:0 dr:3694240 al:0 bm:225 lo:3 pe:3 ua:7 ap:0 ep:1 wo:f oos:1561212
        [=============>......] sync'ed: 70.3% (1524/5124)M
        finish: 0:00:42 speed: 36,400 (40,080) K/sec
Once synchronization has finished, check the status again: both data sets are now up to date and the nodes have taken on primary and secondary roles:
# drbd-overview
  0:mystore/0  Connected Primary/Secondary UpToDate/UpToDate C r-----
### Primary/Secondary -- the left side shows the local node's role, the right side shows the peer node's role
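For reference, the same state information can also be queried per resource with drbdadm; these are standard drbd commands, shown here purely as an alternative to drbd-overview:
# drbdadm role mystore       # local/peer role, e.g. Primary/Secondary
# drbdadm cstate mystore     # connection state, e.g. Connected
# drbdadm dstate mystore     # disk state, e.g. UpToDate/UpToDate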
(5) Create the file system
The file system can only be mounted on the Primary node, so the DRBD device can only be formatted after a Primary node has been set. The following operations are performed on node1.samlee.com:
# mke2fs -t ext4 /dev/drbd0
# mkdir /mydata
# mount /dev/drbd0 /mydata/
# cd /mydata/
# vim node1.conf
Welcome to node1.....
(6) Switch the Primary and Secondary nodes
In a Primary/Secondary DRBD configuration only one node can be Primary at any given time. To swap the roles of the two nodes, the current Primary must first be demoted to Secondary before the original Secondary can be promoted to Primary:
On Node1 -- demote to Secondary:
# umount /mydata/
# drbdadm secondary mystore
Check the node status:
# drbd-overview
  0:mystore/0  Connected Secondary/Secondary UpToDate/UpToDate C r-----
On Node2 -- promote to Primary:
# drbdadm primary mystore
# mkdir /mydata
# mount /dev/drbd0 /mydata/
Use the commands below to verify that the file created earlier on the old Primary node is present, then create a new file node2.conf:
# ls /mydata/
lost+found  node1.conf
# cat /mydata/node1.conf
Welcome to node1.....
# vim /mydata/node2.conf
Welcome to Node2....
Test the switch in the opposite direction as follows:
On Node2 -- demote to Secondary:
# umount /mydata/
# drbdadm secondary mystore
Check the node status:
# drbd-overview
  0:mystore/0  Connected Secondary/Secondary UpToDate/UpToDate C r-----
On Node1 -- promote to Primary:
# drbdadm primary mystore
# mount /dev/drbd0 /mydata/
Use the command below to verify that the files created earlier on the Primary node are still present:
# ls /mydata/
lost+found  node1.conf  node2.conf
Highly available shared storage with Pacemaker + DRBD
Prerequisites:
1) This configuration uses two test nodes, node1.samlee.com and node2.samlee.com, with IP addresses 172.16.100.6 and 172.16.100.7 respectively.
2) A corosync-based cluster has already been configured on node1 and node2, and the Primary/Secondary DRBD device /dev/drbd0 with the resource name mystore has already been set up as described above. If your configuration differs, make sure to adjust the corresponding values in the commands below to match your setup.
3) Stop the drbd service and demote all nodes to Secondary:
# drbdadm secondary mystore
# drbd-overview
  0:mystore/0  Connected Secondary/Secondary UpToDate/UpToDate C r-----
# chkconfig drbd off
# service drbd stop
4) The operating system is CentOS 6.5 on the x86_64 platform.
The procedure is as follows:
1. Check the current cluster configuration and make sure the global properties are suitable for a two-node cluster:
# crm configure show
node node1.samlee.com
node node2.samlee.com
property $id="cib-bootstrap-options" \
        dc-version="1.1.10-14.el6-368c726" \
        cluster-infrastructure="classic openais (with plugin)" \
        expected-quorum-votes="2"
# crm configure property stonith-enabled=false
# crm configure property no-quorum-policy=ignore
# crm configure rsc_defaults resource-stickiness=100
# crm configure show
node node1.samlee.com
node node2.samlee.com
property $id="cib-bootstrap-options" \
        dc-version="1.1.10-14.el6-368c726" \
        cluster-infrastructure="classic openais (with plugin)" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"
2. Define the already configured DRBD device /dev/drbd0 as a cluster service.
1) As required for a cluster-managed service, first make sure the drbd service is stopped on both nodes and will not start automatically at boot:
# drbd-overview
  0:mystore/0  Unconfigured . . . .
# chkconfig drbd off
2) Configure drbd as a cluster resource:
The resource agent (RA) for drbd is provided by OCF under the linbit provider; its path is /usr/lib/ocf/resource.d/linbit/drbd. The following commands can be used to inspect this RA and its metadata:
Query the available resource agent classes:
# crm ra classes
lsb
ocf / heartbeat linbit pacemaker
service
stonith
List the resource agents available under a class and provider:
# crm ra list ocf linbit
drbd
Show the help and metadata for the drbd resource agent:
# crm ra info ocf:linbit:drbd
Manages a DRBD device as a Master/Slave resource (ocf:linbit:drbd)

This resource agent manages a DRBD resource as a master/slave resource.
DRBD is a shared-nothing replicated storage device.
Note that you should configure resource level fencing in DRBD,
this cannot be done from this resource agent.
See the DRBD User's Guide for more information.
http://www.drbd.org/docs/applications/

Parameters (* denotes required, [] the default):

drbd_resource* (string): drbd resource name
    The name of the drbd resource from the drbd.conf file.

drbdconf (string, [/etc/drbd.conf]): Path to drbd.conf
    Full path to the drbd.conf file.

stop_outdates_secondary (boolean, [false]): outdate a secondary on stop
    Recommended setting: until pacemaker is fixed, leave at default (disabled).
    Note that this feature depends on the passed in information in
    OCF_RESKEY_CRM_meta_notify_master_uname to be correct, which unfortunately
    is not reliable for pacemaker versions up to at least 1.0.10 / 1.1.4.
    If a Secondary is stopped (unconfigured), it may be marked as outdated in
    the drbd meta data, if we know there is still a Primary running in the
    cluster. Note that this does not affect fencing policies set in drbd
    config, but is an additional safety feature of this resource agent only.
    You can enable this behaviour by setting the parameter to true.
    If this feature seems to not do what you expect, make sure you have
    defined fencing policies in the drbd configuration as well.

Operations' defaults (advisory minimum):

    start          timeout=240
    promote        timeout=90
    demote         timeout=90
    notify         timeout=90
    stop           timeout=100
    monitor_Slave  timeout=20 interval=20
    monitor_Master timeout=20 interval=10
drbd must run on both nodes at the same time, but only one node can be Master (in the primary/secondary model) while the other is Slave. It is therefore a somewhat special cluster resource: a multi-state clone resource, meaning the member nodes are differentiated into Master and Slave, and both nodes are required to start out in the Slave state when the service is first started.
# crm configure
crm(live)configure# primitive mysqlstore ocf:linbit:drbd params drbd_resource=mystore op monitor role=Master interval=30s timeout=30s op monitor role=Slave interval=60s timeout=20s op start timeout=240s op stop timeout=100s
crm(live)configure# verify
crm(live)configure# master ms_mysqlstore mysqlstore meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify="true"
crm(live)configure# verify
crm(live)configure# commit
-- Review the configuration
# crm configure show
node node1.samlee.com
node node2.samlee.com
primitive mysqlstore ocf:linbit:drbd \
        params drbd_resource="mystore" \
        op monitor role="Master" interval="30s" timeout="30s" \
        op monitor role="Slave" interval="60s" timeout="20s" \
        op start timeout="240s" interval="0" \
        op stop timeout="100s" interval="0"
ms ms_mysqlstore mysqlstore \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
property $id="cib-bootstrap-options" \
        dc-version="1.1.10-14.el6-368c726" \
        cluster-infrastructure="classic openais (with plugin)" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"
Check the current cluster status:
crm(live)# status
Last updated: Thu Aug 18 12:26:47 2016
Last change: Thu Aug 18 12:24:02 2016 via cibadmin on node1.samlee.com
Stack: classic openais (with plugin)
Current DC: node1.samlee.com - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
2 Resources configured

Online: [ node1.samlee.com node2.samlee.com ]

 Master/Slave Set: ms_mysqlstore [mysqlstore]
     Masters: [ node1.samlee.com ]
     Slaves: [ node2.samlee.com ]
The information above shows that the drbd Primary node is currently node1.samlee.com and the Secondary node is node2.samlee.com. You can also run the following command on node1 to verify that it has become the Primary node for the mystore resource:
# drbdadm role mystore
Primary/Secondary
The Master node of ms_mysqlstore is the Primary node of the drbd mystore resource. The device /dev/drbd0 can be mounted on that node, and when the device is used by a cluster service the mount also needs to happen automatically. Suppose the mystore resource here provides a shared file system holding web pages for a web server cluster, and that it needs to be mounted at /mydata (this directory must already exist on both nodes).
In addition, this automatically mounted cluster resource must run on the node that is drbd's Master, and it may only start after drbd has promoted that node to Primary. Therefore a colocation constraint and an order constraint must also be defined for these two resources.
crm(live)configure# primitive mysqlfs ocf:heartbeat:Filesystem params device=/dev/drbd0 directory=/mydata fstype=ext4 op monitor interval=30s timeout=40s op start timeout=60s op stop timeout=60s on-fail=restart
crm(live)configure# verify
crm(live)configure# colocation mysqlfs_with_ms_mysqlstore_master inf: mysqlfs ms_mysqlstore:Master
crm(live)configure# verify
crm(live)configure# order mysqlfs_after_ms_mysqlstore_master mandatory: ms_mysqlstore:promote mysqlfs:start
crm(live)configure# verify
crm(live)configure# show
node node1.samlee.com
node node2.samlee.com
primitive mysqlfs ocf:heartbeat:Filesystem \
        params device="/dev/drbd0" directory="/mydata" fstype="ext4" \
        op monitor interval="30s" timeout="40s" \
        op start timeout="60s" interval="0" \
        op stop timeout="60s" on-fail="restart" interval="0"
primitive mysqlstore ocf:linbit:drbd \
        params drbd_resource="mystore" \
        op monitor role="Master" interval="30s" timeout="30s" \
        op monitor role="Slave" interval="60s" timeout="20s" \
        op start timeout="240s" interval="0" \
        op stop timeout="100s" interval="0"
ms ms_mysqlstore mysqlstore \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
colocation mysqlfs_with_ms_mysqlstore_master inf: mysqlfs ms_mysqlstore:Master
order mysqlfs_after_ms_mysqlstore_master inf: ms_mysqlstore:promote mysqlfs:start
property $id="cib-bootstrap-options" \
        dc-version="1.1.10-14.el6-368c726" \
        cluster-infrastructure="classic openais (with plugin)" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"
-- Commit the configuration
crm(live)configure# commit
The cluster status now looks like this:
crm(live)# status
Last updated: Thu Aug 18 13:27:28 2016
Last change: Thu Aug 18 13:25:31 2016 via cibadmin on node1.samlee.com
Stack: classic openais (with plugin)
Current DC: node1.samlee.com - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
3 Resources configured

Online: [ node1.samlee.com node2.samlee.com ]

 Master/Slave Set: ms_mysqlstore [mysqlstore]
     Masters: [ node1.samlee.com ]
     Slaves: [ node2.samlee.com ]
 mysqlfs        (ocf::heartbeat:Filesystem):    Started node1.samlee.com
Check the drbd status:
# drbd-overview
  0:mystore/0  Connected Primary/Secondary UpToDate/UpToDate C r----- /mydata ext4 5.0G 139M 4.6G 3%
# ls /mydata/
lost+found  node1.conf  node2.conf
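To confirm that Pacemaker actually moves the storage, a simple failover test is to put the current Master node into standby and watch the resources migrate to the other node. This is a suggested extra check rather than part of the configuration above; crm node standby/online are standard crmsh commands, and the resource names are the ones defined earlier:
# crm node standby node1.samlee.com
# crm status        # the Master of ms_mysqlstore and mysqlfs should now run on node2.samlee.com
# crm node online node1.samlee.com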