System environment
OS: CentOS 6.6
Architecture: x86_64
Software environment
heartbeat-3.0.4-2
drbd-8.4.3
nfs-utils-1.2.3-26
Deployment layout
Role / IP
VIP 192.168.1.13 (service address on the internal network)
data-09.com br0: 192.168.1.9
data-11.com br0: 192.168.1.11
1. DRBD
Note: DRBD can use a whole disk, a partition, or a logical volume as its backing device, but no filesystem may be created on it (the filesystem goes on /dev/drbd0 instead).
1) Install dependency packages
# yum install gcc gcc-c++ make glibc flex kernel-devel kernel-headers
2) Install DRBD
# wget http://oss.linbit.com/drbd/8.4/drbd-8.4.3.tar.gz
# tar zxvf drbd-8.4.3.tar.gz
# cd drbd-8.4.3
# ./configure --prefix=/usr/local/tdoa/drbd --with-km
# make KDIR=/usr/src/kernels/2.6.32-279.el6.x86_64/
# make install
# mkdir -p /usr/local/tdoa/drbd/var/run/drbd
# cp /usr/local/tdoa/drbd/etc/rc.d/init.d/drbd /etc/rc.d/init.d
Load the DRBD kernel module:
# modprobe drbd
3) Configure DRBD
The configuration file must be identical on the primary and the secondary node.
# cat /usr/local/tdoa/drbd/etc/drbd.conf
resource r0 {
    protocol C;
    startup { wfc-timeout 0; degr-wfc-timeout 120; }
    disk { on-io-error detach; }
    net {
        timeout 60;
        connect-int 10;
        ping-int 10;
        max-buffers 2048;
        max-epoch-size 2048;
    }
    syncer { rate 100M; }
    on data-09.com {
        device /dev/drbd0;
        disk /dev/data/data_lv;
        address 192.168.1.9:7788;
        meta-disk internal;
    }
    on data-11.com {
        device /dev/drbd0;
        disk /dev/data/data_lv;
        address 192.168.1.11:7788;
        meta-disk internal;
    }
}
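Since the file must be byte-identical on both nodes, comparing checksums after copying it over is a cheap sanity check. A minimal sketch, assuming the `same_config` helper name and the paths (they are illustrative, not part of DRBD):

```shell
# Compare checksums of two copies of drbd.conf; prints MATCH or DIFFER.
# same_config is a hypothetical helper.
same_config() {
    a=$(md5sum "$1" | awk '{print $1}')
    b=$(md5sum "$2" | awk '{print $1}')
    if [ "$a" = "$b" ]; then echo MATCH; else echo DIFFER; fi
}
```

For example, after scp-ing the peer's copy to /tmp, run `same_config /usr/local/tdoa/drbd/etc/drbd.conf /tmp/drbd.conf.peer`.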
4) Initialize the r0 resource and bring it up
Run the following on both nodes:
# drbdadm create-md r0
# drbdadm up r0
The status on data-09.com and data-11.com should now look similar to this:
# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by [email protected], 2014-02-26 07:26:07
0: cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate C r-----
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
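The `cs:` (connection state), `ro:` (roles) and `ds:` (disk states) fields are the ones to watch. A small sketch that pulls them out of `/proc/drbd`-style output; the `drbd_state` helper is invented here for illustration:

```shell
# Extract connection state, roles and disk states for device 0
# from /proc/drbd text read on stdin. drbd_state is a hypothetical helper.
drbd_state() {
    awk '/^ *0:/ {
        s = ""
        for (i = 1; i <= NF; i++)
            if ($i ~ /^(cs|ro|ds):/) s = s (s == "" ? "" : " ") $i
        print s
    }'
}
```

Usage: `drbd_state < /proc/drbd`.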
5) Promote data-09.com to primary
# drbdadm primary --force r0
The status on data-09.com should now look similar to this:
# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by [email protected], 2014-02-26 07:28:26
0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
ns:4 nr:0 dw:4 dr:681 al:1 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
Note: the DRBD service must be set to start automatically at boot (e.g. chkconfig drbd on).
2. NFS
Install the NFS services:
yum install nfs-utils portmap -y
vim /etc/exports
/usr/local/tdoa/data/attach 192.168.100.0/24(rw,no_root_squash)
/usr/local/tdoa/data/attachment 192.168.100.0/24(rw,no_root_squash)
Note: there must be no space between the client address and the option list; "host (options)" applies the options to the world, not to host.
service rpcbind restart
service nfs restart
chkconfig rpcbind on
chkconfig nfs off
service nfs stop
Verify that the front-end web servers can mount the export read-write, then stop the NFS service (heartbeat will start it on the active node).
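The /etc/exports spacing pitfall above is easy to miss by eye, so it is worth checking for mechanically. A sketch that flags any line with whitespace before the option list; the `bad_exports` helper is hypothetical:

```shell
# Print (with line numbers) any /etc/exports line that has a space
# before "(", e.g. "host (rw)". bad_exports is a hypothetical helper.
bad_exports() {
    grep -nE '[[:space:]]\(' "$1" || true
}
```

A clean file produces no output; any output is a line to fix.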
3. MySQL
1. Create the high-availability directory /usr/local/data
   The data5 subdirectory is used for the database files.
2. On the heartbeat master, move the MySQL data directory to /usr/local/data/data5.
3. Once MySQL is installed on both heartbeat nodes, switch the DRBD partition over to the standby and check that MySQL works there.
Demote the master to secondary:
# drbdadm secondary r0
# cat /proc/drbd
On the standby data-11.com, promote it to primary:
# drbdadm primary r0
4. Heartbeat
(1.1) Installing heartbeat via YUM
# wget http://mirrors.sohu.com/fedora-epel/6Server/x86_64/epel-release-6-8.noarch.rpm
# rpm -ivh epel-release-6-8.noarch.rpm
# yum install heartbeat -y
(1.2) Installing heartbeat via RPM
1. yum install "liblrm.so.2()(64bit)"
2. rpm -ivh PyXML-0.8.4-19.el6.x86_64.rpm
3. rpm -ivh perl-TimeDate-1.16-13.el6.noarch.rpm
4. rpm -ivh resource-agents-3.9.5-12.el6_6.1.x86_64.rpm
5. rpm -ivh cluster-glue-1.0.5-6.el6.x86_64.rpm
6. rpm -ivh cluster-glue-libs-1.0.5-6.el6.x86_64.rpm
7. rpm -ivh heartbeat-libs-3.0.4-2.el6.x86_64.rpm heartbeat-3.0.4-2.el6.x86_64.rpm
Note: heartbeat-libs and heartbeat must be installed together.
(2) Configure heartbeat
The three configuration files (ha.cf, authkeys, haresources) must be identical on both nodes.
cp /usr/share/doc/heartbeat-3.0.4/ha.cf /etc/ha.d/
cp /usr/share/doc/heartbeat-3.0.4/haresources /etc/ha.d/
cp /usr/share/doc/heartbeat-3.0.4/authkeys /etc/ha.d/
vim /etc/ha.d/ha.cf
#############################################
logfile /var/log/ha-log #log file
logfacility local0 #syslog facility
keepalive 2 #heartbeat interval in seconds
deadtime 5 #seconds before the peer is declared dead
ucast br0 192.168.1.11 #heartbeat NIC and the peer's IP (the only line that differs on the standby)
auto_failback off #do not move resources back automatically when the master recovers
node data-09.com data-11.com #the two node hostnames
###############################################################################
vim /etc/ha.d/authkeys #the heartbeat auth file must have mode 600
######################
auth 3 #use method 3, the MD5 algorithm
#1 crc
#2 sha1 HI!
3 md5 heartbeat
######################
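Heartbeat refuses to start when authkeys is readable by anyone else, so it is worth checking the mode explicitly. A sketch, with the `check_authkeys` helper name invented here:

```shell
# Verify that a file has mode 600; prints OK or the offending mode.
# check_authkeys is a hypothetical helper (uses GNU stat).
check_authkeys() {
    mode=$(stat -c %a "$1")
    if [ "$mode" = "600" ]; then echo OK; else echo "bad mode: $mode"; fi
}
```

Usage: `chmod 600 /etc/ha.d/authkeys && check_authkeys /etc/ha.d/authkeys`.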
vim /etc/ha.d/haresources
#########################################################################
data-09.com IPaddr::192.168.1.13/24/br0 drbddisk::r0 Filesystem::/dev/drbd0::/usr/local/data::ext4 mysql nfs
Explanation: master node hostname; VIP/prefix/NIC to bind; DRBD resource; DRBD device::mount point::filesystem; then the mysql and nfs resource scripts.
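A haresources line is positional: the preferred node comes first, then the resources, which heartbeat starts left to right and stops in reverse order. A sketch that splits such a line (the `parse_haresources` helper is invented for illustration):

```shell
# Split a haresources line into the node name and its resource list.
# parse_haresources is a hypothetical helper.
parse_haresources() {
    line="$1"
    node=${line%% *}        # first whitespace-separated token
    resources=${line#* }    # everything after it
    echo "node: $node"
    for r in $resources; do
        echo "resource: $r"
    done
}
```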
(5) Create the drbddisk, nfs and mysql scripts and make them executable (all three resource scripts must live in /etc/ha.d/resource.d)
# cat /etc/ha.d/resource.d/drbddisk
##################################################################
#!/bin/bash
#
# This script is intended to be used as resource script by heartbeat
#
# Copyright 2003-2008 LINBIT Information Technologies
# Philipp Reisner, Lars Ellenberg
#
###
DEFAULTFILE="/etc/default/drbd"
DRBDADM="/sbin/drbdadm"
if [ -f $DEFAULTFILE ]; then
. $DEFAULTFILE
fi
if [ "$#" -eq 2 ]; then
RES="$1"
CMD="$2"
else
RES="all"
CMD="$1"
fi
## EXIT CODES
# since this is a "legacy heartbeat R1 resource agent" script,
# exit codes actually do not matter that much as long as we conform to
# http://wiki.linux-ha.org/HeartbeatResourceAgent
# but it does not hurt to conform to lsb init-script exit codes,
# where we can.
# http://refspecs.linux-foundation.org/LSB_3.1.0/
#LSB-Core-generic/LSB-Core-generic/iniscrptact.html
####
drbd_set_role_from_proc_drbd()
{
local out
if ! test -e /proc/drbd; then
ROLE="Unconfigured"
return
fi
dev=$( $DRBDADM sh-dev $RES )
minor=${dev#/dev/drbd}
if [[ $minor = *[!0-9]* ]] ; then
# sh-minor is only supported since drbd 8.3.1
minor=$( $DRBDADM sh-minor $RES )
fi
if [[ -z $minor ]] || [[ $minor = *[!0-9]* ]] ; then
ROLE=Unknown
return
fi
if out=$(sed -ne "/^ *$minor: cs:/ { s/:/ /g; p; q; }" /proc/drbd); then
set -- $out
ROLE=${5%/**}
: ${ROLE:=Unconfigured} # if it does not show up
else
ROLE=Unknown
fi
}
case "$CMD" in
start)
# try several times, in case heartbeat deadtime
# was smaller than drbd ping time
try=6
while true; do
$DRBDADM primary $RES && break
let "--try" || exit 1 # LSB generic error
sleep 1
done
;;
stop)
# heartbeat (haresources mode) will retry failed stop
# for a number of times in addition to this internal retry.
try=3
while true; do
$DRBDADM secondary $RES && break
# We used to lie here, and pretend success for anything != 11,
# to avoid the reboot on failed stop recovery for "simple
# config errors" and such. But that is incorrect.
# Don't lie to your cluster manager.
# And don't do config errors...
let --try || exit 1 # LSB generic error
sleep 1
done
;;
status)
if [ "$RES" = "all" ]; then
echo "A resource name is required for status inquiries."
exit 10
fi
ST=$( $DRBDADM role $RES )
ROLE=${ST%/**}
case $ROLE in
Primary|Secondary|Unconfigured)
# expected
;;
*)
# unexpected. whatever...
# If we are unsure about the state of a resource, we need to
# report it as possibly running, so heartbeat can, after failed
# stop, do a recovery by reboot.
# drbdsetup may fail for obscure reasons, e.g. if /var/lock/ is
# suddenly readonly. So we retry by parsing /proc/drbd.
drbd_set_role_from_proc_drbd
esac
case $ROLE in
Primary)
echo "running (Primary)"
exit 0 # LSB status "service is OK"
;;
Secondary|Unconfigured)
echo "stopped ($ROLE)"
exit 3 # LSB status "service is not running"
;;
*)
# NOTE the "running" in below message.
# this is a "heartbeat" resource script,
# the exit code is _ignored_.
echo "cannot determine status, may be running ($ROLE)"
exit 4 # LSB status "service status is unknown"
;;
esac
;;
*)
echo "Usage: drbddisk [resource] {start|stop|status}"
exit 1
;;
esac
exit 0
##############################################################
# cat /etc/ha.d/resource.d/nfs
killall -9 nfsd; /etc/init.d/nfs restart; exit 0
For mysql, the startup script shipped with MySQL is sufficient:
cp /etc/init.d/mysql /etc/ha.d/resource.d/
Note: the nfs, mysql and drbddisk scripts all need execute permission (chmod +x).
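The one-line nfs script above simply restarts NFS whenever it is invoked, which works in haresources mode but is crude. The same idea can be written as a start/stop/status wrapper; this is only an illustrative alternative sketch, not the script used above:

```shell
# Hypothetical start/stop/status wrapper for NFS as a heartbeat R1 resource.
nfs_resource() {
    case "$1" in
        start)  killall -9 nfsd 2>/dev/null; /etc/init.d/nfs restart ;;
        stop)   /etc/init.d/nfs stop ;;
        status) pidof nfsd >/dev/null && echo running || echo stopped ;;
        *)      echo "Usage: nfs {start|stop|status}"; return 1 ;;
    esac
}
```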
(6) Start heartbeat
# service heartbeat start   (start on both nodes)
# chkconfig heartbeat off
Note: start-at-boot is deliberately disabled; after a server reboot, heartbeat must be started by hand.
5. Testing
From another Linux client, mount the VIP 192.168.1.13; a successful, writable mount means the NFS+DRBD+Heartbeat stack is working.
Testing DRBD+Heartbeat+NFS availability:
1. While copying files into the mounted /tmp directory, reboot the primary DRBD server. The transfer resumes where it left off, although the drbd+heartbeat failover takes some time.
2. With the primary's eth0 taken down via ifdown, the secondary was manually promoted and mounted, and the files copied in on the old primary had indeed been replicated. After bringing eth0 back up, however, the primary/secondary relationship did not recover on its own: DRBD had detected split brain and both nodes went StandAlone, logging "Split-Brain detected, dropping connection!". This is the infamous split brain; DRBD upstream recommends recovering from it manually (in production this is rare, since nobody deliberately cuts a live server's network).
Manual recovery from the split-brain condition:
i. On the secondary (the node whose local changes will be discarded):
1. drbdadm secondary r0
2. drbdadm disconnect all
3. drbdadm -- --discard-my-data connect r0
ii. On the primary:
1. drbdadm disconnect all
2. drbdadm connect r0
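Because running `--discard-my-data` on the wrong node destroys the good copy, it can help to have a helper that only prints the command sequence for the chosen role. This is a hypothetical dry-run aid; it never calls drbdadm itself:

```shell
# Print the split-brain recovery commands for the given role.
# splitbrain_steps is a hypothetical dry-run helper.
splitbrain_steps() {
    case "$1" in
        secondary)
            echo "drbdadm secondary r0"
            echo "drbdadm disconnect all"
            echo "drbdadm -- --discard-my-data connect r0"
            ;;
        primary)
            echo "drbdadm disconnect all"
            echo "drbdadm connect r0"
            ;;
        *)
            echo "usage: splitbrain_steps {primary|secondary}" >&2
            return 1
            ;;
    esac
}
```

Review the output, then paste the commands on the matching node.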
3. If the Primary suffers a hardware failure and the Secondary must be promoted to Primary, proceed as follows:
On the primary, first unmount the DRBD device:
umount /tmp
Demote the master to secondary:
# drbdadm secondary r0
# cat /proc/drbd
1: cs:Connected st:Secondary/Secondary ds:UpToDate/UpToDate C r-----
Both nodes are now secondaries.
On the standby data-11.com, promote it to primary:
# drbdadm primary r0
# cat /proc/drbd
1: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r-----
Known issue:
heartbeat does not monitor resources. If drbd or nfs dies, nothing happens; heartbeat only acts once it considers the peer machine dead. In other words, failover is triggered only by a machine going down or the heartbeat network being cut. For resource-level monitoring there is an alternative stack: corosync+pacemaker.
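Until such a migration, a crude stopgap is an external watchdog (run from cron or a loop) that executes a health check and fires a recovery action when it fails. This sketch is purely illustrative and not a heartbeat feature; the helper name is invented:

```shell
# Run a health-check command; if it fails, run the recovery action.
# watch_resource is a hypothetical helper, e.g. called from cron on the active node.
watch_resource() {
    check="$1"; action="$2"
    if ! sh -c "$check"; then
        sh -c "$action"
    fi
}
```

For example: `watch_resource 'pidof nfsd >/dev/null' '/etc/init.d/nfs restart'`.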