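I. Prerequisites for configuring a high-availability cluster (using a two-node heartbeat cluster as the example):
⑴ Time must be kept synchronized across the nodes.
⑵ Nodes must communicate with each other by name;
use /etc/hosts for this rather than DNS;
the host name used by the cluster is the one reported by `uname -n`;
⑶ a ping node (needed only when the cluster has an even number of nodes);
⑷ SSH key authentication, so the nodes can reach each other without a password.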
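A quick way to verify prerequisite ⑵ on each node is to check that the name printed by `uname -n` has an entry in /etc/hosts. This is a minimal sketch assuming the usual hosts-file layout (address first, then names and aliases); the function name `hosts_has_name` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if NAME is listed as a host name or alias
# (any field after the address) in a hosts-format file.
hosts_has_name() {
    local file=$1 name=$2
    awk -v n="$name" '
        /^[[:space:]]*#/ { next }              # skip comment lines
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break           # stop at a trailing comment
                if ($i == n) found = 1         # exact match, no substrings
            }
        }
        END { exit !found }
    ' "$file"
}

# On a real node, check this host's cluster name:
# hosts_has_name /etc/hosts "$(uname -n)" || echo "add $(uname -n) to /etc/hosts"
```

The exact field comparison matters here: a substring match would wrongly accept `node` as present when only `node4` is listed.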
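II. Configuring heartbeat v1
Main configuration file: ha.cf
Authentication key file: authkeys; its permissions must deny all access to group and others (e.g. mode 600);
Resource configuration file: haresources
Templates for all three files are provided in /usr/share/doc/heartbeat-VERSION; copy them to /etc/ha.d/.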
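Since authkeys must not be accessible to group or others, it is convenient to create it and tighten its mode in one step. A minimal sketch, assuming the sha1 method used later in this article; the function name `make_authkeys` and the fallback to `od` when `openssl` is not installed are my own choices, not part of heartbeat.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write an authkeys file using sha1 with a random key,
# then restrict it to mode 600 (no access for group and others).
make_authkeys() {
    local dest=$1 key
    # 10 random bytes as hex, like `openssl rand -hex 10`; fall back to od.
    key=$(openssl rand -hex 10 2>/dev/null) ||
        key=$(head -c 10 /dev/urandom | od -An -tx1 | tr -d ' \n')
    # Create the file under a restrictive umask so it is never world-readable.
    ( umask 077; printf 'auth 1\n1 sha1 %s\n' "$key" > "$dest" )
    chmod 600 "$dest"    # also fixes the mode if the file already existed
}

# Example: make_authkeys /etc/ha.d/authkeys
```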
III. Selected ha.cf parameters explained
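logfile /var/log/ha-log #where heartbeat stores its log
keepalive 2 #send a heartbeat every 2 seconds
deadtime 30 #if the standby node receives no heartbeat from the primary for 30 seconds, it immediately takes over the primary's resources
warntime 10 #heartbeat delay warning time: if the standby receives no heartbeat from the primary within 10 seconds, it writes a warning to the log but does not fail over yet
initdead 120 #on some systems the network takes a while to work after boot or reboot; this option covers that startup interval. It should be at least twice deadtime.
udpport 694 #694 is the default port
baud 19200 #baud rate for serial communication
#bcast eth0 # Linux #send heartbeats by broadcast over eth0
#mcast eth0 225.0.0.1 694 1 0 #send heartbeats by multicast over eth0 (arguments: interface, multicast group, UDP port, TTL, loop); typically used when there is more than one standby node. bcast, ucast and mcast stand for broadcast, unicast and multicast, the three ways of carrying heartbeats; choose any one of them.
#ucast eth0 192.168.1.2 #send heartbeats by unicast over eth0; the IP address should be that of the other node
auto_failback on #whether resources move back automatically once the primary node recovers. The two heartbeat hosts act as primary and standby: normally the primary holds the resources and runs all services, and on failure it hands the resources to the standby, which then runs the services. With on, the recovered primary takes the resources back from the standby automatically; with off, the recovered primary becomes the standby and the former standby remains the primary.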
#stonith baytech /etc/ha.d/conf/stonith.baytech
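#watchdog /dev/watchdog #optional: lets heartbeat monitor whether the system is still running. It requires the "softdog" kernel module, which provides the actual device; if the kernel lacks this module, enable it and recompile the kernel. Load the module with "modprobe softdog", then check that "grep misc /proc/devices" shows 10 and that "cat /proc/misc | grep watchdog" shows 130. Finally create the device file with "mknod /dev/watchdog c 10 130", and the feature is ready to use.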
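node node1.magedu.com node2.magedu.com #names of the nodes to be made highly available; each must match the output of "uname -n" on that node
ping 192.168.12.237 #ping node address. The better the ping node, the more robust the HA cluster: a fixed router is a good choice, but avoid using a cluster member. The ping node is used only to test network connectivity.
ping_group group1 192.168.12.120 192.168.12.237 #ping group; the group counts as reachable as long as any one of its members answers
apiauth pingd gid=haclient uid=hacluster
respawn hacluster /usr/local/ha/lib/heartbeat/pingd -m 100 -d 5s
#respawn is optional: it lists processes that are started and stopped together with heartbeat, usually plugins integrated with heartbeat, and restarts them automatically if they fail. The most common one is pingd, which monitors NIC and network status together with the ping nodes given by the ping directive. hacluster is the user that runs pingd; the apiauth line above authorizes pingd (uid hacluster, gid haclient) to use the heartbeat API.
#The setting below is the key one: it enables CRM management, i.e. the v2-style configuration
# crm respawn #crm on/yes also works, but then a problem in cib.xml makes heartbeat reboot the server outright, so respawn is recommended while testing
#The following optional settings compress the transmitted data
compression bz2
compression_threshold 2 #compress only messages larger than 2 KB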
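The timing directives only make sense together: a warning (warntime) should come before a takeover (deadtime), heartbeats (keepalive) must be sent more often than either, and initdead should be at least twice deadtime, as noted above. The sketch below reads those four values from a ha.cf file and checks these relations. The function name `check_hacf_timers` is hypothetical; it assumes plain integer seconds (no `ms` suffix) and ignores commented-out lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check keepalive < warntime < deadtime and
# initdead >= 2 * deadtime in a ha.cf file. Assumes integer seconds.
check_hacf_timers() {
    awk '
        /^[[:space:]]*#/ { next }                          # skip comments
        $1 ~ /^(keepalive|warntime|deadtime|initdead)$/ { v[$1] = $2 + 0 }
        END {
            k = v["keepalive"]; w = v["warntime"]; d = v["deadtime"]; i = v["initdead"]
            if (!(k > 0 && w > 0 && d > 0 && i > 0)) { print "timer not set"; exit 1 }
            if (!(k < w))  { print "keepalive must be < warntime"; exit 1 }
            if (!(w < d))  { print "warntime must be < deadtime"; exit 1 }
            if (i < 2 * d) { print "initdead must be >= 2 * deadtime"; exit 1 }
            print "ok"
        }
    ' "$1"
}

# Example: check_hacf_timers /etc/ha.d/ha.cf
```

With the values above (2, 10, 30, 120) the function prints `ok`; lowering initdead to 50 makes it report the initdead rule and return non-zero.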
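IV. Case study: an active/active (dual-primary) HA setup of mysql and httpd with heartbeat v1, each node being primary for one service, with both services sharing data over NFS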
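1. Lab environment:
node4: 192.168.30.14, primary for mysqld, standby for httpd
node5: 192.168.30.15, primary for httpd, standby for mysqld
node3: 192.168.30.13, NFS server
Resources needed for highly available mysql:
ip: 192.168.30.100
mysqld
nfs: /mydata
Resources needed for highly available httpd:
ip: 192.168.30.101
httpd
nfs: /web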
2. Preparation
⑴ Synchronize time between the nodes and make sure they can reach each other by name without a password
ntpdate 202.120.2.101
vim /etc/hosts
ssh-keygen -t rsa
ssh-copy-id -i .ssh/id_rsa.pub root@node5
[root@node4 ~]# ntpdate 202.120.2.101
13 Apr 23:08:47 ntpdate[2613]: the NTP socket is in use, exiting
[root@node4 ~]# date
Wed Apr 13 23:09:25 CST 2016
[root@node4 ~]# crontab -e
*/10 * * * * /usr/sbin/ntpdate 202.120.2.101
[root@node4 ~]# vim /etc/hosts    #edit the local hosts file
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.30.10 node1
192.168.30.20 node2
192.168.30.13 node3
192.168.30.14 node4
192.168.30.15 node5
[root@node4 ~]# scp /etc/hosts root@node5:/etc/
The authenticity of host 'node5 (192.168.30.15)' can't be established.
RSA key fingerprint is a3:d3:a0:9d:f0:3b:3e:53:4e:ee:61:87:b9:3a:1c:8c.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node5,192.168.30.15' (RSA) to the list of known hosts.
root@node5's password:
hosts                                100%  262     0.3KB/s   00:00
[root@node4 ~]# ping node5
PING node5 (192.168.30.15) 56(84) bytes of data.
64 bytes from node5 (192.168.30.15): icmp_seq=1 ttl=64 time=0.419 ms
64 bytes from node5 (192.168.30.15): icmp_seq=2 ttl=64 time=0.706 ms
^C
--- node5 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1888ms
rtt min/avg/max/mdev = 0.419/0.562/0.706/0.145 ms
[root@node4 ~]# ssh-keygen -t rsa    #generate a key pair
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
bc:b8:e6:78:6d:51:91:30:4d:d4:dd:50:c0:18:f1:28 root@node4
The key's randomart image is:
...
[root@node4 ~]# ssh-copy-id -i .ssh/id_rsa.pub root@node5    #append the public key to the peer's authorized_keys
The authenticity of host 'node5 (192.168.30.15)' can't be established.
RSA key fingerprint is a3:d3:a0:9d:f0:3b:3e:53:4e:ee:61:87:b9:3a:1c:8c.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node5,192.168.30.15' (RSA) to the list of known hosts.
root@node5's password:
Now try logging into the machine, with "ssh 'root@node5'", and check in:
  .ssh/authorized_keys
to make sure we haven't added extra keys that you weren't expecting.
[root@node4 ~]# ssh root@node5 hostname    #no password is needed any more
node5
#Repeat the same steps on the other node
[root@node5 ~]# ntpdate 202.120.2.101
...
[root@node5 ~]# crontab -e
*/10 * * * * /usr/sbin/ntpdate 202.120.2.101
[root@node5 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.30.10 node1
192.168.30.20 node2
192.168.30.13 node3
192.168.30.14 node4
192.168.30.15 node5
[root@node5 ~]# ssh-keygen -t rsa
...
[root@node5 ~]# ssh-copy-id -i .ssh/id_rsa.pub root@node4
...
[root@node5 ~]# ssh root@node4 hostname
node4
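Editing the crontab by hand with `crontab -e` on every node is easy to repeat by mistake. A small filter that appends the ntpdate entry only if that exact line is not already present keeps the step idempotent; the helper name `add_ntp_cron` is hypothetical, and the NTP server is the one used above.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read a crontab on stdin and print it with the
# ntpdate job appended once (no duplicate if the exact line already exists).
add_ntp_cron() {
    local line="*/10 * * * * /usr/sbin/ntpdate $1"
    local current
    current=$(cat)
    [ -n "$current" ] && printf '%s\n' "$current"
    # -F: fixed string, -x: the whole line must match, -q: no output.
    printf '%s\n' "$current" | grep -Fqx -- "$line" || printf '%s\n' "$line"
}

# Usage on each node (crontab -l fails harmlessly when no crontab exists yet):
# (crontab -l 2>/dev/null) | add_ntp_cron 202.120.2.101 | crontab -
```

To confirm the key-based login without risking an interactive prompt, `ssh -o BatchMode=yes root@node5 hostname` fails immediately instead of asking for a password if the key was not installed.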
⑵ Provide an NFS server that exports two directories, one for the mysql service and one for the httpd service
vim /etc/exports
/mydata 192.168.30.0/24(rw,no_root_squash)
/web 192.168.30.0/24(rw)
[root@node3 ~]# mkdir -p /mydata/{data,binlogs} /web
[root@node3 ~]# vim /web/index.html
hello
[root@node3 ~]# ls /mydata
binlogs  data
[root@node3 ~]# useradd -r mysql
[root@node3 ~]# id mysql
uid=27(mysql) gid=27(mysql) groups=27(mysql)
[root@node3 ~]# useradd -r apache
[root@node3 ~]# id apache
uid=48(apache) gid=48(apache) groups=48(apache)
[root@node3 ~]# chown -R mysql.mysql /mydata
[root@node3 ~]# setfacl -R -m u:apache:rwx /web
[root@node3 ~]# vim /etc/exports
/mydata 192.168.30.0/24(rw,no_root_squash)
/web 192.168.30.0/24(rw)
[root@node3 ~]# service rpcbind status
rpcbind (pid  1337) is running...
[root@node3 ~]# service nfs start
Starting NFS services:                                     [  OK  ]
Starting NFS quotas:                                       [  OK  ]
Starting NFS mountd:                                       [  OK  ]
Starting NFS daemon:                                       [  OK  ]
Starting RPC idmapd:                                       [  OK  ]
⑶ Install the services to be made highly available on both nodes
[root@node4 ~]# useradd -u 27 -r mysql
[root@node4 ~]# useradd -u 48 -r apache
[root@node4 ~]# yum -y install mysql-server httpd
...
[root@node4 ~]# chkconfig mysqld off
[root@node4 ~]# chkconfig httpd off
[root@node4 ~]# vim /etc/my.cnf
[mysqld]
datadir=/mydata/data
socket=/var/lib/mysql/mysql.sock
user=mysql
log-bin=/mydata/binlogs/mysql-bin
innodb_file_per_table=ON
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
skip-name-resolve

[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
[root@node4 ~]# scp /etc/my.cnf root@node5:/etc/
my.cnf                               100%  308     0.3KB/s
[root@node4 ~]# mkdir /mydata
[root@node4 ~]# showmount -e 192.168.30.13
Export list for 192.168.30.13:
/web    192.168.30.0/24
/mydata 192.168.30.0/24
[root@node4 ~]# mount -t nfs 192.168.30.13:/mydata /mydata
[root@node4 ~]# service mysqld start
Initializing MySQL database:  Installing MySQL system tables...
OK
Filling help tables...
OK
[root@node4 ha.d]# mysql
...
mysql> grant all on *.* to root@'192.168.30.%' identified by 'magedu';
Query OK, 0 rows affected (0.04 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.01 sec)

mysql> \q
Bye
[root@node4 ~]# cd /mydata
[root@node4 mydata]# ls
binlogs  data
[root@node4 mydata]# ls data
ibdata1  ib_logfile0  ib_logfile1  mysql  test
[root@node4 mydata]# ls binlogs
mysql-bin.000001  mysql-bin.000002  mysql-bin.000003  mysql-bin.index
[root@node4 mydata]# cd
[root@node4 ~]# service mysqld stop
Stopping mysqld:                                           [  OK  ]
[root@node4 ~]# umount /mydata
Run similar steps on the other node, except that MySQL must not be initialized a second time.
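NFS identifies file owners by numeric UID/GID, which is why the nodes create mysql and apache with `-u 27` and `-u 48` to match the NFS server. A sketch that compares a user's UID and GID between two passwd-format files (for example /etc/passwd on a node and a copy fetched from node3); the function name `same_ids` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if USER has the same UID and GID in two
# passwd-format files (fields: name:password:uid:gid:...).
same_ids() {
    local user=$1 a b
    a=$(awk -F: -v u="$user" '$1 == u { print $3 ":" $4 }' "$2")
    b=$(awk -F: -v u="$user" '$1 == u { print $3 ":" $4 }' "$3")
    # The user must exist in the first file and match in the second.
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Usage on node4 or node5:
# ssh root@node3 cat /etc/passwd > /tmp/passwd.node3
# for u in mysql apache; do same_ids "$u" /etc/passwd /tmp/passwd.node3 || echo "$u: ids differ"; done
```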
[root@node3 ~]# vim /etc/exports    #once MySQL has been initialized, the no_root_squash option can be removed
/mydata 192.168.30.0/24(rw)
/web 192.168.30.0/24(rw)
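The same edit can be scripted. This sketch prints an exports file with the `no_root_squash` option removed so the result can be reviewed before it replaces /etc/exports; the function name `drop_no_root_squash` is hypothetical. After changing the real file, `exportfs -ra` re-exports all directories without restarting the nfs service.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an /etc/exports-style file without no_root_squash.
drop_no_root_squash() {
    # ",no_root_squash" followed by another option or ")" -> keep the follower;
    # "(no_root_squash," -> "("; a lone "(no_root_squash)" -> no options at all.
    sed -e 's/,no_root_squash\([,)]\)/\1/g' \
        -e 's/(no_root_squash,/(/g' \
        -e 's/(no_root_squash)//g' "$1"
}

# Usage on node3:
# drop_no_root_squash /etc/exports > /tmp/exports.new && diff /etc/exports /tmp/exports.new
# cp /tmp/exports.new /etc/exports && exportfs -ra
```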
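⑷ Install heartbeat on each node and configure the resources
This example installs heartbeat v2; v2 is backward compatible with v1 and can use haresources as its configuration interface.
Notes:
① Do not install heartbeat-pils with yum, otherwise it is automatically updated to cluster-glue, which is incompatible with heartbeat v2.
② Templates of ha.cf, haresources and authkeys are provided in /usr/share/doc/heartbeat-VERSION; copy them to /etc/ha.d/: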
cp /usr/share/doc/heartbeat-2.1.4/{authkeys,ha.cf,haresources} /etc/ha.d/
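③ /etc/ha.d/resource.d contains resource agents, for example:
IPaddr: configures the IP address with ifconfig
IPaddr2: configures the IP address with ip addr (use ip addr show to see the address)
④ /usr/lib64/heartbeat contains helper scripts:
hb_standby: turns the current node into the standby node
hb_takeover: takes over the resources
ha_propagate: copies ha.cf and authkeys to the other nodes, preserving their permissions
send_arp: whenever a node takes over an address, notifies the front-end router so it updates its ARP cache
haresources2cib.py: converts haresources to CIB format and writes the result to /var/lib/heartbeat/crm/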
[root@node4 ~]# rpm -ivh heartbeat-pils-2.1.4-12.el6.x86_64.rpm
...
[root@node4 ~]# yum -y install PyXML libnet perl-TimeDate
...
[root@node4 ~]# rpm -ivh heartbeat-stonith-2.1.4-12.el6.x86_64.rpm heartbeat-2.1.4-12.el6.x86_64.rpm
Preparing...                ########################################### [100%]
   1:heartbeat-stonith      ########################################### [ 50%]
   2:heartbeat              ########################################### [100%]
[root@node4 ~]# ls /usr/share/doc/heartbeat-2.1.4/
apphbd.cf  ChangeLog  DirectoryMap.txt  GettingStarted.html  HardwareGuide.html  hb_report.html  heartbeat_api.txt  Requirements.html  rsync.txt
authkeys  COPYING  faqntips.html  GettingStarted.txt  HardwareGuide.txt  hb_report.txt  logd.cf  Requirements.txt  startstop
AUTHORS  COPYING.LGPL  faqntips.txt  ha.cf  haresources  heartbeat_api.html  README  rsync.html
[root@node4 ~]# cp /usr/share/doc/heartbeat-2.1.4/{authkeys,ha.cf,haresources} /etc/ha.d/
[root@node4 ~]# cd /etc/ha.d
[root@node4 ha.d]# ls
authkeys  ha.cf  harc  haresources  rc.d  README.config  resource.d  shellfuncs
[root@node4 ha.d]# ls resource.d/
apache  db2  Filesystem  ICP  IPaddr  IPsrcaddr  LinuxSCSI  LVSSyncDaemonSwap  OCF  Raid1  ServeRAID  WinPopup
AudibleAlarm  Delay  hto-mapfuncs  ids  IPaddr2  IPv6addr  LVM  MailTo  portblock  SendArp  WAS  Xinetd
[root@node4 ha.d]# vim ha.cf    #set the following options and leave everything else at its default
...
logfile /var/log/ha-log    #and comment out logfacility local0
...
auto_failback on
mcast eth0 225.1.1.1 694 1 0
node node4 node5
ping 192.168.30.2
...
[root@node4 ha.d]# openssl rand -hex 10
392fa6f47a05ed67a0f7
[root@node4 ha.d]# vim authkeys
...
auth 1
1 sha1 392fa6f47a05ed67a0f7
[root@node4 ha.d]# chmod 600 authkeys
[root@node4 ha.d]# vim haresources
...
node4 192.168.30.100/24/eth0/192.168.30.255 Filesystem::192.168.30.13:/mydata::/mydata::nfs mysqld
node5 192.168.30.101/24/eth0/192.168.30.255 Filesystem::192.168.30.13:/web::/var/www/html::nfs httpd
[root@node4 ha.d]# scp -p authkeys ha.cf haresources root@node5:/etc/ha.d/
authkeys                             100%  680     0.7KB/s   00:00
ha.cf                                100%   10KB  10.3KB/s   00:00
haresources                          100% 6105     6.0KB/s   00:00
[root@node4 ha.d]# service heartbeat start;ssh root@node5 'service heartbeat start'
Starting High-Availability services: 2016/04/14_02:38:06 INFO:  Resource is stopped
2016/04/14_02:38:06 INFO:  Resource is stopped
Done.

Starting High-Availability services: 2016/04/14_02:38:05 INFO:  Resource is stopped
2016/04/14_02:38:06 INFO:  Resource is stopped
Done.

[root@node4 ha.d]# ifconfig
...
eth0:0    Link encap:Ethernet  HWaddr 00:0C:29:32:52:1C
          inet addr:192.168.30.100  Bcast:192.168.30.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
...
[root@node4 ha.d]# service mysqld status
mysqld (pid  4094) is running...
[root@node4 ha.d]# service httpd status
httpd is stopped
[root@node5 ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:0C:29:96:45:92
          inet addr:192.168.30.15  Bcast:192.168.30.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe96:4592/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:21716 errors:0 dropped:0 overruns:0 frame:0
          TX packets:11535 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:25422433 (24.2 MiB)  TX bytes:922617 (900.9 KiB)

eth0:0    Link encap:Ethernet  HWaddr 00:0C:29:96:45:92
          inet addr:192.168.30.101  Bcast:192.168.30.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:48 errors:0 dropped:0 overruns:0 frame:0
          TX packets:48 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:3120 (3.0 KiB)  TX bytes:3120 (3.0 KiB)

[root@node5 ~]# service httpd status
httpd (pid  3820) is running...
[root@node5 ~]# service mysqld status
mysqld is stopped
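Each haresources line names the preferred node, followed by resources that are started left to right (and stopped in reverse order); arguments to a resource agent are separated by `::`, and a bare IP specification is shorthand for IPaddr. The sketch below splits such a line into that structure so it can be reviewed before starting heartbeat. `parse_haresources_line` is hypothetical and only handles the simple forms used in this article.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a haresources line as "node: NAME" followed by
# one "start: AGENT ARGS..." line per resource, in start order.
parse_haresources_line() {
    local -a fields
    read -r -a fields <<< "$1"
    echo "node: ${fields[0]}"
    local res agent
    for res in "${fields[@]:1}"; do
        case $res in
            *::*)   agent=${res%%::*}; res=${res#*::} ;;  # Agent::arg1::arg2
            [0-9]*) agent=IPaddr ;;                       # bare IP spec means IPaddr
            *)      agent=$res; res= ;;                   # plain script such as mysqld
        esac
        # Show the remaining "::"-separated arguments as space-separated words.
        echo "start: $agent${res:+ ${res//::/ }}"
    done
}

# Usage: review every non-comment line of the real file.
# grep -v '^#' /etc/ha.d/haresources | while read -r l; do [ -n "$l" ] && parse_haresources_line "$l"; done
```

For the node4 line above this prints the node name, then IPaddr with the address spec, Filesystem with its three arguments, and finally mysqld.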
⑸ Test from a client:
# curl 192.168.30.101
hello
# mysql -u root -h 192.168.30.100 -p
...
mysql> create database hellodb;
Query OK, 1 row affected (0.03 sec)

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| hellodb            |
| mysql              |
| test               |
+--------------------+
4 rows in set (0.00 sec)
⑹ Simulate a resource failover
[root@node4 ~]# /usr/lib64/heartbeat/hb_standby
2016/04/14_07:07:52 Going standby [all].
[root@node4 ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:0C:29:32:52:1C
          inet addr:192.168.30.14  Bcast:192.168.30.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe32:521c/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:91672 errors:0 dropped:0 overruns:0 frame:0
          TX packets:87427 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:56564639 (53.9 MiB)  TX bytes:38563809 (36.7 MiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:48 errors:0 dropped:0 overruns:0 frame:0
          TX packets:48 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:3120 (3.0 KiB)  TX bytes:3120 (3.0 KiB)
[root@node5 ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:0C:29:96:45:92
          inet addr:192.168.30.15  Bcast:192.168.30.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe96:4592/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:61508 errors:0 dropped:0 overruns:0 frame:0
          TX packets:50930 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:33075217 (31.5 MiB)  TX bytes:8763944 (8.3 MiB)

eth0:0    Link encap:Ethernet  HWaddr 00:0C:29:96:45:92
          inet addr:192.168.30.101  Bcast:192.168.30.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth0:1    Link encap:Ethernet  HWaddr 00:0C:29:96:45:92
          inet addr:192.168.30.100  Bcast:192.168.30.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:48 errors:0 dropped:0 overruns:0 frame:0
          TX packets:48 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:3120 (3.0 KiB)  TX bytes:3120 (3.0 KiB)
mysql> select version();
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id:    2
Current database: *** NONE ***

+------------+
| version()  |
+------------+
| 5.1.73-log |
+------------+
1 row in set (0.13 sec)
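In the failover test above, which node holds each VIP was checked by reading ifconfig output by eye. A small predicate that looks for an address in `ifconfig` or `ip addr show` output makes this check scriptable; both formats are accepted because, as noted earlier, addresses configured by IPaddr2 are meant to be inspected with `ip addr show`. `holds_vip` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical predicate: read `ifconfig` or `ip addr show` output on stdin and
# succeed if the given IPv4 address is configured. The match is exact, so
# 192.168.30.10 does not match 192.168.30.100.
holds_vip() {
    local re
    re=$(printf '%s' "$1" | sed 's/\./\\./g')    # escape dots for the regex
    # ifconfig prints "inet addr:IP  Bcast:..."; ip prints "inet IP/24 brd ...".
    grep -Eq "inet (addr:)?${re}([[:space:]]|/|\$)"
}

# Usage, e.g. on node4 after running hb_standby:
# ssh root@node5 ifconfig | holds_vip 192.168.30.100 && echo "mysql VIP is on node5"
```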