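I. HA (High Availability)
FailOver: failover, moving resources to another node when a node fails. HA resources include IP addresses, services, and STONITH devices.
FailBack: failback, moving resources back to the original node once it has recovered.
Failover domain: the set of nodes a resource is allowed to fail over to.
Resource stickiness: which node a resource prefers to run on.
Messaging Layer: the cluster messaging layer. It only transports information and is not responsible for the later computation and comparison of that information.
CRM: Cluster Resource Manager. It collects the state of every resource in the cluster and, based on that state and the properties of the resource itself, computes which node each resource should run on.
DC: Designated Coordinator, the transaction coordinator.
PE: Policy Engine, a sub-function of the CRM.
TE: Transition Engine, which directs how the computed decisions are carried out.
LRM: Local Resource Manager, which performs the actual execution on the local node.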
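Resource constraints (Constraint)
Colocation constraint (colocation):
Whether resources can run on the same node
score:
Positive: the resources can run together
Negative: the resources cannot run together
Location constraint (location), with a score:
Positive: the resource prefers this node
Negative: the resource tends to move away from this node
Order constraint (order):
Defines the order in which resources are started or stopped
Example with vip and ipvs:
ipvs-->vip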
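To make the meaning of a location score concrete, here is a minimal sketch. It is not the CRM's actual algorithm, which also weighs colocation, ordering, and stickiness; the helper name `pick_node` and the score values are hypothetical and chosen only for illustration.

```shell
# pick_node: read "<node> <score>" lines on stdin and print the node with the
# highest score. Hypothetical helper, not part of heartbeat.
pick_node() {
    best_node=""
    best_score=""
    while read -r node score; do
        [ -n "$node" ] || continue
        if [ -z "$best_score" ] || [ "$score" -gt "$best_score" ]; then
            best_node=$node
            best_score=$score
        fi
    done
    printf '%s\n' "$best_node"
}

# Example scores (assumed values): snn is preferred, datanode4 is avoided.
printf '%s\n' "snn.abc.com 100" "datanode4.abc.com -50" | pick_node
```

With these sample scores the sketch selects snn.abc.com, because a positive score attracts the resource and a negative one pushes it away.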
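Resource fencing
Node level: STONITH
Resource level
For example, an FC SAN switch can deny a node's access at the storage-resource level
STONITH
split-brain: occurs when cluster nodes cannot reliably obtain the state of the other nodes
One consequence: the nodes contend for the shared storage
Quorum disk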
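The reason a tie-breaker such as a quorum disk helps against split-brain can be shown with a simple majority rule. The sketch below is only an illustration under that assumption; `has_quorum` is a hypothetical helper, not a heartbeat command.

```shell
# has_quorum <votes_seen> <total_votes>
# Succeeds only when the partition sees strictly more than half of all votes.
has_quorum() {
    [ $(( $1 * 2 )) -gt "$2" ]
}

# Two nodes, link lost: each side sees 1 of 2 votes, so neither has a majority.
if has_quorum 1 2; then echo "1/2: quorum"; else echo "1/2: no quorum"; fi

# With a third vote from a tie-breaker, the side that reaches it sees 2 of 3.
if has_quorum 2 3; then echo "2/3: quorum"; else echo "2/3: no quorum"; fi
```

In a two-node cluster neither side can win a plain majority, which is why an extra vote is needed before one side may safely take over the shared storage.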
II. Example setup
snn 192.168.1.5
datanode4 192.168.1.6
vip 192.168.1.7
The packages we need are available in EPEL:
heartbeat - Heartbeat subsystem for High-Availability Linux (the core package)
heartbeat-devel - Heartbeat development package
heartbeat-gui - Provides a gui interface to manage heartbeat clusters (graphical management interface)
heartbeat-ldirectord - Monitor daemon for maintaining high availability resources (generates ipvs rules automatically and health-checks the back-end real servers)
heartbeat-pils - Provides a general plugin and interface loading library
heartbeat-stonith - Provides an interface to Shoot The Other Node In The Head
III. Preliminary configuration
1. Hostname resolution
[[email protected] ~]# cat /etc/hosts
192.168.1.5 snn.abc.com snn
192.168.1.6 datanode4.abc.com datanode4
[[email protected] ~]# hostname
snn.abc.com
[[email protected] ~]# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=snn.abc.com
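Because the node names used later must resolve on both hosts, it can help to confirm that each name really maps to the expected address. The following is a small sketch that works on a scratch copy of the two entries above; `hosts_lookup` is a hypothetical helper.

```shell
# hosts_lookup <name> <hosts-file>: print the address that <name> maps to.
hosts_lookup() {
    awk -v name="$1" '
        /^[[:space:]]*#/ { next }    # ignore comment lines
        {
            for (i = 2; i <= NF; i++)
                if ($i == name) { print $1; exit }
        }
    ' "$2"
}

# Demo against a scratch copy of the two entries shown above.
hosts_file=$(mktemp)
printf '%s\n' \
    "192.168.1.5 snn.abc.com snn" \
    "192.168.1.6 datanode4.abc.com datanode4" > "$hosts_file"

hosts_lookup snn.abc.com "$hosts_file"
hosts_lookup datanode4 "$hosts_file"
rm -f "$hosts_file"
```

On a real node the same function could be pointed at /etc/hosts with the output of `uname -n` as the name.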
2. Mutual SSH trust between the two hosts
On snn:
# ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ''
# ssh-copy-id -i .ssh/id_rsa.pub [email protected]
Run a quick test:
[[email protected] ~]# ssh 192.168.1.6 'ifconfig'
On datanode4:
[[email protected] ~]# ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ''
[[email protected] ~]# ssh-copy-id -i .ssh/id_rsa.pub [email protected]
[[email protected] ~]# ssh 192.168.1.5 'ifconfig'
3. Time synchronization
[[email protected] ~]# crontab -e
*/2 * * * * /usr/sbin/ntpdate time.nist.gov &> /dev/null
[[email protected] ~]# scp /var/spool/cron/root datanode4:/var/spool/cron/
IV. Installing heartbeat
1. Install the dependency packages
[[email protected] heartbeat]# yum install perl-TimeDate PyXML libnet net-snmp-libs -y
2. Only these four packages need to be installed
(1) First attempt
[[email protected] heartbeat]# rpm -ivh heartbeat-2.1.4-12.el6.x86_64.rpm heartbeat-gui-2.1.4-12.el6.x86_64.rpm heartbeat-pils-2.1.4-12.el6.x86_64.rpm heartbeat-stonith-2.1.4-12.el6.x86_64.rpm
error: Failed dependencies:
libnet.so.1()(64bit) is needed by heartbeat-2.1.4-12.el6.x86_64
pygtk2-libglade is needed by heartbeat-gui-2.1.4-12.el6.x86_64
(2) Resolve the missing dependencies
Download and install EPEL:
[[email protected] heartbeat]# wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-6.noarch.rpm
[[email protected] heartbeat]# rpm -ivh epel-release-latest-6.noarch.rpm
(3) Install the libnet dependency
[[email protected] heartbeat]# yum install libnet
(4) Install again
[[email protected] heartbeat]# rpm -ivh heartbeat-2.1.4-12.el6.x86_64.rpm heartbeat-gui-2.1.4-12.el6.x86_64.rpm heartbeat-pils-2.1.4-12.el6.x86_64.rpm heartbeat-stonith-2.1.4-12.el6.x86_64.rpm
Preparing... ########################################### [100%]
1:heartbeat-pils ########################################### [ 25%]
2:heartbeat-stonith ########################################### [ 50%]
3:heartbeat ########################################### [ 75%]
4:heartbeat-gui ########################################### [100%]
3. Copy the packages to the .6 node (datanode4) with scp
[[email protected] heartbeat]# scp epel-release-latest-6.noarch.rpm heartbeat-2.1.4-12.el6.x86_64.rpm heartbeat-gui-2.1.4-12.el6.x86_64.rpm heartbeat-pils-2.1.4-12.el6.x86_64.rpm heartbeat-stonith-2.1.4-12.el6.x86_64.rpm datanode4:/root/heartbeat/
V. Configuration
1. The three configuration files do not exist by default
[[email protected] ha.d]# ls /etc/ha.d/
harc rc.d README.config resource.d shellfuncs
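(1) authkeys: the key file, which must have mode 600
(2) ha.cf: the configuration of the heartbeat service itself
(3) haresources: the resource management configuration file
2. Copy the sample files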
[[email protected] ha.d]# cp /usr/share/doc/heartbeat-2.1.4/{authkeys,haresources,ha.cf} ./
3. Set the permissions of authkeys to 600
[[email protected] ha.d]# chmod 600 authkeys
4. Generate a random key
[[email protected] ha.d]# dd if=/dev/random count=1 bs=512 | md5sum
0+1 records in
0+1 records out
29 bytes (29 B) copied, 8.0656e-05 s, 360 kB/s
71cc2b8ff1bd825fce13ceaea932501d -
[[email protected] ha.d]# vim authkeys
auth 1
1 md5 71cc2b8ff1bd825fce13ceaea932501d
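The key generation and the 600 permission can also be done in one step. The sketch below is an assumption-laden variant of the steps above: it reads from /dev/urandom instead of /dev/random (to avoid blocking), writes into a scratch directory rather than /etc/ha.d, and `make_authkeys` is a hypothetical helper name.

```shell
# make_authkeys <dir>: create <dir>/authkeys with a fresh random key, mode 600.
make_authkeys() {
    key=$(dd if=/dev/urandom bs=512 count=1 2>/dev/null | md5sum | awk '{print $1}')
    (
        umask 077    # the file is never readable by group or others
        printf 'auth 1\n1 md5 %s\n' "$key" > "$1/authkeys"
    )
    chmod 600 "$1/authkeys"
}

# Demo in a scratch directory rather than /etc/ha.d.
scratch=$(mktemp -d)
make_authkeys "$scratch"
cat "$scratch/authkeys"
```

The resulting file has the same two-line layout as the hand-edited one: the `auth` line selects key number 1, and the next line defines that key.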
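5. The core configuration file ha.cf
Main directives in ha.cf:
debugfile: file for debug messages
logfile: log file
logfacility: syslog facility used for logging
keepalive: interval between heartbeat messages
deadtime: how long a node may stay silent before it is declared dead and its resources are taken over
warntime: how long to wait before a late-heartbeat warning is logged
initdead: dead time applied while heartbeat is first starting
udpport: UDP port used for heartbeat traffic
bcast: send heartbeats by broadcast
mcast: send heartbeats by multicast (for example 225.0.0.1, as in the sample ha.cf)
ucast: send heartbeats by unicast
auto_failback: whether resources move back automatically when the original node returns
stonith: STONITH device
ping: ping node, used as a quorum (tie-breaker) device
node: cluster node name; an IP address cannot be used here
ping_group: group of ping nodes
debug: debug level
compression: compression algorithm for transmitted messages
compression_threshold: message size above which compression is applied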
[[email protected] ha.d]# vim ha.cf
bcast eth0 # Linux
node snn.abc.com
node datanode4.abc.com
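For orientation, here is a sketch of what a complete minimal ha.cf for this two-node setup might look like, written to a scratch directory so it can be reviewed before anything is copied to /etc/ha.d. Only the `bcast` and `node` lines come from the file above; the `ping 192.168.1.1` line is inferred from the startup log, and the timing values, `logfacility`, `udpport`, and `auto_failback` settings are assumptions based on the defaults suggested in the sample file.

```shell
# Write a minimal two-node ha.cf into a scratch directory for review.
# Timing values are assumptions, not taken from the author's file.
conf_dir=$(mktemp -d)
cat > "$conf_dir/ha.cf" <<'EOF'
logfacility local0
keepalive 2
deadtime 30
warntime 10
initdead 120
udpport 694
bcast eth0
auto_failback on
node snn.abc.com
node datanode4.abc.com
ping 192.168.1.1
EOF

# List the cluster members the file declares.
awk '$1 == "node" { print $2 }' "$conf_dir/ha.cf"
```

The node names must be the full names reported by `uname -n` on each host, which is why the hostname and /etc/hosts setup comes first.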
6. Install httpd on both hosts
[[email protected] ha.d]# yum install httpd
[[email protected] ha.d]# echo "<h1>snn.abc.com</h1>" >> /var/www/html/index.html
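After verifying that httpd works, stop the service and make sure it does not start at boot, because heartbeat will manage it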
[[email protected] ha.d]# service httpd stop
[[email protected] ha.d]# chkconfig httpd off
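7. Define the haresources file
Name the primary node first, followed by its resources, in this form:
node1.magedu.com VIP httpd
The resource.d directory holds the RAs (resource agents).
heartbeat looks for an RA in resource.d first, then in /etc/rc.d/init.d/.
The VIP is written as:
IP/netmask/interface/broadcast address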
[[email protected] ha.d]# vim haresources
snn.abc.com IPaddr::192.168.1.7/24/eth0 httpd
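The `::` in a haresources entry separates the resource agent name from the arguments passed to it, which matches the later log line where `IPaddr` is run with `192.168.1.7/24/eth0 start`. The sketch below only illustrates that splitting; `split_resource` is a hypothetical helper, not part of heartbeat.

```shell
# split_resource <spec>: print the agent name, then each "::"-separated argument,
# one per line.
split_resource() {
    spec=$1
    while [ -n "$spec" ]; do
        case $spec in
            *::*)
                printf '%s\n' "${spec%%::*}"   # text before the first "::"
                spec=${spec#*::}               # keep what follows it
                ;;
            *)
                printf '%s\n' "$spec"
                spec=""
                ;;
        esac
    done
}

split_resource "IPaddr::192.168.1.7/24/eth0"
```

The first line printed is the agent that heartbeat looks up (in resource.d, then among the init scripts); the remaining lines are its arguments.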
8. Every node needs these files; scp -p preserves the original attributes (including the 600 mode of authkeys)
[[email protected] ha.d]# scp -p authkeys ha.cf haresources datanode4:/etc/ha.d/
VI. Starting the service
[[email protected] ha.d]# service heartbeat start
[[email protected] ha.d]# ssh datanode4 'service heartbeat start'
[[email protected] ha.d]# tail -f /var/log/messages
Jun 13 17:28:55 snn heartbeat: [3061]: info: Link 192.168.1.1:192.168.1.1 up.
Jun 13 17:28:55 snn heartbeat: [3061]: info: Status update for node 192.168.1.1: status ping
Jun 13 17:28:55 snn heartbeat: [3061]: info: Link snn.abc.com:eth0 up. // the links of both nodes come up
Jun 13 17:29:02 snn heartbeat: [3061]: info: Link datanode4.abc.com:eth0 up.
Jun 13 17:29:02 snn heartbeat: [3061]: info: Status update for node datanode4.abc.com: status up // node status check
Jun 13 17:29:02 snn harc[3069]: info: Running /etc/ha.d/rc.d/status status
Jun 13 17:29:03 snn heartbeat: [3061]: info: Comm_now_up(): updating status to active
Jun 13 17:29:03 snn heartbeat: [3061]: info: Local status now set to: 'active'
Jun 13 17:29:03 snn heartbeat: [3061]: info: Status update for node datanode4.abc.com: status active
Jun 13 17:29:03 snn harc[3088]: info: Running /etc/ha.d/rc.d/status status
Jun 13 17:29:13 snn heartbeat: [3061]: info: remote resource transition completed.
Jun 13 17:29:13 snn heartbeat: [3061]: info: remote resource transition completed.
Jun 13 17:29:13 snn heartbeat: [3061]: info: Initial resource acquisition complete (T_RESOURCES(us))
Jun 13 17:29:14 snn IPaddr[3141]: INFO: Resource is stopped
Jun 13 17:29:14 snn heartbeat: [3105]: info: Local Resource acquisition completed.
Jun 13 17:29:14 snn harc[3192]: info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
Jun 13 17:29:14 snn ip-request-resp[3192]: received ip-request-resp IPaddr::192.168.1.7/24/eth0 OK yes
Jun 13 17:29:14 snn ResourceManager[3213]: info: Acquiring resource group: snn.abc.com IPaddr::192.168.1.7/24/eth0 httpd
Jun 13 17:29:14 snn IPaddr[3240]: INFO: Resource is stopped
Jun 13 17:29:14 snn ResourceManager[3213]: info: Running /etc/ha.d/resource.d/IPaddr 192.168.1.7/24/eth0 start // the IP resource is started
Jun 13 17:29:14 snn IPaddr[3338]: INFO: Using calculated netmask for 192.168.1.7: 255.255.255.0
Jun 13 17:29:14 snn IPaddr[3338]: INFO: eval ifconfig eth0:0 192.168.1.7 netmask 255.255.255.0 broadcast 192.168.1.255
Jun 13 17:29:14 snn IPaddr[3309]: INFO: Success
Jun 13 17:29:14 snn ResourceManager[3213]: info: Running /etc/init.d/httpd start // httpd is started
[[email protected] ha.d]# netstat -tlunp | grep 80
tcp 0 0 :::80 :::* LISTEN 3464/httpd
[[email protected] ha.d]# ifconfig
eth0 Link encap:Ethernet HWaddr 00:0C:29:B1:89:48
inet addr:192.168.1.5 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::20c:29ff:feb1:8948/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:35659 errors:0 dropped:0 overruns:0 frame:0
TX packets:10024 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:4539049 (4.3 MiB) TX bytes:2100109 (2.0 MiB)
eth0:0 Link encap:Ethernet HWaddr 00:0C:29:B1:89:48
inet addr:192.168.1.7 Bcast:192.168.1.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
VII. Simulating a primary/standby switchover with a script
[[email protected] ha.d]# sh /usr/lib64/heartbeat/hb_standby
2015/06/13_17:42:27 Going standby [all].
[[email protected] ha.d]# tail -f /var/log/messages
Jun 13 17:42:28 snn ResourceManager[3568]: info: Running /etc/init.d/httpd stop
Jun 13 17:42:28 snn ResourceManager[3568]: info: Running /etc/ha.d/resource.d/IPaddr 192.168.1.7/24/eth0 stop
Jun 13 17:42:29 snn IPaddr[3663]: INFO: ifconfig eth0:0 down
Jun 13 17:42:29 snn IPaddr[3634]: INFO: Success
Jun 13 17:42:29 snn heartbeat: [3555]: info: all HA resource release completed (standby).
Jun 13 17:42:29 snn heartbeat: [3061]: info: Local standby process completed [all].
Jun 13 17:42:30 snn heartbeat: [3061]: WARN: 1 lost packet(s) for [datanode4.abc.com] [819:821]
Jun 13 17:42:30 snn heartbeat: [3061]: info: remote resource transition completed.
Jun 13 17:42:30 snn heartbeat: [3061]: info: No pkts missing from datanode4.abc.com!
Jun 13 17:42:30 snn heartbeat: [3061]: info: Other node completed standby takeover of all resources. // the other node now holds all resources
Check on the .6 host (datanode4):
[[email protected] ha.d]# ifconfig
eth0 Link encap:Ethernet HWaddr 00:0C:29:E1:2F:66
inet addr:192.168.1.6 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::20c:29ff:fee1:2f66/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:37277 errors:0 dropped:0 overruns:0 frame:0
TX packets:3812 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:5065186 (4.8 MiB) TX bytes:648956 (633.7 KiB)
eth0:0 Link encap:Ethernet HWaddr 00:0C:29:E1:2F:66
inet addr:192.168.1.7 Bcast:192.168.1.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
[[email protected] ha.d]# netstat -ltunp | grep 80
tcp 0 0 :::80 :::* LISTEN 2782/httpd
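Checking by eye which node currently holds the VIP can be scripted. The sketch below parses ifconfig-style output like the listings above; `holds_vip` is a hypothetical helper, and the sample text is copied from the eth0:0 stanza on datanode4.

```shell
# holds_vip <vip>: read ifconfig-style output on stdin; succeed if <vip> is bound.
# The trailing space in the pattern keeps 192.168.1.7 from matching 192.168.1.70.
holds_vip() {
    grep -qF "inet addr:$1 "
}

# Sample taken from the eth0:0 stanza above.
sample='eth0:0 Link encap:Ethernet HWaddr 00:0C:29:E1:2F:66
inet addr:192.168.1.7 Bcast:192.168.1.255 Mask:255.255.255.0'

if printf '%s\n' "$sample" | holds_vip 192.168.1.7; then
    echo "VIP is on this node"
else
    echo "VIP is not on this node"
fi
```

On a live node the same check would be `ifconfig | holds_vip 192.168.1.7`, run on each host before and after the switchover.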
VIII. Serving the web content from an NFS mount
1. Bring up another host, 192.168.1.4 datanode.abc.com datanode, as the NFS server
[[email protected] ~]# mkdir /web/htdocs -p
2. Export the shared directory
[[email protected] ~]# vim /etc/exports
/web/htdocs 192.168.0.0/24(ro)
3. Start the NFS service
[[email protected] ~]# service nfs start
Starting NFS services: [ OK ]
Starting NFS quotas: [ OK ]
Starting NFS mountd: [ OK ]
Starting NFS daemon: [ OK ]
Starting RPC idmapd: [ OK ]
[[email protected] ~]# showmount -e 192.168.1.4
Export list for 192.168.1.4:
/web/htdocs 192.168.0.0/24
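A client can mount the export only if its address falls inside the exported network. The sketch below shows that membership test with plain shell arithmetic; `ip_to_int` and `in_subnet` are hypothetical helpers. With the values quoted in this section, 192.168.1.5 falls outside 192.168.0.0/24, so it may be worth double-checking the network in /etc/exports on the real system (192.168.1.0/24 is shown here only as an assumed alternative).

```shell
# ip_to_int <a.b.c.d>: print the address as a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# in_subnet <ip> <network/prefix>: succeed if <ip> lies inside the network.
in_subnet() {
    prefix=${2#*/}
    mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    ip_net=$(( $(ip_to_int "$1") & mask ))
    export_net=$(( $(ip_to_int "${2%/*}") & mask ))
    [ "$ip_net" -eq "$export_net" ]
}

for net in 192.168.0.0/24 192.168.1.0/24; do
    if in_subnet 192.168.1.5 "$net"; then
        echo "192.168.1.5 is inside $net"
    else
        echo "192.168.1.5 is outside $net"
    fi
done
```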
4. On the snn host, stop heartbeat on both nodes first, then change the resource configuration file
[[email protected] ha.d]# ssh datanode4 '/etc/init.d/heartbeat stop'
Stopping High-Availability services:
Done.
[[email protected] ha.d]# service heartbeat stop
Stopping High-Availability services:
Done.
[[email protected] ha.d]# vim haresources
[[email protected] ha.d]# mount -t nfs 192.168.1.4:/web/htdocs /mnt
[[email protected] ha.d]# mount -l | grep mnt
192.168.1.4:/web/htdocs on /mnt type nfs (rw,vers=4,addr=192.168.1.4,clientaddr=192.168.1.5)
[[email protected] ~]# cat /mnt/index.html
<h1>datanode.abc.com</h1>
The test mount works, so unmount it again:
[[email protected] ~]# umount /mnt
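IX. Mounting the filesystem through the resource manager on snn
The order of the resources matters.
Configure the IP first, then the filesystem, then the service.
The filesystem must always come before the service.
[[email protected] ~]# vim /etc/ha.d/haresources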
snn.abc.com IPaddr::192.168.1.7/24/eth0 Filesystem::192.168.1.4:/web/htdocs::/var/www/html::nfs httpd
[[email protected] ~]# scp /etc/ha.d/haresources datanode4:/etc/ha.d/haresources
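Since heartbeat starts the resources on a haresources line from left to right, a quick check of the ordering rule can catch a misplaced entry before the file is copied to the other node. This is only a sketch; `fs_before_service` is a hypothetical helper.

```shell
# fs_before_service <haresources-line> <service>
# Succeeds if a Filesystem:: resource appears before <service> on the line.
fs_before_service() {
    fs_pos=0
    svc_pos=0
    pos=0
    for word in $1; do
        pos=$((pos + 1))
        case $word in
            Filesystem::*) fs_pos=$pos ;;
            "$2")          svc_pos=$pos ;;
        esac
    done
    [ "$fs_pos" -gt 0 ] && [ "$svc_pos" -gt "$fs_pos" ]
}

line='snn.abc.com IPaddr::192.168.1.7/24/eth0 Filesystem::192.168.1.4:/web/htdocs::/var/www/html::nfs httpd'
if fs_before_service "$line" httpd; then
    echo "order ok: filesystem is mounted before httpd starts"
else
    echo "order problem: httpd would start before the filesystem is mounted"
fi
```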
X. Start heartbeat, then check the logs
// There are errors; the cause is still being investigated