Prerequisites:
1) This setup uses three test nodes, node1.samlee.com, node2.samlee.com and node3.samlee.com, with the corresponding IP addresses 172.16.100.6, 172.16.100.7 and 172.16.100.8; the operating system is CentOS 6.5 x86_64;
2) node4.samlee.com (172.16.100.9) is used as the shared storage;
3) director.samlee.com (172.16.100.3) is used as the RHCS management host;
4) The cluster service is Apache's httpd service;
5) The address that provides the web service is 172.16.100.1;
6) A yum repository has been configured in advance on every cluster node;
7) The additional host 172.16.100.3 serves as the ansible jump host, from which all cluster nodes are managed; its hostname is director.samlee.com.
The deployment architecture is shown in the diagram below:
1. Preparation
To configure a Linux host as an HA node, the following preparation is usually required:
1) Hostname resolution must work for all nodes, and each node's hostname must match the output of the "uname -n" command; therefore, the /etc/hosts file on every node must contain the following entries:
# vim /etc/hosts
172.16.100.6    node1.samlee.com node1
172.16.100.7    node2.samlee.com node2
172.16.100.8    node3.samlee.com node3
172.16.100.9    node4.samlee.com node4
172.16.100.3    director.samlee.com director
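To avoid editing the file by hand on every host, the same /etc/hosts can also be pushed out from director once it has been prepared there; a minimal sketch, assuming root SSH access to each node (by password here, or via the keys set up in step 2):
# for ip in 172.16.100.6 172.16.100.7 172.16.100.8 172.16.100.9; do scp -p /etc/hosts root@$ip:/etc/hosts; done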
To keep these hostnames across reboots, a command similar to the following also needs to be run on each node:
Node1 configuration:
# sed -i 's@\(HOSTNAME=\).*@\1node1.samlee.com@g' /etc/sysconfig/network
# hostname node1.samlee.com
Node2 configuration:
# sed -i 's@\(HOSTNAME=\).*@\1node2.samlee.com@g' /etc/sysconfig/network
# hostname node2.samlee.com
Node3 configuration:
# sed -i 's@\(HOSTNAME=\).*@\1node3.samlee.com@g' /etc/sysconfig/network
# hostname node3.samlee.com
Node4 configuration:
# sed -i 's@\(HOSTNAME=\).*@\1node4.samlee.com@g' /etc/sysconfig/network
# hostname node4.samlee.com
2) All nodes must be able to communicate with one another over SSH using key-based authentication, which can be set up with the following commands:
Node1 configuration:
# ssh-keygen -t rsa -P ''
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node2.samlee.com
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node3.samlee.com
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node4.samlee.com
# for i in {1..4};do ssh node$i 'date';done
Node2 configuration:
# ssh-keygen -t rsa -P ''
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node1.samlee.com
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node3.samlee.com
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node4.samlee.com
# for i in {1..4};do ssh node$i 'date';done
Node3 configuration:
# ssh-keygen -t rsa -P ''
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node1.samlee.com
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node2.samlee.com
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node4.samlee.com
# for i in {1..4};do ssh node$i 'date';done
Node4 configuration:
# ssh-keygen -t rsa -P ''
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node1.samlee.com
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node2.samlee.com
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node3.samlee.com
# for i in {1..4};do ssh node$i 'date';done
director configuration:
# ssh-keygen -t rsa -P ''
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node1.samlee.com
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node2.samlee.com
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node3.samlee.com
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node4.samlee.com
# for i in {1..4};do ssh node$i 'date';done
3) Set up automatic time synchronization every 5 minutes (required on all nodes):
# crontab -e
*/5 * * * * /sbin/ntpdate 172.16.100.10 &> /dev/null
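Once the cron job has run, it is worth spot-checking that the clocks actually agree; a small check of my own (not part of the original steps), run from director and assuming the ansible rhcs host group described in section 2 has already been defined:
# ansible rhcs -m shell -a 'date; ntpdate -q 172.16.100.10'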
4) Disable SELinux (required on all nodes):
# setenforce 0
# vim /etc/selinux/config
SELINUX=disabled
2. Cluster installation (conga setup)
The core components of RHCS are cman and rgmanager: cman is the openais-based "cluster infrastructure layer", and rgmanager is the resource manager. Cluster resources in RHCS are configured by editing the main configuration file /etc/cluster/cluster.conf, which is challenging for many users, so RHEL provides the web management tool luci. luci only needs to be installed on a single host (here, the director), while cman and rgmanager have to be installed on every cluster node. This can be done from the jump host with ansible, using commands such as the following:
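The ansible commands that follow address the cluster nodes through a host group named rhcs. The inventory below is only a minimal sketch of what /etc/ansible/hosts on director might look like (the group name and path are assumptions; the group name must match whatever is actually used in the commands):
# vim /etc/ansible/hosts
[rhcs]
node1.samlee.com
node2.samlee.com
node3.samlee.com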
# ansible rhcs -m ping
node3.samlee.com | success >> {
    "changed": false,
    "ping": "pong"
}
node1.samlee.com | success >> {
    "changed": false,
    "ping": "pong"
}
node2.samlee.com | success >> {
    "changed": false,
    "ping": "pong"
}
# ansible rhcs -m yum -a "name=ricci state=present"
Note: if the EPEL repository is enabled, yum may pull dependency packages in from EPEL, which RHCS treats as untrusted. Therefore, disable the EPEL repository when installing the RHCS packages:
# yum repolist
# ansible rhcs -m yum -a "name=ricci state=present disablerepo=epel"
After the installation, start the ricci service and enable it at boot:
# ansible rhcs -m service -a "name=ricci state=started enabled=yes"
Check the listening port:
# ansible rhcs -m shell -a "ss -tunlp | grep ricci"
Install the RHCS cluster management platform:
Install it on the director host:
# yum -y install luci
Note: if the EPEL repository is enabled, yum may pull luci's dependency packages in from EPEL, which RHCS treats as untrusted. Therefore, disable the EPEL repository when installing luci:
# yum -y install luci --disablerepo=epel
# service luci start
# ss -tunlp | grep 8084
tcp    LISTEN     0      5       *:8084      *:*     users:(("luci",3009,5))
3. Cluster configuration
Log in to the web interface to manage the cluster:
https://172.16.100.3:8084
Set the password of the ricci user on all nodes; with ansible this can be done quickly as follows:
# ansible rhcs -m shell -a "echo samlee | passwd --stdin ricci"
Create the cluster as shown below:
Configure the failover domain:
Define the cluster resources for the web service:
(1) Install the httpd service on all nodes and set up a test page:
# ansible rhcs -m yum -a "name=httpd state=present"
# vim setindex.sh
#!/bin/bash
#
echo "<h1>`uname -n`</h1>" > /var/www/html/index.html
# chmod +x setindex.sh
# for i in {1..3}; do scp -p setindex.sh node$i:/tmp/;done
100%   68     0.1KB/s   00:00
# ansible rhcs -m shell -a "/tmp/setindex.sh"
Create a resource group and define the floating VIP and httpd service resources.
Create the resource group:
Check the running status of the cluster resources:
Test as shown below:
Common RHCS command-line cluster management tools:
Cluster configuration file: /etc/cluster/cluster.conf
1. Show the status of all cluster nodes:
# clustat
Cluster Status for mycluster @ Tue Aug 23 16:46:57 2016
Member Status: Quorate

 Member Name                             ID   Status
 ------ ----                             ---- ------
 node1.samlee.com                            1 Online, Local, rgmanager
 node2.samlee.com                            2 Online, rgmanager
 node3.samlee.com                            3 Online, rgmanager

 Service Name                   Owner (Last)                   State
 ------- ----                   ----- ------                   -----
 service:webservice             node1.samlee.com               started
2. Relocate the cluster service to another node:
# clusvcadm -r webservice -m node3.samlee.com
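Besides relocation, clusvcadm can also enable, disable or restart a service; a few commonly used invocations (webservice is the service name defined above):
# clusvcadm -e webservice -m node1.samlee.com    --enable (start) the service, preferably on node1
# clusvcadm -d webservice                        --disable (stop) the service
# clusvcadm -R webservice                        --restart the service on its current owner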
3. Query information about all nodes:
# cman_tool nodes -a
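cman_tool can also report overall membership and quorum state, which is handy when a node refuses to join; a quick sketch:
# cman_tool status | grep -i -E 'quorum|nodes|expected'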
The above configured the RHCS cluster with conga; next we configure an RHCS cluster from the command line.
4. Cluster installation (command-line RHCS setup)
(1) Install the cluster node services corosync, cman and rgmanager with ansible:
# ansible rhcs -m yum -a "name=corosync state=present"
# ansible rhcs -m yum -a "name=cman state=present"
# ansible rhcs -m yum -a "name=rgmanager state=present"
# ansible rhcs -m yum -a "name=ricci state=present"
# ansible rhcs -m service -a "name=ricci state=started enabled=yes"
# ansible rhcs -m service -a "name=corosync state=started enabled=yes"
(2) Generate the cluster configuration file; pick any one node in the cluster:
1) Create the skeleton of the cluster configuration file:
[root@node1 ~]# ccs_tool create mycluster
2) Define the cluster nodes:
[root@node1 ~]# ccs_tool addnode node1.samlee.com -n 1 -v 1
[root@node1 ~]# ccs_tool addnode node2.samlee.com -n 2 -v 1
[root@node1 ~]# ccs_tool addnode node3.samlee.com -n 3 -v 1
3) Copy the cluster configuration file to the nodes (node1, node2, node3), then start cman on all nodes and enable it at boot:
# for i in {1..3};do scp -p /etc/cluster/cluster.conf node$i:/etc/cluster/;done
# ansible rhcs -m shell -a "service cman start"
# ansible rhcs -m service -a "name=cman state=started enabled=yes"
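For reference, the cluster.conf generated by the ccs_tool commands above should look roughly like the following (attribute details may differ slightly between versions; this is a sketch, not a verbatim copy):
# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster name="mycluster" config_version="4">
  <clusternodes>
    <clusternode name="node1.samlee.com" votes="1" nodeid="1"/>
    <clusternode name="node2.samlee.com" votes="1" nodeid="2"/>
    <clusternode name="node3.samlee.com" votes="1" nodeid="3"/>
  </clusternodes>
  <fencedevices/>
  <rm>
    <failoverdomains/>
    <resources/>
  </rm>
</cluster>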
4) Check the node information:
[root@node1 ~]# ccs_tool lsnode

Cluster name: mycluster, config_version: 4

Nodename                        Votes Nodeid Fencetype
node1.samlee.com                   1    1
node2.samlee.com                   1    2
node3.samlee.com                   1    3
[root@node1 ~]# clustat
Cluster Status for mycluster @ Wed Aug 24 17:03:26 2016
Member Status: Quorate

 Member Name                             ID   Status
 ------ ----                             ---- ------
 node1.samlee.com                            1 Online, Local
 node2.samlee.com                            2 Online
 node3.samlee.com                            3 Online
Configure the iSCSI shared storage for the cluster
(1) Prepare the partitions used as shared storage on node4 (/dev/sda5 and /dev/sda6, each a 20G logical partition):
# fdisk -l /dev/sda[56]

Disk /dev/sda5: 21.5 GB, 21483376128 bytes
255 heads, 63 sectors/track, 2611 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/sda6: 21.5 GB, 21484399104 bytes
255 heads, 63 sectors/track, 2611 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
(2) Install the target service on node4:
# yum -y install scsi-target-utils
(3) Configure /etc/tgt/targets.conf on node4 as follows:
# vim /etc/tgt/targets.conf
<target iqn.2016-08.com.samlee:iscsi.disk>
    backing-store /dev/sda5
    backing-store /dev/sda6
    initiator-address 172.16.0.0/16
</target>
(4) Start the target service on node4 and enable it at boot:
# chkconfig tgtd on
# service tgtd start
# ss -tunlp | grep tgt
tcp    LISTEN     0      128     :::3260      :::*     users:(("tgtd",1469,5),("tgtd",1472,5))
tcp    LISTEN     0      128      *:3260       *:*     users:(("tgtd",1469,4),("tgtd",1472,4))
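To confirm that both backing stores are actually exported as LUNs, the target can be inspected on node4; a quick check of my own (tgt-admin --show prints the same information):
# tgtadm --lld iscsi --mode target --op show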
(5) Install the iSCSI initiator utilities on all cluster nodes:
# ansible rhcs -m yum -a "name=iscsi-initiator-utils state=present"
(6) Configure the initiator on each cluster node:
The initiator configuration files are located in /etc/iscsi/, which contains two files: initiatorname.iscsi and iscsid.conf. iscsid.conf is the daemon's configuration file, while initiatorname.iscsi records the initiator's name. Configure them as follows:
# ansible rhcs -m shell -a 'echo "InitiatorName=`iscsi-iname -p iqn.2016-08.com.samlee:iscsi.disk`" > /etc/iscsi/initiatorname.iscsi'
# ansible rhcs -m shell -a 'echo "InitiatorAlias=initiator" >> /etc/iscsi/initiatorname.iscsi'
(7) Start the iSCSI client services on all cluster nodes and enable them at boot:
# ansible rhcs -m service -a "name=iscsi state=started enabled=yes"
# ansible rhcs -m service -a "name=iscsid state=started enabled=yes"
(8) Discover the target from all cluster nodes:
# ansible rhcs -m shell -a "iscsiadm -m discovery -t sendtargets -p 172.16.100.9"
(9) Log in to the target from all cluster nodes:
# ansible rhcs -m shell -a "iscsiadm -m node -T iqn.2016-08.com.samlee:iscsi.disk -p 172.16.100.9 -l"
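Whether the login really succeeded can be verified per node by listing the active sessions; a quick check of my own:
# ansible rhcs -m shell -a 'iscsiadm -m session'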
(10) Check the disks on all cluster nodes:
# ansible rhcs -m shell -a 'fdisk -l /dev/sd[a-z]'
Configure the GFS2 cluster file system
(1) Install gfs2-utils on all cluster nodes:
# ansible rhcs -m yum -a "name=gfs2-utils state=present"
(2) Check whether the gfs2 module is loaded on the cluster nodes:
# ansible rhcs -m shell -a 'lsmod | grep gfs'
(3) On one of the nodes, create the partition /dev/sdb1 with a size of 10G:
# fdisk -l /dev/sdb1

Disk /dev/sdb1: 10.7 GB, 10738450432 bytes
64 heads, 32 sectors/track, 10240 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
(4) Create the cluster file system:
Usage of the gfs2 command-line tools:
mkfs.gfs2 is the tool for creating gfs2 file systems; its commonly used options are:
-b BlockSize: file system block size; the minimum is 512 and the default is 4096;
-J MegaBytes: size of each gfs2 journal; the default is 128MB and the minimum is 8MB;
-j Number: number of journals to create; in general, one journal is needed for each client that will mount the file system;
-p LockProtoName: the locking protocol to use, normally either lock_dlm or lock_nolock;
-t LockTableName: the lock table name. A cluster file system needs a lock table name so that cluster nodes know which cluster file system a file lock relates to. Its format is clustername:fsname, where clustername must match the cluster name in the cluster configuration file, so only nodes of that cluster can access the file system; in addition, each file system name must be unique within the same cluster;
Therefore, to create a gfs2 cluster file system on the /dev/sdb1 partition prepared above, run:
# mkfs.gfs2 -j 3 -t mycluster:webstore -p lock_dlm /dev/sdb1
Are you sure you want to proceed? [y/n] y

Device:                    /dev/sdb1
Blocksize:                 4096
Device Size                10.00 GB (2621692 blocks)
Filesystem Size:           10.00 GB (2621689 blocks)
Journals:                  3        --number of journals; only 3 nodes can mount the file system
Resource Groups:           41
Locking Protocol:          "lock_dlm"
Lock Table:                "mycluster:webstore"
UUID:                      fb0c4327-e2da-bad8-9a55-354b728a162d
(5) Mount and use the file system:
1) Mount it on the current cluster node; note that the file system type does not have to be specified when mounting:
[root@node1 ~]# mount /dev/sdb1 /mnt
[root@node1 ~]# mount | grep mnt
/dev/sdb1 on /mnt type gfs2 (rw,relatime,hostdata=jid=0)
[root@node1 ~]# touch /mnt/file{1..5}.xlsx
[root@node1 ~]# ls /mnt/
file1.xlsx  file2.xlsx  file3.xlsx  file4.xlsx  file5.xlsx
2) Mount it on cluster node node2 (the test shows that data written on one node is immediately visible on the others):
--make the kernel pick up the new partition
[root@node2 ~]# partx -a /dev/sdb
[root@node2 ~]# mount -t gfs2 /dev/sdb1 /mnt/
[root@node2 ~]# mount | grep mnt
/dev/sdb1 on /mnt type gfs2 (rw,relatime,hostdata=jid=1)
[root@node2 ~]# ls /mnt/
file1.xlsx  file2.xlsx  file3.xlsx  file4.xlsx  file5.xlsx
3) Mount it on cluster node node3 (the test again shows that reads and writes are synchronized in real time):
--make the kernel pick up the new partition
[root@node3 ~]# partx -a /dev/sdb
[root@node3 ~]# mount -t gfs2 /dev/sdb1 /mnt/
[root@node3 ~]# mount | grep mnt
/dev/sdb1 on /mnt type gfs2 (rw,relatime,hostdata=jid=1)
[root@node3 ~]# ls /mnt/
file1.xlsx  file2.xlsx  file3.xlsx  file4.xlsx  file5.xlsx
Troubleshooting: if the configured journals are not enough, new journals can be added with gfs2_jadd:
[root@node1 ~]# gfs2_jadd -j 1 /dev/sdb1
Filesystem:            /mnt
Old Journals           3
New Journals           4
4) Temporarily freeze and unfreeze the cluster file system:
# gfs2_tool freeze /mnt/
# gfs2_tool unfreeze /mnt/
5) Query the tunable parameters of a gfs2 mount point:
# gfs2_tool gettune /mnt
incore_log_blocks = 8192
log_flush_secs = 60
quota_warn_period = 10
quota_quantum = 60
max_readahead = 262144
complain_secs = 10
statfs_slow = 0
quota_simul_sync = 64
statfs_quantum = 30
quota_scale = 1.0000   (1, 1)
new_files_jdata = 0
6) Change a mount-point parameter, e.g. the journal flush interval:
[root@node1 mnt]# gfs2_tool settune /mnt log_flush_secs 120
[root@node1 mnt]# gfs2_tool gettune /mnt | grep log_flush_secs
log_flush_secs = 120
7) Query the number of journals of the file system:
# gfs2_tool journals /mnt
journal2 - 128MB
journal3 - 128MB
journal1 - 128MB
journal0 - 128MB
4 journal(s) found.
8) Configure the gfs2 file system to be mounted automatically at boot:
# vim /etc/fstab
/dev/sdb1    /mnt    gfs2    defaults    0 0
# service gfs2 start
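Because /dev/sdb1 is an iSCSI LUN that only appears after the network and the iscsi service are up, it is usually safer to add the _netdev mount option; a sketch of the alternative fstab entry (my suggestion, not part of the original setup):
/dev/sdb1    /mnt    gfs2    defaults,_netdev    0 0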
Creating a gfs2 file system on a clustered logical volume
1) On the cluster nodes, prepare the LVM-type partition /dev/sdb2 with a size of 10G:
# fdisk -l /dev/sdb | grep "Linux LVM"
/dev/sdb2           10242       20482    10486784   8e  Linux LVM
# partx -a /dev/sdb
Note: partx -a /dev/sdb has to be run on every cluster node (it may need to be run twice) so that each kernel re-reads the new partition table:
# ansible rhcs -m shell -a 'partx -a /dev/sdb'
2) Install the clustered LVM package on all cluster nodes:
Package information for lvm2-cluster:
# yum info lvm2-cluster
Loaded plugins: fastestmirror, security
Loading mirror speeds from cached hostfile
c6-media                                   | 4.0 kB     00:00 ...
Available Packages
Name        : lvm2-cluster
Arch        : x86_64
Version     : 2.02.100
Release     : 8.el6
Size        : 424 k
Repo        : c6-media
Summary     : Cluster extensions for userland logical volume management tools
URL         : http://sources.redhat.com/lvm2
License     : GPLv2
Description : Extensions to LVM2 to support clusters.
Install lvm2-cluster:
# ansible rhcs -m yum -a "name=lvm2-cluster state=present"
3) Change the LVM locking type on all cluster nodes to the cluster lock:
# ansible rhcs -m shell -a 'sed -i "s@^\([[:space:]]*locking_type\).*@\1 = 3@g" /etc/lvm/lvm.conf'
or:
# ansible rhcs -m shell -a 'lvmconf --enable-cluster'
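A quick way to confirm that every node now uses the cluster locking type (a small check of my own, not part of the original steps):
# ansible rhcs -m shell -a 'grep "^[[:space:]]*locking_type" /etc/lvm/lvm.conf'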
4) Start the clvmd service on all cluster nodes and enable it at boot:
# ansible rhcs -m service -a "name=clvmd state=started enabled=yes"
5) Create the clustered logical volume (this only needs to be done on one cluster node):
# pvcreate /dev/sdb2
# vgcreate clustervg /dev/sdb2
# lvcreate -L 5G -n clusterlv clustervg
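With clvmd running and locking_type set to 3, the new volume group should be flagged as clustered; a hedged check (the trailing 'c' in the Attr column of the vgs output marks a clustered VG):
# vgs clustervg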
6) Create a gfs2 file system on the clustered logical volume:
# mkfs.gfs2 -p lock_dlm -j 2 -t mycluster:clvm /dev/clustervg/clusterlv
7) Test mounting and usage:
Mount on node1:
[root@node1 ~]# mount -t gfs2 /dev/clustervg/clusterlv /media/
[root@node1 ~]# cp /etc/issue /media/
[root@node1 ~]# ls /media/
issue
Mount on node2:
[root@node2 ~]# mount -t gfs2 /dev/clustervg/clusterlv /media/
[root@node2 ~]# ls /media/
issue
Mount on node3 (there are not enough journals, so a new journal has to be added before the mount succeeds):
[root@node3 ~]# mount -t gfs2 /dev/clustervg/clusterlv /media/
Too many nodes mounting filesystem, no free journals
[root@node3 ~]# gfs2_jadd -j 1 /dev/clustervg/clusterlv
Filesystem:            /media
Old Journals           2
New Journals           3
[root@node3 ~]# mount -t gfs2 /dev/clustervg/clusterlv /media/
[root@node3 ~]# ls /media/
issue
8) Extend the logical volume:
# lvextend -L +2G /dev/clustervg/clusterlv
# df -lh | grep media
/dev/mapper/clustervg-clusterlv  5.0G  388M  4.7G   8% /media
--test run first
# gfs2_grow -T /dev/clustervg/clusterlv
# gfs2_grow /dev/clustervg/clusterlv
FS: Mount Point: /media
FS: Device:      /dev/dm-4
FS: Size:        1310718 (0x13fffe)
FS: RG size:     65533 (0xfffd)
DEV: Size:       1835008 (0x1c0000)
The file system grew by 2048MB.
gfs2_grow complete.
# df -lh | grep media
/dev/mapper/clustervg-clusterlv  7.0G  388M  6.7G   6% /media