OpenStack HA Cluster 3 - Pacemaker Corosync

Host names must be resolvable between the nodes.

[[email protected] ~]# cat /etc/hosts

192.168.17.149  controller1

192.168.17.141  controller2

192.168.17.166  controller3

192.168.17.111  demo.open-stack.cn
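
The hosts file must resolve the same names on every node. Since Ansible is used later in this guide with a host group named controller, one way to push it out is the copy module (a sketch, assuming that same inventory group):

# ansible controller -m copy -a "src=/etc/hosts dest=/etc/hosts"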

The nodes must trust one another so that SSH login works without a password.

[[email protected] ~]# ssh-keygen -t rsa

Generating public/private rsa key pair.

Enter file in which to save the key (/root/.ssh/id_rsa):

Enter passphrase (empty for no passphrase):

Enter same passphrase again:

Your identification has been saved in /root/.ssh/id_rsa.

Your public key has been saved in /root/.ssh/id_rsa.pub.

The key fingerprint is:

20:79:d4:a4:9f:8b:75:cf:12:58:f4:47:a4:c1:29:f3 [email protected]

The key's randomart image is:

+--[ RSA 2048]----+

|      .o. ...oo  |

|     o ...o.o+   |

|    o +   .+o .  |

|     o o +  E.   |

|        S o      |

|       o o +     |

|      . . . o    |

|           .     |

|                 |

+-----------------+

[[email protected] ~]# ssh-copy-id controller2

[[email protected] ~]# ssh-copy-id controller3
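
To confirm that passwordless login works from this node to every peer, a quick loop is enough (a sanity check, not part of the original steps):

# for h in controller1 controller2 controller3; do ssh $h hostname; done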

Configure the YUM repository

# vim /etc/yum.repos.d/ha-clustering.repo

[network_ha-clustering_Stable]

name=Stable High Availability/Clustering packages (CentOS-7)

type=rpm-md

baseurl=http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-7/

gpgcheck=0

gpgkey=http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-7/repodata/repomd.xml.key

enabled=1

This repository may conflict with other repositories, so set enabled=0 first; if crmsh is the only package left to install, set enabled=1 and then install it.
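
If you keep the repository disabled by default, crmsh can still be installed by enabling it for a single transaction only; this is an optional alternative to toggling enabled= in the file (the repo id matches the one defined above):

# yum --enablerepo=network_ha-clustering_Stable install -y crmsh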

Corosync download location; the latest version at the time of writing is 2.4.2

http://build.clusterlabs.org/corosync/releases/

http://build.clusterlabs.org/corosync/releases/corosync-2.4.2.tar.gz

[[email protected] ~]# ansible controller -m copy -a "src=/etc/yum.repos.d/ha-clustering.repo dest=/etc/yum.repos.d/"

Install the packages

# yum install -y pacemaker pcs corosync resource-agents fence-agents-all cifs-utils quota psmisc lvm2

# yum install -y crmsh

Enable and start pcsd, then confirm it is running correctly

# systemctl enable pcsd

# systemctl enable corosync

# systemctl start pcsd

# systemctl status pcsd
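
The same steps can be applied to all three controllers at once with Ansible, matching the style of the other ansible invocations in this guide (a sketch, assuming the controller host group):

# ansible controller -m command -a "systemctl enable pcsd"

# ansible controller -m command -a "systemctl enable corosync"

# ansible controller -m command -a "systemctl start pcsd"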

[[email protected] ~]# pacemakerd -$

Pacemaker 1.1.15-11.el7_3.2

Written by Andrew Beekhof

[[email protected] ~]# ansible controller -m command -a "pacemakerd -$"

Set the hacluster password

[all] # echo zoomtech | passwd --stdin hacluster

[[email protected] ~]# ansible controller -m shell -a "echo zoomtech | passwd --stdin hacluster"

(The shell module is used here because Ansible's command module does not interpret the pipe.)

# passwd hacluster

Edit corosync.conf

[[email protected] ~]# vim /etc/corosync/corosync.conf

totem {

version: 2

secauth: off

cluster_name: openstack-cluster

transport: udpu

}

nodelist {

node {

ring0_addr: controller1

nodeid: 1

}

node {

ring0_addr: controller2

nodeid: 2

}

node {

ring0_addr: controller3

nodeid: 3

}

}

logging {

to_logfile: yes

logfile: /var/log/cluster/corosync.log

to_syslog: yes

}

quorum {

provider: corosync_votequorum

}

[[email protected] ~]# scp /etc/corosync/corosync.conf controller2:/etc/corosync/

[[email protected] ~]# scp /etc/corosync/corosync.conf controller3:/etc/corosync/

[[email protected] corosync]# ansible controller -m copy -a "src=corosync.conf dest=/etc/corosync"
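
A quick way to confirm that the file landed identically on all three nodes is to compare checksums (a verification sketch, same host group as above):

# ansible controller -m command -a "md5sum /etc/corosync/corosync.conf"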

Create the cluster

Use pcs to authenticate the cluster nodes

[[email protected] ~]# pcs cluster auth controller1 controller2 controller3 -u hacluster -p zoomtech --force

controller3: Authorized

controller2: Authorized

controller1: Authorized

Now create the cluster and add the nodes. Note that the cluster name cannot be longer than 15 characters.

[[email protected] ~]# pcs cluster setup --force --name openstack-cluster controller1 controller2 controller3

Destroying cluster on nodes: controller1, controller2, controller3...

controller3: Stopping Cluster (pacemaker)...

controller2: Stopping Cluster (pacemaker)...

controller1: Stopping Cluster (pacemaker)...

controller2: Successfully destroyed cluster

controller1: Successfully destroyed cluster

controller3: Successfully destroyed cluster

Sending cluster config files to the nodes...

controller1: Succeeded

controller2: Succeeded

controller3: Succeeded

Synchronizing pcsd certificates on nodes controller1, controller2, controller3...

controller3: Success

controller2: Success

controller1: Success

Restarting pcsd on the nodes in order to reload the certificates...

controller3: Success

controller2: Success

controller1: Success

Enable and start the cluster

[[email protected] ~]# pcs cluster enable --all

controller1: Cluster Enabled

controller2: Cluster Enabled

controller3: Cluster Enabled

[[email protected] ~]# pcs cluster start --all

controller2: Starting Cluster...

controller1: Starting Cluster...

controller3: Starting Cluster...

Check the cluster status

[[email protected] corosync]# ansible controller -m command -a "pcs cluster status"

[[email protected] ~]# pcs cluster status

Cluster Status:

Stack: corosync

Current DC: controller3 (version 1.1.15-11.el7_3.2-e174ec8) - partition with quorum

Last updated: Fri Feb 17 10:39:38 2017        Last change: Fri Feb 17 10:39:29 2017 by hacluster via crmd on controller3

3 nodes and 0 resources configured

PCSD Status:

controller2: Online

controller3: Online

controller1: Online

[[email protected] corosync]# ansible controller -m command -a "pcs status"

[[email protected] ~]# pcs status

Cluster name: openstack-cluster

Stack: corosync

Current DC: controller2 (version 1.1.15-11.el7_3.2-e174ec8) - partition with quorum

Last updated: Thu Mar  2 17:07:34 2017        Last change: Thu Mar  2 01:44:44 2017 by root via cibadmin on controller1

3 nodes and 1 resource configured

Online: [ controller1 controller2 controller3 ]

Full list of resources:

vip    (ocf::heartbeat:IPaddr2):    Started controller2

Daemon Status:

corosync: active/enabled

pacemaker: active/enabled

pcsd: active/enabled

Check the cluster status with crm_mon

[[email protected] corosync]# ansible controller -m command -a "crm_mon -1"

[[email protected] ~]# crm_mon -1

Stack: corosync

Current DC: controller2 (version 1.1.15-11.el7_3.2-e174ec8) - partition with quorum

Last updated: Wed Mar  1 17:54:04 2017          Last change: Wed Mar  1 17:44:38 2017 by root via cibadmin on controller1

3 nodes and 1 resource configured

Online: [ controller1 controller2 controller3 ]

Active resources:

vip     (ocf::heartbeat:IPaddr2):    Started controller1

Check the Pacemaker process status

[[email protected] ~]# ps aux | grep pacemaker

root      75900  0.2  0.5 132632  9216 ?        Ss   10:39   0:00 /usr/sbin/pacemakerd -f

haclust+  75901  0.3  0.8 135268 15376 ?        Ss   10:39   0:00 /usr/libexec/pacemaker/cib

root      75902  0.1  0.4 135608  7920 ?        Ss   10:39   0:00 /usr/libexec/pacemaker/stonithd

root      75903  0.0  0.2 105092  5020 ?        Ss   10:39   0:00 /usr/libexec/pacemaker/lrmd

haclust+  75904  0.0  0.4 126924  7636 ?        Ss   10:39   0:00 /usr/libexec/pacemaker/attrd

haclust+  75905  0.0  0.2 117040  4560 ?        Ss   10:39   0:00 /usr/libexec/pacemaker/pengine

haclust+  75906  0.1  0.5 145328  8988 ?        Ss   10:39   0:00 /usr/libexec/pacemaker/crmd

root      75997  0.0  0.0 112648   948 pts/0    R+   10:40   0:00 grep --color=auto pacemaker

Check the Corosync ring status

[root@controller1 ~]# corosync-cfgtool -s

Printing ring status.

Local node ID 1

RING ID 0

id    = 192.168.17.132

status    = ring 0 active with no faults

[root@controller2 corosync]# corosync-cfgtool -s

Printing ring status.

Local node ID 2

RING ID 0

id    = 192.168.17.146

status    = ring 0 active with no faults

[root@controller3 ~]# corosync-cfgtool -s

Printing ring status.

Local node ID 3

RING ID 0

id    = 192.168.17.138

status    = ring 0 active with no faults

[[email protected] ~]# corosync-cmapctl | grep members

runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0

runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(192.168.17.132)

runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1

runtime.totem.pg.mrp.srp.members.1.status (str) = joined

runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0

runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(192.168.17.146)

runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1

runtime.totem.pg.mrp.srp.members.2.status (str) = joined

runtime.totem.pg.mrp.srp.members.3.config_version (u64) = 0

runtime.totem.pg.mrp.srp.members.3.ip (str) = r(0) ip(192.168.17.138)

runtime.totem.pg.mrp.srp.members.3.join_count (u32) = 1

runtime.totem.pg.mrp.srp.members.3.status (str) = joined

Check the Corosync membership

[root@controller1 ~]# pcs status corosync

Membership information

----------------------

Nodeid      Votes Name

1          1 controller1 (local)

3          1 controller3

2          1 controller2

[root@controller2 corosync]# pcs status corosync

Membership information

----------------------

Nodeid      Votes Name

1          1 controller1

3          1 controller3

2          1 controller2 (local)

[root@controller3 ~]# pcs status corosync

Membership information

----------------------

Nodeid      Votes Name

1          1 controller1

3          1 controller3 (local)

2          1 controller2
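
Quorum and vote information can also be read directly from Corosync with corosync-quorumtool, which combines membership and quorum state in one view:

# corosync-quorumtool -s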

[[email protected] ~]# crm_verify -L -V

error: unpack_resources:    Resource start-up disabled since no STONITH resources have been defined

error: unpack_resources:    Either configure some or disable STONITH with the stonith-enabled option

error: unpack_resources:    NOTE: Clusters with shared data need STONITH to ensure data integrity

Errors found during check: config not valid

[[email protected] ~]#

[[email protected] ~]# pcs property set stonith-enabled=false

[[email protected] ~]# pcs property set no-quorum-policy=ignore

[[email protected] ~]# crm_verify -L -V

[[email protected] corosync]# ansible controller -m command -a "pcs property set stonith-enabled=false

[[email protected] corosync]# ansible controller -m command -a "pcs property set no-quorum-policy=ignore"

[[email protected] corosync]# ansible controller -m command -a "crm_verify -L -V"

Configure the VIP

[[email protected] ~]# crm

crm(live)# configure

crm(live)configure# show

node 1: controller1

node 2: controller2

node 3: controller3

property cib-bootstrap-options: \

have-watchdog=false \

dc-version=1.1.15-11.el7_3.2-e174ec8 \

cluster-infrastructure=corosync \

cluster-name=openstack-cluster \

stonith-enabled=false \

no-quorum-policy=ignore

crm(live)configure# primitive vip ocf:heartbeat:IPaddr2 params ip=192.168.17.111 cidr_netmask=24 nic=ens37 op start interval=0s timeout=20s op stop interval=0s timeout=20s op monitor interval=30s meta priority=100

crm(live)configure# show

node 1: controller1

node 2: controller2

node 3: controller3

primitive vip IPaddr2 \

params ip=192.168.17.111 cidr_netmask=24 nic=ens37 \

op start interval=0s timeout=20s \

op stop interval=0s timeout=20s \

op monitor interval=30s \

meta priority=100

property cib-bootstrap-options: \

have-watchdog=false \

dc-version=1.1.15-11.el7_3.2-e174ec8 \

cluster-infrastructure=corosync \

cluster-name=openstack-cluster \

stonith-enabled=false \

no-quorum-policy=ignore

crm(live)configure# commit

crm(live)configure# exit
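
For reference, the same VIP resource could also be created with pcs instead of the crm shell; a roughly equivalent sketch with the same parameters would be:

# pcs resource create vip ocf:heartbeat:IPaddr2 ip=192.168.17.111 cidr_netmask=24 nic=ens37 op monitor interval=30s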

Verify that the VIP is now bound to the ens37 interface

[root@controller2 ~]# ip a

4: ens37: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000

link/ether 00:0c:29:ff:8b:4b brd ff:ff:ff:ff:ff:ff

inet 192.168.17.141/24 brd 192.168.17.255 scope global dynamic ens37

valid_lft 2388741sec preferred_lft 2388741sec

  inet 192.168.17.111/24 brd 192.168.17.255 scope global secondary ens37

valid_lft forever preferred_lft forever

The NIC name specified above must be the same on all three nodes; otherwise the VIP cannot fail over and the switch will not complete.
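
Before relying on failover, it is worth confirming that the interface name really is identical everywhere (a sanity-check sketch using the same host group):

# ansible controller -m command -a "ip -o link show ens37"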

[[email protected] ~]# crm status

Stack: corosync

Current DC: controller1 (version 1.1.15-11.el7_3.2-e174ec8) - partition with quorum

Last updated: Wed Feb 22 11:42:07 2017        Last change: Wed Feb 22 11:22:56 2017 by root via cibadmin on controller1

3 nodes and 1 resource configured

Online: [ controller1 controller2 controller3 ]

Full list of resources:

vip    (ocf::heartbeat:IPaddr2):    Started controller1
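
A simple failover test is to put the node currently holding the VIP into standby, confirm the resource moves, and then bring the node back (a sketch; substitute whichever node crm status reports as Started):

# pcs cluster standby controller1

# crm_mon -1 | grep vip

# pcs cluster unstandby controller1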

Check whether the Corosync cluster engine started correctly

[root@controller1 ~]# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log

[51405] controller1 corosync notice  [MAIN  ] Corosync Cluster Engine ('2.4.0'): started and ready to provide service.

Mar 01 17:35:20 [51425] controller1        cib:     info: retrieveCib:    Reading cluster configuration file /var/lib/pacemaker/cib/cib.xml (digest: /var/lib/pacemaker/cib/cib.xml.sig)

Mar 01 17:35:20 [51425] controller1        cib:  warning: cib_file_read_and_verify:    Could not verify cluster configuration file /var/lib/pacemaker/cib/cib.xml: No such file or directory (2)

Mar 01 17:35:20 [51425] controller1        cib:  warning: cib_file_read_and_verify:    Could not verify cluster configuration file /var/lib/pacemaker/cib/cib.xml: No such file or directory (2)

Mar 01 17:35:20 [51425] controller1        cib:     info: cib_file_write_with_digest:    Reading cluster configuration file /var/lib/pacemaker/cib/cib.Apziws (digest: /var/lib/pacemaker/cib/cib.0ZxsVW)

Mar 01 17:35:21 [51425] controller1        cib:     info: cib_file_write_with_digest:    Reading cluster configuration file /var/lib/pacemaker/cib/cib.ObYehI (digest: /var/lib/pacemaker/cib/cib.O8Rntg)

Mar 01 17:35:42 [51425] controller1        cib:     info: cib_file_write_with_digest:    Reading cluster configuration file /var/lib/pacemaker/cib/cib.eqrhsF (digest: /var/lib/pacemaker/cib/cib.6BCfNj)

Mar 01 17:35:42 [51425] controller1        cib:     info: cib_file_write_with_digest:    Reading cluster configuration file /var/lib/pacemaker/cib/cib.riot2E (digest: /var/lib/pacemaker/cib/cib.SAqtzj)

Mar 01 17:35:42 [51425] controller1        cib:     info: cib_file_write_with_digest:    Reading cluster configuration file /var/lib/pacemaker/cib/cib.Q8H9BL (digest: /var/lib/pacemaker/cib/cib.MBljlq)

Mar 01 17:38:29 [51425] controller1        cib:     info: cib_file_write_with_digest:    Reading cluster configuration file /var/lib/pacemaker/cib/cib.OTIiU4 (digest: /var/lib/pacemaker/cib/cib.JnHr1v)

Mar 01 17:38:36 [51425] controller1        cib:     info: cib_file_write_with_digest:    Reading cluster configuration file /var/lib/pacemaker/cib/cib.2cK9Yk (digest: /var/lib/pacemaker/cib/cib.JSqEH8)

Mar 01 17:44:38 [51425] controller1        cib:     info: cib_file_write_with_digest:    Reading cluster configuration file /var/lib/pacemaker/cib/cib.aPFtr3 (digest: /var/lib/pacemaker/cib/cib.E3Ve7X)

[root@controller1 ~]#

Check that the initial membership notifications were sent correctly

[root@controller1 ~]# grep TOTEM /var/log/cluster/corosync.log

[51405] controller1 corosync notice  [TOTEM ] Initializing transport (UDP/IP Unicast).

[51405] controller1 corosync notice  [TOTEM ] Initializing transmit/receive security (NSS) crypto: none hash: none

[51405] controller1 corosync notice  [TOTEM ] The network interface [192.168.17.149] is now up.

[51405] controller1 corosync notice  [TOTEM ] adding new UDPU member {192.168.17.149}

[51405] controller1 corosync notice  [TOTEM ] adding new UDPU member {192.168.17.141}

[51405] controller1 corosync notice  [TOTEM ] adding new UDPU member {192.168.17.166}

[51405] controller1 corosync notice  [TOTEM ] A new membership (192.168.17.149:4) was formed. Members joined: 1

[51405] controller1 corosync notice  [TOTEM ] A new membership (192.168.17.141:12) was formed. Members joined: 2 3

Check whether any errors occurred during startup

[[email protected] ~]# grep ERROR: /var/log/cluster/corosync.log
