GreenPlum数据库故障恢复测试

本文介绍gpdb的master故障及恢复测试以及segment故障恢复测试。

环境介绍:
Gpdb版本:5.5.0 二进制版本
操作系统版本: centos linux 7.0
Master segment: 192.168.1.225/24 hostname: mfsmaster
Stadnby segemnt: 192.168.1.227/24 hostname: server227
Segment 节点1: 192.168.1.227/24 hostname: server227
Segment 节点2: 192.168.1.17/24 hostname: server17
Segment 节点3: 192.168.1.11/24 hostname: server11
每个segment节点上分别运行一个primary segment和一个mirror segment

一、查看原始状态

select * from gp_segment_configuration;

$ gpstate -f
20180320:13:50:38:021814 gpstate:mfsmaster:gpadmin-[INFO]:-Starting gpstate with args: -f
20180320:13:50:38:021814 gpstate:mfsmaster:gpadmin-[INFO]:-local Greenplum Version: ‘postgres (Greenplum Database) 5.5.0 build commit:67afa18296aa238d53a2dfcc724da60ed2f944f0‘
20180320:13:50:38:021814 gpstate:mfsmaster:gpadmin-[INFO]:-master Greenplum Version: ‘PostgreSQL 8.3.23 (Greenplum Database 5.5.0 build commit:67afa18296aa238d53a2dfcc724da60ed2f944f0) on x86_64-pc-linux-gnu, compiled by GCC gcc (GCC) 6.2.0, 64-bit compiled on Feb 17 2018 15:23:55‘
20180320:13:50:38:021814 gpstate:mfsmaster:gpadmin-[INFO]:-Obtaining Segment details from master...
20180320:13:50:38:021814 gpstate:mfsmaster:gpadmin-[INFO]:-Standby master details
20180320:13:50:38:021814 gpstate:mfsmaster:gpadmin-[INFO]:-----------------------
20180320:13:50:38:021814 gpstate:mfsmaster:gpadmin-[INFO]:-   Standby address          = server227
20180320:13:50:38:021814 gpstate:mfsmaster:gpadmin-[INFO]:-   Standby data directory   = /home/gpadmin/master/gpseg-1
20180320:13:50:38:021814 gpstate:mfsmaster:gpadmin-[INFO]:-   Standby port             = 5432
20180320:13:50:38:021814 gpstate:mfsmaster:gpadmin-[INFO]:-   Standby PID              = 22279
20180320:13:50:38:021814 gpstate:mfsmaster:gpadmin-[INFO]:-   Standby status           = Standby host passive
20180320:13:50:38:021814 gpstate:mfsmaster:gpadmin-[INFO]:--------------------------------------------------------------
20180320:13:50:38:021814 gpstate:mfsmaster:gpadmin-[INFO]:--pg_stat_replication
20180320:13:50:38:021814 gpstate:mfsmaster:gpadmin-[INFO]:--------------------------------------------------------------
20180320:13:50:38:021814 gpstate:mfsmaster:gpadmin-[INFO]:--WAL Sender State: streaming
20180320:13:50:38:021814 gpstate:mfsmaster:gpadmin-[INFO]:--Sync state: sync
20180320:13:50:38:021814 gpstate:mfsmaster:gpadmin-[INFO]:--Sent Location: 0/CF2C470
20180320:13:50:38:021814 gpstate:mfsmaster:gpadmin-[INFO]:--Flush Location: 0/CF2C470
20180320:13:50:38:021814 gpstate:mfsmaster:gpadmin-[INFO]:--Replay Location: 0/CF2C470
20180320:13:50:38:021814 gpstate:mfsmaster:gpadmin-[INFO]:--------------------------------------------------------------

二、master主从切换
1、模拟当前主库宕机,这里直接采用killall gpadmin用户下的所有进程来模拟

2、在master standby节点(227服务器上)进行执行切换命令,提升227为master

$ gpactivatestandby -d master/gpseg-1/
20180320:13:53:20:030558 gpactivatestandby:server227:gpadmin-[INFO]:------------------------------------------------------
20180320:13:53:20:030558 gpactivatestandby:server227:gpadmin-[INFO]:-Standby data directory    = /home/gpadmin/master/gpseg-1
20180320:13:53:20:030558 gpactivatestandby:server227:gpadmin-[INFO]:-Standby port              = 5432
20180320:13:53:20:030558 gpactivatestandby:server227:gpadmin-[INFO]:-Standby running           = yes
20180320:13:53:20:030558 gpactivatestandby:server227:gpadmin-[INFO]:-Force standby activation  = no
20180320:13:53:20:030558 gpactivatestandby:server227:gpadmin-[INFO]:------------------------------------------------------
Do you want to continue with standby master activation? Yy|Nn (default=N):
> y
20180320:13:53:26:030558 gpactivatestandby:server227:gpadmin-[INFO]:-found standby postmaster process
20180320:13:53:26:030558 gpactivatestandby:server227:gpadmin-[INFO]:-Updating transaction files filespace flat files...
20180320:13:53:26:030558 gpactivatestandby:server227:gpadmin-[INFO]:-Updating temporary files filespace flat files...
20180320:13:53:26:030558 gpactivatestandby:server227:gpadmin-[INFO]:-Promoting standby...
20180320:13:53:26:030558 gpactivatestandby:server227:gpadmin-[DEBUG]:-Waiting for connection...
20180320:13:53:27:030558 gpactivatestandby:server227:gpadmin-[INFO]:-Standby master is promoted
20180320:13:53:27:030558 gpactivatestandby:server227:gpadmin-[INFO]:-Reading current configuration...
20180320:13:53:27:030558 gpactivatestandby:server227:gpadmin-[DEBUG]:-Connecting to dbname=‘postgres‘
20180320:13:53:27:030558 gpactivatestandby:server227:gpadmin-[INFO]:-Writing the gp_dbid file - /home/gpadmin/master/gpseg-1/gp_dbid...
20180320:13:53:27:030558 gpactivatestandby:server227:gpadmin-[INFO]:-But found an already existing file.
20180320:13:53:27:030558 gpactivatestandby:server227:gpadmin-[INFO]:-Hence removed that existing file.
20180320:13:53:27:030558 gpactivatestandby:server227:gpadmin-[INFO]:-Creating a new file...
20180320:13:53:27:030558 gpactivatestandby:server227:gpadmin-[INFO]:-Wrote dbid: 1 to the file.
20180320:13:53:27:030558 gpactivatestandby:server227:gpadmin-[INFO]:-Now marking it as read only...
20180320:13:53:27:030558 gpactivatestandby:server227:gpadmin-[INFO]:-Verifying the file...
20180320:13:53:27:030558 gpactivatestandby:server227:gpadmin-[INFO]:------------------------------------------------------
20180320:13:53:27:030558 gpactivatestandby:server227:gpadmin-[INFO]:-The activation of the standby master has completed successfully.
20180320:13:53:27:030558 gpactivatestandby:server227:gpadmin-[INFO]:-server227 is now the new primary master.
20180320:13:53:27:030558 gpactivatestandby:server227:gpadmin-[INFO]:-You will need to update your user access mechanism to reflect
20180320:13:53:27:030558 gpactivatestandby:server227:gpadmin-[INFO]:-the change of master hostname.
20180320:13:53:27:030558 gpactivatestandby:server227:gpadmin-[INFO]:-Do not re-start the failed master while the fail-over master is
20180320:13:53:27:030558 gpactivatestandby:server227:gpadmin-[INFO]:-operational, this could result in database corruption!
20180320:13:53:27:030558 gpactivatestandby:server227:gpadmin-[INFO]:-MASTER_DATA_DIRECTORY is now /home/gpadmin/master/gpseg-1 if
20180320:13:53:27:030558 gpactivatestandby:server227:gpadmin-[INFO]:-this has changed as a result of the standby master activation, remember
20180320:13:53:27:030558 gpactivatestandby:server227:gpadmin-[INFO]:-to change this in any startup scripts etc, that may be configured
20180320:13:53:27:030558 gpactivatestandby:server227:gpadmin-[INFO]:-to set this value.
20180320:13:53:27:030558 gpactivatestandby:server227:gpadmin-[INFO]:-MASTER_PORT is now 5432, if this has changed, you
20180320:13:53:27:030558 gpactivatestandby:server227:gpadmin-[INFO]:-may need to make additional configuration changes to allow access
20180320:13:53:27:030558 gpactivatestandby:server227:gpadmin-[INFO]:-to the Greenplum instance.
20180320:13:53:27:030558 gpactivatestandby:server227:gpadmin-[INFO]:-Refer to the Administrator Guide for instructions on how to re-activate
20180320:13:53:27:030558 gpactivatestandby:server227:gpadmin-[INFO]:-the master to its previous state once it becomes available.
20180320:13:53:27:030558 gpactivatestandby:server227:gpadmin-[INFO]:-Query planner statistics must be updated on all databases
20180320:13:53:27:030558 gpactivatestandby:server227:gpadmin-[INFO]:-following standby master activation.
20180320:13:53:27:030558 gpactivatestandby:server227:gpadmin-[INFO]:-When convenient, run ANALYZE against all user databases.
20180320:13:53:27:030558 gpactivatestandby:server227:gpadmin-[INFO]:------------------------------------------------------

3、测试提升后的主库是否正常

$ psql -d postgres -c ‘ANALYZE‘
postgres=# select * from gp_segment_configuration;


4、这里可能需要同步配置一下pg_hba.conf文件,才能通过客户端进行远程连接

到这里我们已经完成了master节点的故障切换工作。

三、添加新的master standby
1、 在225服务器上执行gpstart -a命令启动gpdb数据库的时候报错”error: Standby active, this node no more can act as master”。当standby 提升为master的时候,原master服务器从故障中恢复过来,需要以standby的角色加入

2、在原master服务器225上的数据进行备份

$ cd master/
$ ls
gpseg-1
$ mv gpseg-1/ backup-gpseg-1

3、在当前master服务器227上进行 gpinitstandby添加225为standby

$ gpinitstandby -s mfsmaster
$ gpstate -f


四、primary segment和mirror segment切换
1、首先我们来捋一下当前的数据库环境
Master segment: 192.168.1.227/24 hostname: server227
Stadnby segemnt: 192.168.1.225/24 hostname: mfsmaster
Segment 节点1: 192.168.1.227/24 hostname: server227
Segment 节点2: 192.168.1.17/24 hostname: server17
Segment 节点3: 192.168.1.11/24 hostname: server11
每个segment节点上分别运行一个primary segment和一个mirror segment

2、接着我们采用同样的方式把227服务器上gpadmin用户的所有进行杀掉

$ killall -u gpadmin

3、在225服务器上执行切换master命令

$ gpactivatestandby -d master/gpseg-1/

4、完成切换后使用客户端工具连接查看segment状态,可以看到227服务器上的server227
的primary和mirror节点都已经宕机了。

5、这里为了方面查看,我们使用greenplum-cc-web工具来查看集群状态

$ gpcmdr --start hbjy


需要将pg_hba.conf文件还原回去,因为227上所有的segment已经宕掉,执行gpstop -u命令会有报错

在segment status页面中可以看到当前segment的状态是异常的。server11上有两组的primary segment,这很危险,如果不幸server11也宕机了,整个集群的状态就变成不可用了。

6、将server227做为master standby重新加入集群

$ cd master/
$ mv gpseg-1/ backupgpseg-1
$ gpinitstandby -s server227


7、在master上重启集群

$ gpstop -M immediate
$ gpstart -a

8、在master上恢复集群

$ gprecoverseg 


虽然所有的segment均已启动,但server11上有还是有两组的primary segment

9、在master上恢复segment节点分布到原始状态

$ gprecoverseg -r


参考文档:
http://greenplum.org/docs/520/admin_guide/highavail/topics/g-restoring-master-mirroring-after-a-recovery.html

原文地址:http://blog.51cto.com/ylw6006/2088991

时间: 2024-08-30 12:05:01

GreenPlum数据库故障恢复测试的相关文章

Python脚本访问Greenplum数据库安装指导

安装前准备 (1)操作系统(系统上面要安装一些必备的开发工具(比如gcc等)) linux-82:/home/PyODBC # cat/etc/SuSE-release SUSE Linux EnterpriseServer 11 (x86_64) VERSION = 11 PATCHLEVEL = 1 (2)安装所需的软件包 greenplum-connectivity-4.3.0.0-build-2-SuSE10-x86_64.zip --GP官网下载,GP的JDBC和ODBC驱动 pyod

Perl脚本访问Greenplum数据库安装指导

安装前准备 (1)操作系统(系统上面要安装一些必备的开发工具(比如gcc等)) linux-82:/home/PlODBC # cat/etc/SuSE-release SUSE Linux EnterpriseServer 11 (x86_64) VERSION = 11 PATCHLEVEL = 1 (2)安装所需的软件包 greenplum-connectivity-4.3.0.0-build-2-SuSE10-x86_64.zip --GP官网下载,GP的JDBC和ODBC驱动 DBI-

Greenplum 数据库架构分析

Greenplum 数据库是最先进的分布式开源数据库技术,主要用来处理大规模的数据分析任务,包括数据仓库.商务智能(OLAP)和数据挖掘等.自2015年10月正式开源以来,受到国内外业内人士的广泛关注.本文就社区关心的Greenplum数据库技术架构进行介绍. 一. Greenplum数据库简介 大数据是个炙手可热的词,各行各业都在谈.一谈到大数据,好多人认为就是Hadoop.实际上Hadoop只是大数据若干处理方案中的一个.现在的SQL.NoSQL.NewSQL.Hadoop等等,都能在不同层

MPP - GreenPlum数据库安装以及简单使用

一.集群介绍 共3台主机,ip 为193.168.0.93   193.168.0.94  193.168.0.95 集群对应master和segment如下,193.168.0.93为master节点.193.168.0.94  193.168.0.95为segment节点,每个segment节点配置两个primary segment和两个mirror segment(也可以为master做一个备份,目前没有做) 架构图入下 二.服务器修改(all host) 2.1配置hosts vi /e

GreenPlum 数据库创建用户、文件空间、表空间、数据库

前几篇文章介绍了GreenPlum数据库的安装.启动.关闭.状态检查.登录等操作,数据库已经创建好了,接下来介绍如何使用数据库.按照习惯,需要先创建测试用户.表空间.数据库.先创建测试用户dbdream. view source 1 postgres=# create role dbdream password 'dbdream' createdb login; 2 NOTICE:  resource queue required -- using default resource queue

数据库代码覆盖率测试功能测试建模压测profiling;

数据库管理系统(简称 DBMS)无疑是任何数据密集型应用程序当中最为重要的组成部分,其肩负着处理大量数据以及高复杂性工作负载的重任.然而,数据库管理系统本身却往往难于管理,因为其中通常包含数百种配置"旋钮",用于控制诸如缓存内存分配量以及存储介质数据写入频率等要素.各类企业一般需要聘请专业人士以协助相关调配工作,但对于大多数企业而言,此类专业人才的开价亦相当高昂.而实际上,DBA所面临的挑战还远不止这些. 而今天一则名为"OtterTune"的机器学习DBMS系统刷

数据库的测试

如果想保证业务层测试的正确性,那么我们必须要对数据库进行测试.但是目前我还没想到在内存中去进行数据库的测试,只能进行集成测试.那么下面来讲一下关于数据层的测试. 因为在数据库的内部我们无法控制我们只能通过黑盒测试,给予值然后返回我们想要的预期效果来判断是否成功.但是在测试中我们必须要保证单一性,比喻在测试Add的时候当我们添加一条数据那么数据库可能就会产生一条脏数据,如果每天运行一次那么后果也是可怕的,但是有人说产区添加和删除一起测,我觉得也不合理,因为这样一来你一个测试既有添加又有删除不可取.

greenplum数据库python自定义函数

greenplum数据库(下面简称gp数据库)支持自定义函数,下面介绍的是python编写的自定义简单函数.聚类函数较复杂,自我感觉不适合在gp数据库中编写. python自定义函数说明了只要python能对行级数据做的处理,gp都能做. 样例:python对json做处理返回多行. create or replace function public.json_parse(data text) returns setof text AS $$ import json try: mydata=js

开源大数据引擎:Greenplum 数据库架构分析

Greenplum 数据库是最先进的分布式开源数据库技术,主要用来处理大规模的数据分析任务,包含数据仓库.商务智能(OLAP)和数据挖掘等.自2015年10月正式开源以来.受到国内外业内人士的广泛关注.本文就社区关心的Greenplum数据库技术架构进行介绍. 一. Greenplum数据库简单介绍 大数据是个炙手可热的词.各行各业都在谈.一谈到大数据,好多人觉得就是Hadoop.实际上Hadoop仅仅是大数据若干处理方案中的一个.如今的SQL.NoSQL.NewSQL.Hadoop等等.都能在