Fixing the Greenplum Startup Failure "Error occurred: non-zero rc: 1"

One day a developer reported that the cluster in the test environment failed to start.

The error output was as follows:

[gpadmin@hadoop-test2:/root]$ gpstart
20181205:16:42:23:005451 gpstart:hadoop-test2:gpadmin-[INFO]:-Starting gpstart with args:
20181205:16:42:23:005451 gpstart:hadoop-test2:gpadmin-[INFO]:-Gathering information and validating the environment...
20181205:16:42:23:005451 gpstart:hadoop-test2:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 5.0.0 build dev'
20181205:16:42:23:005451 gpstart:hadoop-test2:gpadmin-[INFO]:-Greenplum Catalog Version: '301705051'
20181205:16:42:24:005451 gpstart:hadoop-test2:gpadmin-[INFO]:-Starting Master instance in admin mode
20181205:16:52:24:005451 gpstart:hadoop-test2:gpadmin-[CRITICAL]:-Failed to start Master instance in admin mode
20181205:16:52:24:005451 gpstart:hadoop-test2:gpadmin-[CRITICAL]:-Error occurred: non-zero rc: 1
 Command was: 'env GPSESSID=0000000000 GPERA=None $GPHOME/bin/pg_ctl -D /home/gpadmin/gpdata/gpmaster/gpseg-1 -l /home/gpadmin/gpdata/gpmaster/gpseg-1/pg_log/startup.log -w -t 600 -o " -p 2346 --gp_dbid=1 --gp_num_contents_in_cluster=0 --silent-mode=true -i -M master --gp_contentid=-1 -x 0 -c gp_role=utility " start'
rc=1, stdout='waiting for server to start........ [dots continue until the -t 600 timeout expires] ........ stopped waiting
', stderr='could not change directory to "/root"
pg_ctl: could not start server
Examine the log output.

The startup log showed the following:

vim /home/gpadmin/gpdata/gpmaster/gpseg-1/pg_log/startup.log
2018-12-05 08:42:24.067241 GMT,,,p5464,th-829482944,,,,0,,,seg-1,,,,,"WARNING","01000","""work_mem"": setting is deprecated, and may be removed in a future release.",,,,,,,,"set_config_option","guc.c",4666,
2018-12-05 08:42:24.067612 GMT,,,p5464,th-829482944,,,,0,,,seg-1,,,,,"WARNING","01000","""work_mem"": setting is deprecated, and may be removed in a future release.",,,,,,,,"set_config_option","guc.c",4666,
2018-12-05 08:42:24.083813 GMT,,,p5465,th-829482944,,,,0,,,seg-1,,,,,"LOG","00000","removing all temporary files",,,,,,,,"RemovePgTempFiles","fd.c",2046,
2018-12-05 08:42:24.098673 GMT,,,p5465,th-829482944,,,,0,,,seg-1,,,,,"FATAL","XX000","could not create shared memory segment: Invalid argument (pg_shmem.c:183)","Failed system call was shmget(key=2346001, size=177586016, 03600).","This error usually means that PostgreSQL's request for a shared memory segment exceeded your kernel's SHMMAX parameter.  You can either reduce the request size or reconfigure the kernel with larger SHMMAX.  To reduce the request size (currently 177586016 bytes), reduce PostgreSQL's shared_buffers parameter (currently 4000) and/or its max_connections parameter (currently 253).
If the request size is already small, it's possible that it is less than your kernel's SHMMIN parameter, in which case raising the request size or reconfiguring SHMMIN is called for.
The PostgreSQL documentation contains more information about shared memory configuration.",,,,,,"InternalIpcMemoryCreate","pg_shmem.c",183,1
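
The key numbers in that FATAL record can be pulled out mechanically rather than read by eye. The following is a minimal Python sketch, not part of the original troubleshooting session; the function name `parse_shmget_failure` is hypothetical, and the sample string is the detail field copied from the log above.

```python
import re

# Matches the detail field PostgreSQL writes when shmget() fails, e.g.
# "Failed system call was shmget(key=2346001, size=177586016, 03600)."
SHMGET_RE = re.compile(r"shmget\(key=(\d+), size=(\d+), (\d+)\)")


def parse_shmget_failure(text):
    """Return the key, size and flags of a failed shmget() call, or None.

    Hypothetical helper for illustration; it only understands the message
    format shown in the startup.log excerpt above.
    """
    match = SHMGET_RE.search(text)
    if match is None:
        return None
    key, size, flags = match.groups()
    return {"key": int(key), "size_bytes": int(size), "flags": flags}


# Detail field copied verbatim from the FATAL record above.
DETAIL = "Failed system call was shmget(key=2346001, size=177586016, 03600)."

info = parse_shmget_failure(DETAIL)
print("requested shared memory segment: %d bytes" % info["size_bytes"])
```

The extracted `size_bytes` is the figure to compare against the kernel's SHMMAX limit.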

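In short, the log says that the kernel parameter shmmax set in /etc/sysctl.conf is too small, so startup failed: the server requested a shared memory segment of 177586016 bytes and the kernel refused it.

Checking the configuration in /etc/sysctl.conf showed: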

kernel.shmmax = 20000000
kernel.shmmni = 4096
kernel.shmall = 40000000
kernel.sem = 250 512000 100 2048
kernel.sysrq = 1
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.msgmni = 2048
net.ipv4.tcp_syncookies = 1
net.ipv4.ip_forward = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.conf.all.arp_filter = 1
net.ipv4.ip_local_port_range = 1025 65535
net.core.netdev_max_backlog = 10000
net.core.rmem_max = 2097152
net.core.wmem_max = 2097152
vm.overcommit_memory = 2
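
To make the mismatch explicit, the configured limit can be compared with the size the server asked for. This is a small Python sketch under stated assumptions: `parse_sysctl` and `shmmax_shortfall` are hypothetical helper names, the excerpt holds only the three shared-memory lines from the listing above, and the requested size is the value reported in startup.log.

```python
def parse_sysctl(text):
    """Parse 'key = value' lines in sysctl.conf style into a dict of strings."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments ('#' or ';' in sysctl.conf).
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def shmmax_shortfall(settings, requested_bytes):
    """Bytes by which the request exceeds kernel.shmmax (0 if it fits)."""
    shmmax = int(settings["kernel.shmmax"])
    return max(0, requested_bytes - shmmax)


# Shared-memory lines copied from the /etc/sysctl.conf listing above.
SYSCTL_EXCERPT = """
kernel.shmmax = 20000000
kernel.shmmni = 4096
kernel.shmall = 40000000
"""

# Size of the failed shmget() request reported in startup.log.
REQUESTED_BYTES = 177586016

settings = parse_sysctl(SYSCTL_EXCERPT)
shortfall = shmmax_shortfall(settings, REQUESTED_BYTES)
if shortfall:
    print("kernel.shmmax is %d bytes too small for this request" % shortfall)
else:
    print("kernel.shmmax is large enough for this request")
```

With these values the request is several times larger than the configured kernel.shmmax, which matches the hint in the FATAL message.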

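Comparing these values with the settings recommended in the official documentation, the parameter definitions, and the amount of data already in the cluster confirmed that they were indeed too small. I changed them to the officially recommended settings and started the cluster again.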

20181205:17:54:28:009711 gpstart:hadoop-test2:gpadmin-[INFO]:-----------------------------------------------------
20181205:17:54:28:009711 gpstart:hadoop-test2:gpadmin-[INFO]:-   Successful segment starts                                            = 8
20181205:17:54:28:009711 gpstart:hadoop-test2:gpadmin-[INFO]:-   Failed segment starts                                                = 0
20181205:17:54:28:009711 gpstart:hadoop-test2:gpadmin-[INFO]:-   Skipped segment starts (segments are marked down in configuration)   = 0
20181205:17:54:28:009711 gpstart:hadoop-test2:gpadmin-[INFO]:-----------------------------------------------------
20181205:17:54:28:009711 gpstart:hadoop-test2:gpadmin-[INFO]:-Successfully started 8 of 8 segment instances
20181205:17:54:28:009711 gpstart:hadoop-test2:gpadmin-[INFO]:-----------------------------------------------------
20181205:17:54:28:009711 gpstart:hadoop-test2:gpadmin-[INFO]:-Starting Master instance hadoop-test2 directory /home/gpadmin/gpdata/gpmaster/gpseg-1
20181205:17:54:29:009711 gpstart:hadoop-test2:gpadmin-[INFO]:-Command pg_ctl reports Master hadoop-test2 instance active
20181205:17:54:29:009711 gpstart:hadoop-test2:gpadmin-[INFO]:-No standby master configured.  skipping...
20181205:17:54:29:009711 gpstart:hadoop-test2:gpadmin-[INFO]:-Database successfully started

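The cluster started successfully.

Summary: when the kernel parameters that PostgreSQL relies on at startup do not match what the instance actually needs, startup fails. Reading the detailed log output is the way to find the root cause and fix it.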
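
Edits to /etc/sysctl.conf only reach the running kernel after the file is reloaded (for example with `sysctl -p`) or the machine is rebooted, so it can be worth confirming the effective value before retrying gpstart. The sketch below is an assumption-laden illustration, not a step from the original session: the helper names are hypothetical, and it assumes a Linux host that exposes the value at /proc/sys/kernel/shmmax.

```python
from pathlib import Path

# Linux exposes the running kernel's SHMMAX here (assumed to be available).
PROC_SHMMAX = "/proc/sys/kernel/shmmax"


def read_shmmax(proc_file=PROC_SHMMAX):
    """Return the SHMMAX value currently in effect, in bytes."""
    return int(Path(proc_file).read_text().strip())


def request_fits(requested_bytes, shmmax_bytes):
    """True if a shared memory request of this size is within SHMMAX."""
    return requested_bytes <= shmmax_bytes


if __name__ == "__main__":
    requested = 177586016  # size of the failed shmget() call in startup.log
    if Path(PROC_SHMMAX).exists():
        current = read_shmmax()
        verdict = "fits" if request_fits(requested, current) else "does NOT fit"
        print("effective shmmax = %d; a %d-byte request %s"
              % (current, requested, verdict))
    else:
        print("%s not found; this check only applies on Linux" % PROC_SHMMAX)
```

If the reported value is still the old one, the new configuration has not been loaded yet.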

References:

1. Officially recommended settings: http://gpdb.docs.pivotal.io/4380/prep_os-system-params.html#topic3

2. Meaning of the kernel parameters: http://www.oicqzone.com/pc/2012091612901.html

Original article: https://www.cnblogs.com/chou1214/p/10072385.html

Time: 2024-11-06 11:49:44
