Problem
# cmcheckconf -v -C /etc/cmcluster/cmclconfig.ascii
Begin cluster verification...
Checking cluster file: /etc/cmcluster/cmclconfig.ascii
Defaulting MAX_CONFIGURED_PACKAGES to 300.
Checking nodes ... Done
Checking existing configuration ... Done
Defaulting MAX_CONFIGURED_PACKAGES to 300.
Gathering storage information
Unable to receive device query message from mucs3173: Software caused connection abort <--- problem
Found 148 devices on node mucs3088
Found 148 devices on node mucs3090
Found 150 devices on node mucs3091
Found 0 devices on node mucs3173 <---------- here
Found 148 devices on node mucs3179
Analysis of 594 devices should take approximately 16 seconds
0%----10%----20%----30%----40%----50%----60%----70%----80%----90%----100%
Found 59 volume groups on node mucs3088
Found 59 volume groups on node mucs3090
Found 59 volume groups on node mucs3091
Found 0 volume groups on node mucs3173
Found 59 volume groups on node mucs3179
Analysis of 236 volume groups should take approximately 1 seconds
0%----10%----20%----30%----40%----50%----60%----70%----80%----90%----100%
Gathering network information
Beginning network probing (this may take a while)
Completed network probing
Gathering polling target information
cmcheckconf: Unable to reconcile configuration file /etc/cmcluster/cmclconfig.ascii
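The result is the same no matter which node cmcheckconf is run on.
Configuration:
HP-UX 11.31.
Serviceguard A.11.19 with patch PHSS_40152 (PHSS_41162 was installed on the affected node, but it did not help).
Solution
The syslog.log on mucs3173 contains many messages like the following: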
cmclconfd[29685]: Could not get vg (/dev/vg3139_TAQ_A) info: 3
The cmscancl command was used to collect the contents of /etc/lvmtab from all nodes. It showed that only the affected node has vg3139_TAQ_A and vg3139_TAQ_old:
$ grep -e lvmtab -e vg3139_TAQ scancl.out
------ Output of strings /etc/lvmtab (mucs3090) ------
/dev/vg3139_TAQ
------ Output of strings /etc/lvmtab (mucs3088) ------
/dev/vg3139_TAQ
------ Output of strings /etc/lvmtab (mucs3173) ------
/dev/vg3139_TAQ_A <---??
/dev/vg3139_TAQ_old <---??
/dev/vg3139_TAQ
------ Output of strings /etc/lvmtab (mucs3179) ------
/dev/vg3139_TAQ
------ Output of strings /etc/lvmtab (mucs3091) ------
/dev/vg3139_TAQ
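With many volume groups per node, the comparison above can also be done mechanically. The following is a minimal sketch, not part of the original note: vg_outliers is a hypothetical helper name, and it assumes the cmscancl output uses the "Output of strings /etc/lvmtab (node)" section headers exactly as shown in the grep result. It lists every volume group that is present in /etc/lvmtab on some nodes but not on all of them.

```shell
# Hypothetical helper: report volume groups missing from at least one node's lvmtab.
# Assumes section headers of the form
#   ------ Output of strings /etc/lvmtab (nodename) ------
vg_outliers() {
    awk '
        /Output of strings \/etc\/lvmtab \(/ {
            node = $0
            sub(/^.*\(/, "", node)      # drop everything up to "("
            sub(/\).*$/, "", node)      # drop ")" and the rest
            nodes[node] = 1
            next
        }
        /^-+ Output of / { node = ""; next }    # a different section starts
        # Volume group entries look like /dev/<vgname>; physical volume
        # paths such as /dev/dsk/c1t0d0 contain a further "/" and are skipped.
        node != "" && $1 ~ /^\/dev\/[^\/]+$/ {
            vg = $1
            if (!((vg, node) in seen)) {
                seen[vg, node] = 1
                count[vg]++
                where[vg] = where[vg] " " node
            }
        }
        END {
            total = 0
            for (n in nodes) total++
            for (vg in count)
                if (count[vg] < total) print vg ":" where[vg]
        }
    ' "$1" | sort
}
```

Called as `vg_outliers scancl.out`, it prints one line per suspicious volume group followed by the nodes that know about it. A volume group that is legitimately local to a single node would be reported too, so the output is a list of candidates to review rather than a verdict.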
With the -k option, cmcheckconf completes successfully. This option skips the LVM disk checks, which confirms that the problem is LVM-related.
Action:
- Check whether /dev/vg3139_TAQ_A exists on the affected node.
- If volume group vg3139_TAQ_A is no longer needed, vgexport it.
Result:
vgexport solved the problem.
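The sequence below is a hedged sketch of how the check and the export might be carried out on the affected node. It is not taken from the original note. The names cleanup_stale_vg, run and DRY_RUN are hypothetical, as is the map file location under /tmp. The deactivation step is an added assumption (vgexport expects an inactive volume group). By default the script only prints the commands, so they can be reviewed before anything is changed.

```shell
# Sketch only: prints the commands by default.
# Set DRY_RUN=0 to actually execute them on the affected node.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

cleanup_stale_vg() {
    vg=$1                                   # e.g. /dev/vg3139_TAQ_A
    map="/tmp/$(basename "$vg").map"        # hypothetical map file location

    run vgdisplay "$vg"                     # does LVM still know this volume group?
    run vgchange -a n "$vg"                 # assumption: deactivate before exporting
    run vgexport -p -v -m "$map" "$vg"      # preview: /etc/lvmtab is left untouched
    run vgexport -v -m "$map" "$vg"         # export: removes the VG from /etc/lvmtab
    # Re-run the cluster check that failed originally.
    run cmcheckconf -v -C /etc/cmcluster/cmclconfig.ascii
}
```

For example, `cleanup_stale_vg /dev/vg3139_TAQ_A` prints the five commands prefixed with "+". Keeping the map file makes it possible to vgimport the volume group again if it turns out to be needed after all. The same steps would apply to vg3139_TAQ_old if it is also obsolete.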
HP cluster software - Unable to receive device query message from node: Software caused connection abort