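A remote R&D center added several new workstations. As usual, the Cadence, Synopsys, and Mentor license files were generated and the license daemons started, after which the tools were ready to use. Someone then reported that Synopsys had a problem on one of the servers, so I connected remotely to look at the situation they described.
1. First, confirm that the environment variables are correct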
[[email protected] ~]$ which hspice
/app/synopsys/hspice/I-2013.12/hspice/bin/hspice
[[email protected] ~]$ which hspice64
/app/synopsys/hspice/I-2013.12/hspice/bin/hspice64
[[email protected] ~]$ hspice64
Using: /usr/bin/time -p /app/synopsys/hspice/I-2013.12/hspice/amd64/hspice
----------------------------------------------------------------
|                                                              |
|  HSPICE Version I-2013.12 64-BIT                             |
|  SN: P20131125-RHEL64                                        |
|  Machine Name: s09                                           |
|  Copyright (C) 2013 Synopsys, Inc. All Rights Reserved.      |
|                                                              |
----------------------------------------------------------------
HSPICE Usage:
simulation mode:
hspice [input_file] { -i input_file | -n #num | -d | -x
-o [output_file] | -html [html_file] | -mt #num | -mp [#num]
-dp [#num] | -dpconfig [dp_configuration_file] | -dplocation [NFS|TMP] | -merge
| -gz | -hdl filename | -hdlpath pathname | -vamodel name }
Judging from the output, the environment variables are fine.
2. Check the license processes
[[email protected] synopsys]# ps -ef |grep lmgrd
edadmin 17649 1 0 14:52 pts/2 00:00:00 /var/LIC/synopsys/lmgrd -c /var/LIC/synopsys/synopsys.dat -l /var/LIC/synopsys/log
carlos 23351 17526 0 20:20 pts/8 00:00:00 grep lmgrd
The snpslmd vendor daemon process is missing.
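The check in this step can be scripted. Below is a minimal sketch, assuming the two daemon names seen in this post (lmgrd and snpslmd); the function name missing_daemons is made up for illustration and is not part of any Synopsys or FlexNet tool. It reads `ps -ef` style output on stdin and prints every expected daemon that has no process.

```sh
# missing_daemons NAME...: hypothetical helper.
# Reads `ps -ef` style text on stdin and prints each NAME that does not
# appear as the command (column 8) of any listed process.
missing_daemons() {
    ps_text=$(cat)
    for name in "$@"; do
        # Compare only the basename of the command column, so the
        # "grep lmgrd" line from an interactive pipeline is never counted.
        if ! printf '%s\n' "$ps_text" |
            awk -v n="$name" '{ c = $8; sub(/.*\//, "", c); if (c == n) found = 1 }
                              END { exit !found }'; then
            printf '%s\n' "$name"
        fi
    done
}

# Usage: ps -ef | missing_daemons lmgrd snpslmd
```

Matching on the command column rather than grepping the whole line avoids the usual false positive from the grep process itself.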
3. Restart the license daemon and write a log
[[email protected] ]$ /var/LIC/synopsys/lmgrd -c /var/LIC/synopsys/synopsys.dat -l /var/LIC/synopsys/log
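The log contains an anomaly: TCP_NODELAY NOT enabled
From what I could find, this reportedly shows up when the host ID does not match the MAC address.
4. Check this machine's host ID and MAC information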
[[email protected] bin]# pwd
/app/synopsys/SCL/11.11.1/linux64/bin
[[email protected] bin]# ./lmhostid
lmhostid - Copyright (c) 1989-2015 Flexera Software LLC. All Rights Reserved.
The FlexNet host ID of this machine is "××604b7ec981 ××604b7ec982"
Only use ONE from the list of hostids.
[[email protected] ]# ifconfig -a
eth0 Link encap:Ethernet HWaddr ××:60:4B:7E:C9:82
inet addr:10.10.10.19 Bcast:10.10.10.255 Mask:255.255.255.0
inet6 addr: fe80::1260:4bff:fe7e:c982/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:13709993 errors:0 dropped:0 overruns:0 frame:0
TX packets:13582244 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:6085098527 (5.6 GiB) TX bytes:6456766880 (6.0 GiB)
Interrupt:20 Memory:eff00000-eff20000
eth1 Link encap:Ethernet HWaddr ××:60:4B:7E:C9:81
BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
Interrupt:17 Memory:efe00000-efe20000
[[email protected] ~]# vi /etc/udev/rules.d/70-persistent-net.rules
# This file was automatically generated by the /lib/udev/write_net_rules
# program, run by the persistent-net-generator.rules rules file.
#
# You can modify it, as long as you keep each rule on a single
# line, and change only the value of the NAME= key.
# PCI device 0x8086:0x1502 (e1000e)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="××:60:4b:7e:c9:82", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"
# PCI device 0x8086:0x10d3 (e1000e)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="××:60:4b:7e:c9:81", ATTR{type}=="1", KERNEL=="eth*", NAME="eth1"
The license file contents:
[[email protected] synopsys]# more synopsys.dat
SERVER s09 ××604B7EC982 27000
VENDOR snpslmd /var/LIC/synopsys/snpslmd
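Synopsys licensing usually looks at eth0 and by default uses the first host ID in the list, so my guess was that the mismatch between the MAC addresses caused the problem.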
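Comparing the SERVER line against the interface MACs by eye is error-prone, so here is a small sketch of the same comparison. It assumes the license file layout shown above (host ID in the third column of the SERVER line); the function names server_hostid and mac_to_hostid are hypothetical.

```sh
# server_hostid: print the host ID (3rd field) of the first SERVER line
# of a license file read from stdin, lowercased.
server_hostid() {
    awk '$1 == "SERVER" { print tolower($3); exit }'
}

# mac_to_hostid MAC: turn aa:bb:cc:dd:ee:ff into aabbccddeeff (lowercase),
# the form in which lmhostid prints Ethernet host IDs.
mac_to_hostid() {
    printf '%s\n' "$1" | tr -d ':' | tr '[:upper:]' '[:lower:]'
}

# Usage (MAC is a placeholder; substitute the HWaddr reported for eth0):
#   [ "$(server_hostid < /var/LIC/synopsys/synopsys.dat)" = "$(mac_to_hostid "$MAC")" ] \
#       && echo "SERVER host ID matches" || echo "SERVER host ID differs"
```

Lowercasing both sides matters because the license file in this post uses uppercase hex while lmhostid prints lowercase.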
5. So I regenerated the license file against the other MAC and started the license daemon
[[email protected] log]$ /var/LIC/synopsys/lmgrd -c /var/LIC/synopsys/synopsys.dat -l /var/LIC/synopsys/log
[[email protected] ~]# ps -ef |grep lmgrd
edadmin 25209 1 0 16:35 pts/8 00:00:00 /var/LIC/synopsys/lmgrd -c /var/LIC/synopsys/synopsys.dat -l /var/LIC/synopsys/log
edadmin 30058 25209 2 22:07 ? 00:00:13 snpslmd -T s09 11.12 3 -c /var/LIC/synopsys/synopsys.dat -srv LbMe?3s???dE?cspk)/T?]2E2V]?p?yz{T.%mon?#[sGlOE^?C?{9S/?r?W?r#s --lmgrd_start 58f3210 -vdrestart 16
carlos 30153 25560 0 22:15 pts/4 00:00:00 grep lmgrd
The startup log no longer shows the earlier ERROR.
6. After rerunning hspice there were no errors at first, but after it had run for a while a new problem appeared.
7. Check the HSPICE log
****** HSPICE -- I-2013.12 64-BIT (Nov 25 2013) RHEL64 ******
Copyright (C) 2013 Synopsys, Inc. All Rights Reserved.
Unpublished-rights reserved under US copyright laws.
This program is protected by law and is subject to the
terms and conditions of the license agreement from Synopsys.
Use of this program is your acceptance to be bound by the
license agreement. HSPICE is the trademark of Synopsys, Inc.
Input File: test.sp
Command line options: test.sp -o test.lis -mt 4
lic:
lic: FLEXlm: v10.9.8
lic: USER: jcarloshen HOSTNAME: s09
lic: HOSTID: XX604b7ec982 PID: 30169
lic: Cannot read data from license server system. The license server system appears to be running,
**error** invalid memory reference
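In other words, HSPICE cannot read data from the license server even though the server appears to be running, and it then fails with an invalid memory reference.
8. Check whether the license daemons are still running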
[[email protected] ]# ps -ef |grep lmgrd
edadmin 30378 1 0 22:31 pts/8 00:00:00 /var/LIC/synopsys/lmgrd -c /var/LIC/synopsys/synopsys.dat -l /var/LIC/synopsys/log
One process has stopped abnormally: only lmgrd is left, and snpslmd is gone again.
9. Check the situation on another server.
[[email protected] ~]# lsb_release -a
LSB Version: :core-4.0-amd64:core-4.0-ia32:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-ia32:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-ia32:printing-4.0-noarch
Distributor ID: CentOS
Description: CentOS release 5.9 (Final)
Release: 5.9
Codename: Final
This server runs CentOS 5.9.
[[email protected] ~]# cd /app/synopsys/SCL/11.11.1/linux64/bin/
[[email protected] bin]# ./lmhostid
lmhostid - Copyright (c) 1989-2015 Flexera Software LLC. All Rights Reserved.
The FlexNet host ID of this machine is "××604b7e105b ××604b7e105c"
Only use ONE from the list of hostids.
[[email protected] bin]# ifconfig -a
eth0 Link encap:Ethernet HWaddr ××:60:4B:7E:10:5C
inet addr:10.10.10.16 Bcast:10.10.10.255 Mask:255.255.255.0
inet6 addr: fe80::1260:4bff:fe7e:105c/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:66614905 errors:0 dropped:0 overruns:0 frame:0
TX packets:76415073 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:34874497627 (32.4 GiB) TX bytes:48450863867 (45.1 GiB)
Interrupt:138 Memory:eff00000-eff20000
eth1 Link encap:Ethernet HWaddr ××:60:4B:7E:10:5B
BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
Interrupt:201 Memory:efe00000-efe20000
[[email protected] bin]# more /var/LIC/synopsys/synopsys.dat
SERVER s06 ××604B7E105C 27000
VENDOR snpslmd /var/LIC/synopsys/snpslmd
USE_SERVER
[[email protected] ~]# lsb_release -a
LSB Version: :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: CentOS
Description: CentOS release 6.8 (Final)
Release: 6.8
Codename: Final
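The problem server (s09), by comparison, runs CentOS 6.8.
The two workstations have identical hardware, and both license files were generated against the second host ID in the list; only the OS versions differ. I therefore concluded that the problem has nothing to do with which host ID the license uses.
10. Check the system logs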
[[email protected] ~]# dmesg
......
ata4: EH complete
snpslmd[31655]: segfault at 6c153990 ip 0000003982134cfc sp 00007f1f71ae7d28 error 4 in libc-2.12.so[3982000000+18a000]
ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata4.00: failed command: SMART
...
snpslmd' doesn't belong to any package and ProcessUnpackaged is set to 'no'
...
ata4.01: SATA link down (SStatus 0 SControl 0)
ata4.00: configured for UDMA/100
ata4: EH complete
Saved core dump of pid 3498 (/var/LIC/synopsys/snpslmd) to /var/spool/abrt/ccpp-2017-04-15-00:00:38-3498 (262381568 bytes)
abrtd: Directory 'ccpp-2017-04-15-00:00:38-3498' creation detected
......
Apr 17 00:11:02 s09 kernel: snpslmd[31655]: segfault at 6c153990 ip 0000003982134cfc sp 00007f1f71ae7d28 error 4 in libc-2.12.so[3982000000+18a000]
Apr 17 00:30:19 s09 kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
......
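The relevant lines are easy to miss among the ata4 noise, so a filter helps. The sketch below assumes the kernel segfault line format shown in the log above; the function name segfaults_for is hypothetical. It reads kernel log text on stdin and prints the PID and the object the process crashed in.

```sh
# segfaults_for PROCESS: hypothetical helper.
# Reads kernel log text on stdin (dmesg output or /var/log/messages) and,
# for each segfault line of PROCESS, prints "<pid> <object>".
segfaults_for() {
    awk -v p="$1" '
        {
            for (i = 1; i < NF; i++) {
                # Match a token such as "snpslmd[31655]:" followed by "segfault".
                if (index($i, p "[") == 1 && $(i + 1) == "segfault") {
                    pid = $i
                    sub(/^[^[]*\[/, "", pid)   # drop the leading "name["
                    sub(/\].*$/, "", pid)      # drop the trailing "]:"
                    obj = $NF
                    sub(/\[.*$/, "", obj)      # drop "[base+size]"
                    print pid, obj
                }
            }
        }'
}

# Usage: dmesg | segfaults_for snpslmd
```

It works on both raw dmesg lines and syslog lines with a timestamp prefix, because it searches every field for the process token instead of assuming a fixed position.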
#########################T H R E A D############################
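Conclusion:
The messages log shows that snpslmd accessed an invalid memory address (a segfault inside libc-2.12.so) and dumped core. A core dump file was written, but because the system is configured not to keep core files for this program (abrt reports that ProcessUnpackaged is set to 'no'), the dump was deleted again.
On 64-bit RHEL/CentOS 6.0 and later (I have not used the 32-bit versions), core files can be truncated even if you have already set ulimit -S -c unlimited.
The cause appears to be that the core pattern is set to abrt, and a problem in abrt leaves the core file very small or missing altogether. The fix is to stop using abrt as the core pattern.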
Check the machine's core pattern setting:
sysctl -a | grep core_pattern
Or:
cat /proc/sys/kernel/core_pattern
If the core pattern is set to abrt, change it to write plain core files:
[[email protected] ~]# sysctl -w kernel.core_pattern=core.%p.%e
kernel.core_pattern = core.%p.%e
Or:
[[email protected] ~]# sysctl -w kernel.core_pattern=core.%p
kernel.core_pattern = core.%p
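To tell the two cases apart in a script: on Linux, a core_pattern whose first character is `|` hands the dump to a helper program (which is how abrt hooks in, as far as I can tell), while anything else is a file name template. A minimal sketch, with a hypothetical function name:

```sh
# core_pattern_kind PATTERN: hypothetical helper.
# Prints "pipe" if PATTERN sends core dumps to a program (leading "|"),
# otherwise "file".
core_pattern_kind() {
    case $1 in
        '|'*) echo pipe ;;
        *)    echo file ;;
    esac
}

# Usage: core_pattern_kind "$(cat /proc/sys/kernel/core_pattern)"
```

If this prints pipe, the core file is not written to the working directory at all, so the checks on directory permissions and size limits come second to fixing the pattern.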
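To make sure a crashing program produces a core dump on Linux, watch out for the following:
First, the directory where the core dump is written must exist, and the process must have write permission on it. That directory is the process's current working directory, which is normally the directory from which the command that started the process was issued. If the process is started from a script, however, the script may change the working directory, so the real working directory can differ from the one where the script was run. In that case, look at the target of the /proc/<pid>/cwd symlink to find the real working directory. The same method works for processes started as system services.
Second, if the program calls seteuid()/setegid() to change the effective user or group of the process, the system does not generate a core dump for it by default. Many server programs call seteuid(). MySQL is one example: no matter which user runs mysqld_safe to start MySQL, the effective user of the mysqld process is always the mysql user. If you started a program as user A but ps shows it running as user B, the process has called seteuid(). To let such processes dump core, set the contents of /proc/sys/fs/suid_dumpable to 1 (the default is usually 0).
Third, set the core file size limit high enough. The size of the core file is roughly the amount of memory the program was using when it crashed. A crashing program does not behave the way it normally does, though: a buffer overflow, for example, can corrupt the stack and leave a variable holding a garbage value, and if the program then uses that value as the size of a memory allocation it can end up using far more memory than usual. So however little memory the program uses in normal operation, it is best to set the limit to unlimited to make sure the core file is written.
In the shell, run ulimit -c unlimited. This change applies only to the current session and is temporary. To make it permanent, edit a configuration file such as .bash_profile, /etc/profile, or /etc/security/limits.conf.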
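The three points above can be checked for a running process in one go. This is only a sketch: the function name coredump_report is hypothetical, it assumes a Linux /proc filesystem, and any value it cannot read is reported as unknown.

```sh
# coredump_report PID: hypothetical helper.
# Prints the working directory, the suid_dumpable setting, and the soft
# core size limit that apply to process PID.
coredump_report() {
    pid=$1
    cwd=$(readlink "/proc/$pid/cwd" 2>/dev/null) || cwd=unknown
    dumpable=$(cat /proc/sys/fs/suid_dumpable 2>/dev/null) || dumpable=unknown
    # In /proc/PID/limits the soft limit is the 5th field of the
    # "Max core file size" row.
    limit=$(awk '/^Max core file size/ { print $5 }' "/proc/$pid/limits" 2>/dev/null) || limit=
    printf 'cwd=%s\n' "$cwd"
    printf 'suid_dumpable=%s\n' "$dumpable"
    printf 'core_soft_limit=%s\n' "${limit:-unknown}"
}

# Usage: coredump_report "$(pgrep -o snpslmd)"
```

Reading /proc/PID/limits shows the limit the daemon actually inherited, which can differ from what ulimit -c reports in your own shell.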
References:
https://bugzilla.redhat.com/show_bug.cgi?id=583407
http://www.cnblogs.com/sjpisaboy/articles/210228.html
http://www.newsmth.net/nForum/#!article/METech/207964