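PBS Pro can be thought of as the commercial counterpart of TORQUE, and it is very capable. Since it was open-sourced, it has arguably become the most capable free job scheduler available.
However, the prebuilt packages of the open-source PBS Pro target CentOS 7, while the Rocks cluster management software only supports up to CentOS 6.8. To use open-source PBS Pro on Rocks you therefore have to build it from source. The installation has quite a few pitfalls, so I am recording the process here for reference.
First, when installing the Rocks cluster, it is best to use 6.1.1 rather than 6.2; do not install the sge roll; and for the OS roll, it is best not to use the bundled one but a standard CentOS 6.7 or 6.8 installation disc instead.
Once the cluster is installed,
manually edit /etc/hosts and change the IP mapped to the public FQDN to the private (internal) IP.
For example, change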
42.58.6.9 headnode.test.com
to
10.0.0.1 headnode.test.com
Remember to redo this edit by hand every time you run rocks sync host network; otherwise the nodes cannot connect to the PBS server.
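Because this edit has to be repeated after every sync, it may help to script it. The following is only a minimal sketch: fix_hosts_entry is a hypothetical helper name, and the FQDN and private IP in the usage line are the example values from above, so substitute your own.

```shell
#!/bin/sh
# Sketch only: fix_hosts_entry is a hypothetical helper, not part of Rocks or PBS Pro.
# It rewrites the address on every non-comment line whose host fields contain the FQDN.
fix_hosts_entry() {
    hosts_file=$1   # file to edit, normally /etc/hosts
    fqdn=$2         # public FQDN of the head node
    private_ip=$3   # private (cluster-side) IP to map it to

    tmp=$(mktemp) || return 1
    awk -v name="$fqdn" -v ip="$private_ip" '
        {
            hit = 0
            for (i = 2; i <= NF; i++) {
                if ($i == name) hit = 1
            }
            # Only touch real entries, never commented-out lines.
            if (hit && $1 !~ /^#/) $1 = ip
            print
        }' "$hosts_file" > "$tmp" && cat "$tmp" > "$hosts_file"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Usage (as root, after each "rocks sync host network"):
#   fix_hosts_entry /etc/hosts headnode.test.com 10.0.0.1
```

Writing the result back with cat keeps the original file's ownership and permissions, which matters for /etc/hosts.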
Download these 4 packages from the internet:
pbspro-14.1.0.tar.gz
autoconf-2.69-12.2.noarch.rpm
automake-1.13.4-3.2.noarch.rpm
libedit-devel-2.11-4.20080712cvs.1.el6.x86_64.rpm
Then put them in a directory shared across the cluster; this article uses /share/data/install as the example.
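Before going further, it can be worth confirming that all four files actually landed in the shared directory. This is a small sketch under that assumption; check_install_files is a hypothetical helper name, and the file names are the ones used in this article.

```shell
#!/bin/sh
# Sketch only: check_install_files is a hypothetical helper.
# Returns 0 if all four downloads exist in the given directory, 1 otherwise.
check_install_files() {
    dir=$1
    missing=0
    for f in pbspro-14.1.0.tar.gz \
             autoconf-2.69-12.2.noarch.rpm \
             automake-1.13.4-3.2.noarch.rpm \
             libedit-devel-2.11-4.20080712cvs.1.el6.x86_64.rpm
    do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
#   check_install_files /share/data/install && echo "all files present"
```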
Upgrade the 3 RPM packages:
rpm -Uhv /share/data/install/autoconf-2.69-12.2.noarch.rpm
rpm -Uhv /share/data/install/automake-1.13.4-3.2.noarch.rpm
rpm -Uhv /share/data/install/libedit-devel-2.11-4.20080712cvs.1.el6.x86_64.rpm
Install the required software:
yum --enablerepo=base install -y gcc make rpm-build libtool hwloc-devel libX11-devel libXt-devel libedit-devel libical-devel ncurses-devel perl postgresql-devel python-devel tcl-devel tk-devel swig expat-devel openssl-devel libXext libXft expat libedit postgresql-server python sendmail sudo tcl tk libical glibc
yum --enablerepo=epel install hwloc hwloc-devel
cd /share/data/install/
tar -xvf pbspro-14.1.0.tar.gz
cd pbspro-14.1.0
./autogen.sh
./configure --prefix=/opt/pbs
make
make install
The installation is complete; now initialize it. This assumes the management node does not run compute jobs.
/opt/pbs/libexec/pbs_postinstall
chmod 4755 /opt/pbs/sbin/pbs_iff /opt/pbs/sbin/pbs_rcp
echo "PBS_SERVER=kunanyi-admin.local" > /etc/pbs.conf
echo "PBS_START_SERVER=1" >> /etc/pbs.conf
echo "PBS_START_SCHED=1" >> /etc/pbs.conf
echo "PBS_START_COMM=1" >> /etc/pbs.conf
echo "PBS_START_MOM=0" >> /etc/pbs.conf
echo "PBS_EXEC=/opt/pbs" >> /etc/pbs.conf
echo "PBS_HOME=/var/spool/pbs" >> /etc/pbs.conf
echo "PBS_CORE_LIMIT=unlimited" >> /etc/pbs.conf
echo "PBS_SCP=/usr/bin/scp" >> /etc/pbs.conf
/etc/init.d/pbs start
. /etc/profile.d/pbs.sh
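The management node is now fully installed.
Run the following commands on each compute node. The key idea is to have the compute node run make install in the directory that was already built on the management node, which installs the full version of PBS Pro. These commands can be placed in extend-compute.xml.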
rpm -ivh /share/data/install/libedit-devel-2.11-4.20080712cvs.1.el6.x86_64.rpm
yum --enablerepo=base install -y gcc make rpm-build libtool hwloc-devel libX11-devel libXt-devel libedit-devel libical-devel ncurses-devel perl postgresql-devel python-devel tcl-devel tk-devel swig expat-devel openssl-devel libXext libXft expat libedit postgresql-server python sendmail sudo tcl tk libical
cd /share/data/install/pbspro-14.1.0/
make install
/opt/pbs/libexec/pbs_postinstall
chmod 4755 /opt/pbs/sbin/pbs_iff /opt/pbs/sbin/pbs_rcp
echo "PBS_SERVER=kunanyi-admin.local" > /etc/pbs.conf
echo "PBS_START_SERVER=0" >> /etc/pbs.conf
echo "PBS_START_SCHED=0" >> /etc/pbs.conf
echo "PBS_START_COMM=0" >> /etc/pbs.conf
echo "PBS_START_MOM=1" >> /etc/pbs.conf
echo "PBS_EXEC=/opt/pbs" >> /etc/pbs.conf
echo "PBS_HOME=/var/spool/pbs" >> /etc/pbs.conf
echo "PBS_CORE_LIMIT=unlimited" >> /etc/pbs.conf
echo "PBS_SCP=/usr/bin/scp" >> /etc/pbs.conf
. /etc/profile.d/pbs.sh
/etc/init.d/pbs start
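The /etc/pbs.conf files for the management node and the compute nodes differ only in the four PBS_START_* values, so the echo lines can be folded into one function. This is just a sketch: write_pbs_conf and its "server"/"compute" role names are hypothetical, and the fixed values are the ones used in this article.

```shell
#!/bin/sh
# Sketch only: write_pbs_conf is a hypothetical helper, not shipped with PBS Pro.
write_pbs_conf() {
    role=$1          # "server" for the management node, "compute" for a compute node
    server_name=$2   # value for PBS_SERVER
    conf_file=$3     # file to write, normally /etc/pbs.conf

    case "$role" in
        server)  daemons=1; mom=0 ;;   # server, scheduler and comm on, MoM off
        compute) daemons=0; mom=1 ;;   # only the MoM runs on compute nodes
        *) echo "unknown role: $role" >&2; return 1 ;;
    esac

    cat > "$conf_file" <<EOF
PBS_SERVER=$server_name
PBS_START_SERVER=$daemons
PBS_START_SCHED=$daemons
PBS_START_COMM=$daemons
PBS_START_MOM=$mom
PBS_EXEC=/opt/pbs
PBS_HOME=/var/spool/pbs
PBS_CORE_LIMIT=unlimited
PBS_SCP=/usr/bin/scp
EOF
}

# Usage on a compute node:
#   write_pbs_conf compute kunanyi-admin.local /etc/pbs.conf
```

Keeping both roles in one place makes it harder for the two files to drift apart when a shared setting such as PBS_HOME changes.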
Once PBS is installed on all nodes, add the compute nodes on the management node. Here hc-002 is used as the example.
qmgr -c "create node hc-002"
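With more than a handful of nodes, the create node directives can be generated and piped into qmgr, which reads directives from standard input. A minimal sketch, where create_node_commands is a hypothetical helper and hc-003 is an assumed second node name used only for illustration:

```shell
#!/bin/sh
# Sketch only: create_node_commands is a hypothetical helper.
# It prints one qmgr "create node" directive per node name given.
create_node_commands() {
    for node in "$@"; do
        printf 'create node %s\n' "$node"
    done
}

# Usage on the management node (hc-003 is an assumed example name):
#   create_node_commands hc-002 hc-003 | qmgr
```

Printing the directives first also lets you review them before anything is sent to the server.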
After that, you can check the compute nodes with the following command.
pbsnodes -a
After that, configure PBS, for example:
qmgr
create queue workq
set queue workq queue_type = Execution
set queue workq enabled = True
set queue workq started = True
set server scheduling = True
set server default_queue = workq
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server resources_default.ncpus = 1
set server scheduler_iteration = 600
set server resv_enable = True
set server node_fail_requeue = 310
set server max_array_size = 10000
set server default_chunk.ncpus=1
set server default_queue = workq
set server scheduling = True
set server acl_host_enable = True
set server acl_hosts = kunanyi-admin
set server flatuid = True
set server acl_users ="[email protected],+test"
set queue workq acl_users ="[email protected],+test"