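The YARN Scheduler Load Simulator (SLS) project started because people wanted a simulated environment in which to exercise the Fair Scheduler and the Capacity Scheduler, so that a scheduler could be configured sensibly and the risk of problems in production reduced; see https://issues.apache.org/jira/browse/YARN-1021 for details. The tool was then added in Hadoop 2.3.0.
First, set the environment variables: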
export HADOOP_HOME=/usr/hadoop-2.3.0
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop # sls-runner.xml goes in this directory
The contents of sls-runner.xml are as follows:
<configuration>

  <!-- SLSRunner configuration -->
  <property>
    <name>yarn.sls.runner.pool.size</name>
    <value>100</value>
  </property>

  <!-- Nodes configuration -->
  <property>
    <name>yarn.sls.nm.memory.mb</name>
    <value>10240</value>
  </property>
  <property>
    <name>yarn.sls.nm.vcores</name>
    <value>10</value>
  </property>
  <property>
    <name>yarn.sls.nm.heartbeat.interval.ms</name>
    <value>1000</value>
  </property>

  <!-- Apps configuration -->
  <property>
    <name>yarn.sls.am.heartbeat.interval.ms</name>
    <value>1000</value>
  </property>
  <property>
    <name>yarn.sls.am.type.mapreduce</name>
    <value>org.apache.hadoop.yarn.sls.appmaster.MRAMSimulator</value>
  </property>

  <!-- Containers configuration -->
  <property>
    <name>yarn.sls.container.memory.mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>yarn.sls.container.vcores</name>
    <value>1</value>
  </property>

  <!-- metrics -->
  <property>
    <name>yarn.sls.metrics.switch</name>
    <value>ON</value>
  </property>
  <property>
    <name>yarn.sls.metrics.web.address.port</name>
    <value>10001</value>
  </property>
  <property>
    <name>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler</name>
    <value>org.apache.hadoop.yarn.sls.scheduler.FifoSchedulerMetrics</value>
  </property>
  <property>
    <name>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</name>
    <value>org.apache.hadoop.yarn.sls.scheduler.FairSchedulerMetrics</value>
  </property>
  <property>
    <name>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</name>
    <value>org.apache.hadoop.yarn.sls.scheduler.CapacitySchedulerMetrics</value>
  </property>

</configuration>
- yarn.sls.runner.pool.size
The simulator uses a thread pool to simulate the running NMs and AMs; this parameter specifies the number of threads in the pool.
- yarn.sls.nm.memory.mb
The total memory for each NMSimulator.
- yarn.sls.nm.vcores
The total vCores for each NMSimulator.
- yarn.sls.nm.heartbeat.interval.ms
The heartbeat interval for each NMSimulator.
- yarn.sls.am.heartbeat.interval.ms
The heartbeat interval for each AMSimulator.
- yarn.sls.am.type.mapreduce
The AMSimulator implementation for MapReduce-like applications. Users can specify implementations for other types of applications.
- yarn.sls.container.memory.mb
The memory required for each container simulator.
- yarn.sls.container.vcores
The vCores required for each container simulator.
- yarn.sls.metrics.switch
The simulator introduces metrics to measure the behavior of critical components and operations. This field specifies whether metrics collection is turned on (ON) or off (OFF).
- yarn.sls.metrics.web.address.port
The port used by the simulator to provide real-time tracking. The default value is 10001.
- org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler
The scheduler metrics implementation for the FIFO Scheduler.
- org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler
The scheduler metrics implementation for the Fair Scheduler.
- org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
The scheduler metrics implementation for the Capacity Scheduler.
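To get a feel for how the node and container settings interact, the following sketch works out a rough upper bound on how many simulated containers fit on one simulated node. It uses the values from the sls-runner.xml above. It assumes a node is full as soon as either memory or vCores runs out; whether vCores are actually taken into account depends on how the scheduler under test is configured, so treat the result as an estimate only.

```shell
#!/bin/sh
# Values copied from the sls-runner.xml shown earlier.
nm_memory_mb=10240        # yarn.sls.nm.memory.mb
nm_vcores=10              # yarn.sls.nm.vcores
container_memory_mb=1024  # yarn.sls.container.memory.mb
container_vcores=1        # yarn.sls.container.vcores

# Assumption: a node is full when either resource is exhausted,
# so the smaller of the two quotients is the limit.
by_memory=$((nm_memory_mb / container_memory_mb))
by_vcores=$((nm_vcores / container_vcores))
if [ "$by_memory" -lt "$by_vcores" ]; then
  per_node=$by_memory
else
  per_node=$by_vcores
fi

echo "containers per simulated node (upper bound): $per_node"
```

With these values both resources give the same limit, so each simulated node can hold at most 10 containers at a time.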
Next, use Apache Rumen to parse the job history files and generate JSON files that SLS can read:
hadoop jar hadoop-rumen-2.3.0.jar org.apache.hadoop.tools.rumen.TraceBuilder -write-job-trace file:///home/user/job-trace.json file:///home/user/topology.output file:///home/user/logs/history/done
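file:///home/user/logs/history/done is the directory where your cluster stores the job history of completed jobs. It usually lives in HDFS and can be copied to a local directory with hadoop fs -get.
file:///home/user/job-trace.json and file:///home/user/topology.output are the generated files that SLS will read.
Run the simulator: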
slsrun.sh --input-rumen=/home/user/ --output-dir=/usr/sls/sample-result
--input-rumen: in this example, the path where file:///home/user/job-trace.json and file:///home/user/topology.output were written.
If the run fails with an error like:
java.lang.NullPointerException
at org.apache.hadoop.yarn.sls.web.SLSWebApp.<init>(SLSWebApp.java:82)
at
check the comments on https://issues.apache.org/jira/browse/YARN-1021. The fix is as follows:
bin/slsrun.sh --input-sls=sls-file/sls-jobs.json --output-dir=output_sls --nodes=sls-file/sls-nodes.json
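The command above expects input in SLS's own JSON format rather than a Rumen trace. The SLS distribution includes a rumen2sls.sh script for converting one into the other. The sketch below only prints the two commands so they can be reviewed before being run from the SLS directory. It assumes that rumen2sls.sh sits next to slsrun.sh in bin/, that the Rumen trace is the job-trace.json generated earlier, and that the converter writes its files with the default "sls" prefix (sls-jobs.json and sls-nodes.json); check these against your own installation.

```shell
#!/bin/sh
# Assumed locations; adjust to your own layout.
RUMEN_TRACE=/home/user/job-trace.json  # trace produced by TraceBuilder earlier
SLS_DIR=sls-file                       # directory used in the command above

# Step 1: convert the Rumen trace into SLS job and node files.
convert_cmd="bin/rumen2sls.sh --rumen-file=$RUMEN_TRACE --output-dir=$SLS_DIR"

# Step 2: run the simulator on the converted files.
run_cmd="bin/slsrun.sh --input-sls=$SLS_DIR/sls-jobs.json --output-dir=output_sls --nodes=$SLS_DIR/sls-nodes.json"

# Dry run: print the commands instead of executing them.
echo "$convert_cmd"
echo "$run_cmd"
```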
YARN Scheduler Load Simulator