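Prerequisite: Hadoop must already be installed. My version is hadoop-2.3.0-cdh5.1.0.
1. Download the Maven package.
2. Set the M2_HOME environment variable and add Maven's bin directory to PATH.
3. Run export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"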
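Steps 2 and 3 can be put together in a shell profile such as ~/.bashrc. This is only a minimal sketch: the install path /usr/local/apache-maven is an assumption, not something from my setup, so replace it with wherever you unpacked Maven.

```shell
# Assumption: Maven was unpacked to /usr/local/apache-maven (hypothetical path).
# An M2_HOME that is already set in the environment is kept as-is.
export M2_HOME="${M2_HOME:-/usr/local/apache-maven}"

# Put Maven's bin directory at the front of PATH so that `mvn` resolves to this install.
export PATH="$M2_HOME/bin:$PATH"

# Same JVM options as in step 3, giving the Spark build extra heap and PermGen.
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
```

After reloading the profile, mvn -version should report the install under M2_HOME.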
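4. Download the spark-1.0.2.gz archive from the official Spark site and unpack it.
5. Change into the unpacked Spark directory.
6. Run ./make-distribution.sh --hadoop 2.3.0-cdh5.1.0 --with-yarn --tgz
7. Wait; the build takes a long time.
8. When it finishes, spark-1.0.2-bin-2.3.0-cdh5.1.0.tgz is generated in the current directory.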
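Because the build runs for so long, it can help to wrap steps 6 to 8 in a small function that checks the tarball really exists at the end. The sketch below assumes the output name always follows the pattern seen in step 8 (spark-<spark version>-bin-<hadoop version>.tgz); the function names expected_dist and build_spark are made up for this example.

```shell
# Name of the tarball make-distribution.sh is expected to produce,
# following the pattern seen in step 8: spark-<spark>-bin-<hadoop>.tgz
expected_dist() {
    printf 'spark-%s-bin-%s.tgz\n' "$1" "$2"
}

# Run the build, then fail loudly if the expected tarball is missing.
# Must be called from the unpacked Spark source directory.
build_spark() {
    spark_version=$1
    hadoop_version=$2
    ./make-distribution.sh --hadoop "$hadoop_version" --with-yarn --tgz || return 1
    dist=$(expected_dist "$spark_version" "$hadoop_version")
    if [ -f "$dist" ]; then
        echo "built $dist"
    else
        echo "build finished but $dist was not found" >&2
        return 1
    fi
}

# Usage (from the Spark source directory):
#   build_spark 1.0.2 2.3.0-cdh5.1.0
```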
9. Copy it to the installation directory and unpack it.
10. Edit the configuration files under conf:
cp spark-env.sh.template spark-env.sh
vim spark-env.sh
Set the parameters to match your own environment:
export JAVA_HOME=/home/hadoop/jdk
export HADOOP_HOME=/home/hadoop/hadoop-2.3.0-cdh5.1.0
export HADOOP_CONF_DIR=/home/hadoop/hadoop-2.3.0-cdh5.1.0/etc/hadoop
export SPARK_YARN_APP_NAME=spark-on-yarn
export SPARK_EXECUTOR_INSTANCES=1
export SPARK_EXECUTOR_CORES=2
export SPARK_EXECUTOR_MEMORY=3500m
export SPARK_DRIVER_MEMORY=3500m
export SPARK_MASTER_IP=master
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_CORES=2
export SPARK_WORKER_MEMORY=3500m
export SPARK_WORKER_INSTANCES=1
11. Configure the slaves file, one hostname per line:
slave01
slave02
slave03
slave04
slave05
12. Distribute
Copy the Spark installation directory to every slave node.
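One way to do the copy is to generate one rsync command per host in the slaves file. This is a rough sketch, not exactly what I ran: it assumes rsync is installed, passwordless SSH works for the current user, the install path is the same on every node, and paths contain no spaces. The function name print_sync_cmds is made up for this example.

```shell
# Print one rsync command per host listed in a slaves file.
# Empty lines and lines starting with # are skipped.
# Assumptions: same install path on every node, rsync installed, passwordless SSH.
print_sync_cmds() {
    slaves_file=$1
    spark_dir=$2
    while IFS= read -r host || [ -n "$host" ]; do
        case "$host" in
            ''|'#'*) continue ;;
        esac
        printf 'rsync -a %s/ %s:%s/\n' "$spark_dir" "$host" "$spark_dir"
    done < "$slaves_file"
}

# Usage: review the printed commands first, then pipe them to sh to actually copy.
#   print_sync_cmds conf/slaves /home/hadoop/spark
#   print_sync_cmds conf/slaves /home/hadoop/spark | sh
```

Printing the commands instead of running them directly makes it easy to check the host list before anything is copied.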
13. Start
sbin/start-all.sh
14. Run an example
$SPARK_HOME/bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --num-executors 3 --driver-memory 4g --executor-memory 2g --executor-cores 1 /home/hadoop/spark/lib/spark-examples-1.0.2-hadoop2.3.0-cdh5.1.0.jar 100
15. The submitted example failed
Clicking through to the logs in the YARN monitoring UI showed a pile of errors like these:
INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 0 time(s).
INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 0 time(s).
INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 0 time(s).
INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 0 time(s).
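If you have saved a container log to a file, a quick grep can confirm whether it shows this symptom. The sketch below just counts the retry lines aimed at 0.0.0.0:8030; the function name count_rm_fallback and the log file name in the usage line are made up for this example.

```shell
# Count log lines where the client retries against 0.0.0.0:8030,
# i.e. it is trying to reach the ResourceManager on the wildcard address.
count_rm_fallback() {
    grep -c 'Retrying connect to server: 0\.0\.0\.0/0\.0\.0\.0:8030' "$1"
}

# Usage (container.log is a hypothetical file name):
#   count_rm_fallback container.log
```

A non-zero count means the process never picked up a real ResourceManager hostname.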
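16. Solving the problem
I copied the Spark core jar from the lib directory of the Spark installation to my local machine and found a yarn-default.xml file inside it. Opening it showed: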
<!-- Resource Manager Configs -->
<property>
  <description>The hostname of the RM.</description>
  <name>yarn.resourcemanager.hostname</name>
  <value>0.0.0.0</value>
</property>
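So the process looks for the ResourceManager on the local machine. If it is not running on the node that hosts YARN's ResourceManager, it cannot possibly find it.
17. Change the setting as follows: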
<!-- Resource Manager Configs -->
<property>
  <description>The hostname of the RM.</description>
  <name>yarn.resourcemanager.hostname</name>
  <value>master</value>
</property>
18. Repackage and redistribute Spark to every node.
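Steps 16 to 18 can be scripted roughly as below. This is a sketch under assumptions, not a record of what I typed: it assumes that in the file the <name> and <value> elements sit on consecutive lines (a file with the whole property on one line would need a different pattern), that yarn-default.xml sits at the root of the jar, and that the JDK's jar tool is on PATH. The function name set_rm_hostname and the variable SPARK_JAR are made up for this example.

```shell
# Rewrite the <value> on the line right after the
# <name>yarn.resourcemanager.hostname</name> line.
# Assumption: <name> and <value> are on consecutive lines in the file.
set_rm_hostname() {
    xml_file=$1
    rm_host=$2
    sed '/<name>yarn\.resourcemanager\.hostname<\/name>/{
n
s#<value>[^<]*</value>#<value>'"$rm_host"'</value>#
}' "$xml_file" > "$xml_file.tmp" && mv "$xml_file.tmp" "$xml_file"
}

# Usage, with SPARK_JAR as a hypothetical variable pointing at the jar from step 16:
#   jar xf "$SPARK_JAR" yarn-default.xml     # extract the file into the current directory
#   set_rm_hostname yarn-default.xml master
#   jar uf "$SPARK_JAR" yarn-default.xml     # write the edited file back into the jar
```

Only the value that follows the hostname property is touched; other 0.0.0.0 values in the file are left alone. After updating the jar, copy it to every node again as in step 12.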