1. Role Assignment
IP | Role | Hostname
192.168.18.37 | Master/NameNode/JobTracker | HDP1
192.168.18.35 | Slave/DataNode/TaskTracker | HDP2
192.168.18.36 | Slave/DataNode/TaskTracker | HDP3
2. Install the JDK (on each machine)
mkdir -p /usr/local/setup
# Install the JDK
cd /usr/lib
tar -xvzf /usr/local/setup/jdk-7u75-linux-x64.tar.gz
# Rename it to jdk7 (purely personal preference)
mv jdk1.7.0_75 jdk7
# Add Java environment variables
vi /etc/profile
Append the following lines at the end of the profile file:
export JAVA_HOME=/usr/lib/jdk7
export CLASSPATH=.:$JAVA_HOME/lib:$CLASSPATH
export PATH=$PATH:$JAVA_HOME/bin
# Fix ownership and permissions of the jdk7 directory
chown -R root:root jdk7
chmod -R 755 jdk7
# Source the modified profile
source /etc/profile
# Verify the Java installation
java -version
java version "1.7.0_75"
Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode)
3. Edit /etc/sysconfig/network and /etc/hosts (on each machine)
/etc/hosts maps IP addresses to hostnames; /etc/sysconfig/network sets this machine's own hostname.
Changes to /etc/hosts:
127.0.0.1 localhost localhost4 localhost4.localdomain4
192.168.18.37 HDP1
192.168.18.35 HDP2
192.168.18.36 HDP3
Changes to /etc/sysconfig/network:
HOSTNAME=<hostname of this machine>
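To apply the new hostname without rebooting and to confirm that the /etc/hosts entries resolve, something like the following can be run (a quick sketch, using HDP1 as the example; substitute each machine's own name):
# Set the hostname for the current session (persisted by /etc/sysconfig/network after reboot)
sudo hostname HDP1
# Check that the other nodes resolve and respond
ping -c 1 HDP2
ping -c 1 HDP3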
4. Configure Passwordless SSH from HDP1 to HDP2 and HDP3
4.1 Configure passwordless SSH on HDP1 itself
# On HDP1, switch to the hdp user and set up the key.
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >>~/.ssh/authorized_keys
# Edit sshd_config
sudo vi /etc/ssh/sshd_config
# Remove the leading # so the following three lines take effect
RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys
# Set permissions and restart the sshd service
cd ~/.ssh
chmod 600 authorized_keys
cd ..
chmod -R 700 .ssh
sudo service sshd restart
4.2 Configure passwordless SSH from HDP1 to HDP2 and HDP3
# Copy HDP1's authorized_keys to HDP2 and HDP3
scp .ssh/authorized_keys hdp2:~/.ssh/authorized_keys_hdp1
scp .ssh/authorized_keys hdp3:~/.ssh/authorized_keys_hdp1
# On HDP2 and HDP3, append authorized_keys_hdp1 to the local authorized_keys
cat ~/.ssh/authorized_keys_hdp1 >> ~/.ssh/authorized_keys
# Test SSH logins (they should no longer prompt for a password)
ssh hdp2
ssh hdp3
Last login: Thu Apr 2 15:22:03 2015 from hdp1
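If ssh-copy-id is available on HDP1, the manual copy-and-append in 4.1/4.2 can also be done with it (a sketch, assuming the DSA key generated above and the same hdp user on the slaves):
ssh-copy-id -i ~/.ssh/id_dsa.pub hdp2
ssh-copy-id -i ~/.ssh/id_dsa.pub hdp3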
5. Configure the Hadoop Files on All Three Machines
Configure everything on the Master (HDP1) first, then copy the configuration files to the Slaves and overwrite theirs. Any directories referenced by the configuration must also be created on the Slaves (a sketch follows the directory list below). Alternatively, after finishing the configuration, copy the entire Hadoop installation directory to the Slaves and use it as their installation directory.
Create the following folders inside the Hadoop installation directory:
mkdir dfs dfs/name dfs/data tmp
dfs: directory for HDFS
dfs/name: NameNode directory for HDFS
dfs/data: DataNode directory for HDFS
tmp: directory for HDFS temporary files
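As noted at the top of this section, the same directories must also exist on the Slaves. Since passwordless SSH is already in place, they can be created remotely from HDP1 (a sketch that assumes /usr/local/hadoop is the installation directory on every node and is writable by the hdp user):
for h in hdp2 hdp3; do
  ssh $h "mkdir -p /usr/local/hadoop/dfs/name /usr/local/hadoop/dfs/data /usr/local/hadoop/tmp"
done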
/etc/profile
export HADOOP_PREFIX=/usr/local/hadoop
Environment variable pointing at the Hadoop installation directory
etc/hadoop/hadoop-env.sh
export JAVA_HOME=${JAVA_HOME}
export HADOOP_PREFIX=/usr/local/hadoop
export HADOOP_LOG_DIR=/var/log/hadoop
Per-daemon environment variables for Hadoop
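Because HADOOP_LOG_DIR points at /var/log/hadoop, that directory must exist on every node and be writable by the user that starts the daemons, or the start scripts will fail when writing logs. On each machine (the hdp user/group here is an assumption):
sudo mkdir -p /var/log/hadoop
sudo chown -R hdp:hdp /var/log/hadoop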
etc/hadoop/yarn-env.sh
export JAVA_HOME=${JAVA_HOME}
Per-daemon environment variables for YARN
etc/hadoop/slaves: add the Slave hostnames
HDP2
HDP3
etc/hadoop/core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://hdp1:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>4096</value>
</property>
</configuration>
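fs.default.name still works in Hadoop 2.x but is the deprecated name of this setting; the non-deprecated equivalent, if preferred, would be:
<property>
<name>fs.defaultFS</name>
<value>hdfs://hdp1:9000</value>
</property>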
etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>HDP1:9001</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.nameservices</name>
<value>hadoop-cluster1</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>HDP1:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>HDP1:19888</value>
</property>
</configuration>
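Note: a freshly unpacked 2.6.0 distribution normally ships only mapred-site.xml.template; if mapred-site.xml does not exist yet, create it from the template first and then add the properties above:
cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml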
etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>HDP1:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>HDP1:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>HDP1:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>HDP1:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>HDP1:8088</value>
</property>
</configuration>
# Copy the finished configuration files to the Slaves
I chose to copy all of the configuration files: first into the user's home directory, then from there overwrite the Hadoop installation directory, to avoid unwanted permission changes.
sudo scp -r /usr/local/hadoop/etc/hadoop hdp@192.168.18.35:~/
sudo scp -r /usr/local/hadoop/etc/hadoop hdp@192.168.18.36:~/
# SSH to each Slave and overwrite etc/hadoop.
I delete the old files first, then move the new ones into place.
rm -rf /usr/local/hadoop/etc/hadoop/*
mv ~/hadoop/* /usr/local/hadoop/etc/hadoop/
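Since the copy and move were done with sudo, it may be worth restoring ownership of the configuration directory on each Slave afterwards so the hdp user can read it (the hdp:hdp owner/group is an assumption based on the user used earlier):
sudo chown -R hdp:hdp /usr/local/hadoop/etc/hadoop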
6. Add Hadoop Environment Variables
This makes the commands and scripts in hadoop/bin and hadoop/sbin callable without typing absolute paths every time.
vi /etc/profile
export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin
Re-source the profile:
source /etc/profile
7. Start and Verify
# Format the NameNode
hdfs namenode -format
# Start HDFS
start-dfs.sh
After startup, HDP1 has the NameNode and SecondaryNameNode processes:
[hdp@HDP1 root]$ jps
2991 NameNode
3172 SecondaryNameNode
8730 Jps
The Slaves have the DataNode process:
[hdp@HDP2 root]$ jps
2131 DataNode
4651 Jps
# Start YARN
start-yarn.sh
After startup, HDP1 gains a ResourceManager process and the Slaves gain a NodeManager process; this can likewise be checked with jps.
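Besides jps, the cluster can also be checked from HDP1 (a quick sketch; the web ports are the HDFS 2.x default 50070 and the yarn.resourcemanager.webapp.address configured above):
# Report live DataNodes and HDFS capacity
hdfs dfsadmin -report
# Web UIs, opened in a browser:
# NameNode:        http://hdp1:50070
# ResourceManager: http://hdp1:8088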
8. Run the Bundled WordCount Example
# Create a txt file to analyze
vi /usr/local/hadoop/wc.txt
this is a wordcount app
is a wordcount app
a wordcount app
wordcount app
app
# Create the input directory in HDFS and upload wc.txt
hdfs dfs -mkdir -p /wc/input
hdfs dfs -put wc.txt /wc/input/
# Run it (from the Hadoop installation directory, since the jar path below is relative)
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /wc/input/wc.txt /wc/output
# Check the results
hdfs dfs -ls /wc/output
hdfs dfs -cat /wc/output/part-r-00000
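For the wc.txt above, part-r-00000 should contain roughly the following (word and count, tab-separated, sorted by key):
a	3
app	5
is	2
this	1
wordcount	4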