First read: "hadoop-2.3.0-cdh5.1.0 pseudo-distributed installation (CentOS-based)"
http://blog.csdn.net/jameshadoop/article/details/39055493
Note: this walkthrough is done as the root user.
1. Environment

Operating system: CentOS 6.5 (64-bit)

Note: Hadoop 2.x requires JDK 1.7. Uninstall the JDK bundled with the Linux distribution and install JDK 1.7 instead.
Download: http://www.oracle.com/technetwork/java/javase/downloads/index.html
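A minimal sketch of removing the bundled OpenJDK on CentOS (the package name below is illustrative; use whatever rpm -qa actually prints on your machine):

# list the Java packages that ship with the distribution
rpm -qa | grep -i -E 'java|jdk'
# remove each one by the exact name printed above (this name is an example)
rpm -e --nodeps java-1.6.0-openjdk-1.6.0.0-1.66.1.13.0.el6.x86_64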
Software versions: hadoop-2.3.0-cdh5.1.0.tar.gz, zookeeper-3.4.5-cdh5.1.0.tar.gz
Download: http://archive.cloudera.com/cdh5/cdh/5/

Cluster nodes:
c1:192.168.58.11
c2:192.168.58.12
c3:192.168.58.13
2. Install the JDK (omitted; see the reference article above)
3. Configure environment variables for the JDK and Hadoop; a sketch follows.
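The article does not list the exact entries. A plausible /etc/profile addition, assuming the JDK path used in section 5.5 and Hadoop unpacked under /usr/local/cdh/hadoop as the config files below suggest:

# append to /etc/profile (both paths are assumptions inferred from later sections)
export JAVA_HOME=/usr/local/java/jdk1.7.0_67
export HADOOP_HOME=/usr/local/cdh/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# reload the profile so the variables take effect in the current shell
source /etc/profile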
4. System configuration
1. Disable the firewall and configure the hostname and hosts file

chkconfig iptables off   (disables the firewall permanently, from the next boot)

Then set the hostname and the hosts file on every machine; an example follows.
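A minimal sketch using the three nodes listed in section 1 (the HOSTNAME edit is the usual CentOS 6 mechanism; adjust if your setup differs):

# /etc/sysconfig/network  (on c1; use c2/c3 on the other nodes)
HOSTNAME=c1

# /etc/hosts  (identical on all three nodes)
192.168.58.11 c1
192.168.58.12 c2
192.168.58.13 c3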
2. Passwordless SSH configuration
Hadoop manages its daemons remotely: the NameNode connects to each DataNode over SSH (Secure Shell) to start and stop their processes, and those connections must not prompt for a password. So the NameNode must be able to log in to every DataNode without a password, and likewise each DataNode must be able to reach the NameNode.
On every machine, open /etc/ssh/sshd_config and enable:

RSAAuthentication yes      # enable RSA authentication
PubkeyAuthentication yes   # enable public/private key pair authentication
On the master (c1), run ssh-keygen -t rsa -P '' and press Enter at every prompt instead of typing a passphrase. The keys are written to /root/.ssh by default. Then append the public key to the authorized keys:

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[root@c1 .ssh]# ls
authorized_keys  id_rsa  id_rsa.pub  known_hosts
Copy the key to the other nodes (from the /root/.ssh directory):

scp authorized_keys c2:~/.ssh/
scp authorized_keys c3:~/.ssh/
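After copying, it is worth verifying that each hop really is passwordless (the very first connection will still ask you to confirm the host key):

ssh c2 hostname   # should print "c2" without asking for a password
ssh c3 hostname   # should print "c3"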
5. Edit the configuration files (identical on every node)
5.1. hadoop/etc/hadoop/hadoop-env.sh -- add:

# set to the root of your Java installation
export JAVA_HOME=/usr/java/latest

# Assuming your installation directory is /usr/local/hadoop
export HADOOP_PREFIX=/usr/local/hadoop
5.2. etc/hadoop/core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://c1:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/cdh/hadoop/data/tmp</value>
    </property>
</configuration>
5.3. etc/hadoop/hdfs-site.xml

<configuration>
    <property>
        <!-- enable webhdfs -->
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/usr/local/cdh/hadoop/data/dfs/name</value>
        <description>Local directory where the NameNode stores the name table (fsimage); change to suit your layout</description>
    </property>
    <property>
        <name>dfs.namenode.edits.dir</name>
        <value>${dfs.namenode.name.dir}</value>
        <description>Local directory where the NameNode stores the transaction file (edits); change to suit your layout</description>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/usr/local/cdh/hadoop/data/dfs/data</value>
        <description>Local directory where DataNodes store blocks; change to suit your layout</description>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
</configuration>
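These settings point at local directories that may not exist yet. Hadoop can usually create them itself, but creating them up front on every node (paths taken from the XML above) makes permission problems visible early:

mkdir -p /usr/local/cdh/hadoop/data/tmp
mkdir -p /usr/local/cdh/hadoop/data/dfs/name
mkdir -p /usr/local/cdh/hadoop/data/dfs/data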
5.4. etc/hadoop/mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
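If your tarball ships only the template version of this file, copy it first (run from the Hadoop installation directory):

cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml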
5.5. etc/hadoop/yarn-env.sh

# some Java parameters
export JAVA_HOME=/usr/local/java/jdk1.7.0_67
5.6. etc/hadoop/yarn-site.xml

<configuration>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>c1:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>c1:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>c1:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>c1:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>c1:8088</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
</configuration>
5.7. etc/hadoop/slaves (one hostname per line)

c2
c3
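The section heading says these files must be identical on every node. One way to keep them that way is to edit on c1 and push the whole config directory out, assuming Hadoop lives at /usr/local/cdh/hadoop on every node:

scp -r /usr/local/cdh/hadoop/etc/hadoop c2:/usr/local/cdh/hadoop/etc/
scp -r /usr/local/cdh/hadoop/etc/hadoop c3:/usr/local/cdh/hadoop/etc/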
6. Start the cluster and verify the installation
Format HDFS first (on c1), then start the daemons:

bin/hdfs namenode -format
sbin/start-dfs.sh
sbin/start-yarn.sh
On the master (c1):

[root@c1 hadoop]# jps
3250 Jps
2491 ResourceManager
2343 SecondaryNameNode
2170 NameNode
On a DataNode (c2 or c3):

[root@c2 ~]# jps
4196 Jps
2061 DataNode
2153 NodeManager
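The cluster state can also be checked from the command line; with the setup above, dfsadmin should report two live DataNodes:

bin/hdfs dfsadmin -report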
1. Open a browser and check the NameNode web UI: http://localhost:50070/

2. Create the working directories on HDFS:

$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/<username>

3. Copy the input files:

$ bin/hdfs dfs -put etc/hadoop input

4. Run the example job:

$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0-cdh5.1.0.jar grep input output 'dfs[a-z.]+'

5. Examine the output:

$ bin/hdfs dfs -get output output
$ cat output/*
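Alternatively, view the result directly on HDFS without copying it down:

$ bin/hdfs dfs -cat output/*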