1. Install Java
Download Java
Download a suitable JDK from the official site. I used jdk-7u79-linux-x64.tar.gz, and the environment-variable configuration below uses that version as the example.
Create a Java directory
Create a java directory under /usr/local to hold the extracted JDK.
cd /usr/local
mkdir java
Extract the JDK
Enter the java directory (the downloaded tarball is assumed to have been copied here first):
cd java
tar zxvf jdk-7u79-linux-x64.tar.gz
Configure the environment variables
- Edit the profile file
cd /etc
vim profile
- Append the following configuration at the end of the file
export JAVA_HOME=/usr/local/java/jdk1.7.0_79
export JRE_HOME=/usr/local/java/jdk1.7.0_79/jre
export PATH=$PATH:/usr/local/java/jdk1.7.0_79/bin
export CLASSPATH=./:/usr/local/java/jdk1.7.0_79/lib:/usr/local/java/jdk1.7.0_79/jre/lib
- Reload the profile file
source /etc/profile
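To confirm the configuration took effect, check the Java version. The output below is what this JDK should report (illustrative; exact build strings may vary):
java -version
#console
java version "1.7.0_79"
Java(TM) SE Runtime Environment (build 1.7.0_79-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)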
2. Install Hadoop
Download Hadoop
Hadoop Download Page
Choose a version that fits your needs; I downloaded hadoop-2.7.3.tar.gz, and the installation and configuration below use that version as the example.
Create a Hadoop directory
Create a hadoop directory under your home directory to hold the extracted Hadoop.
cd /home/username
mkdir hadoop
[Note]
username is a placeholder for your actual user name; adjust it to your environment.
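The extraction step mirrors the JDK. Assuming the downloaded tarball has been copied into the hadoop directory:
cd /home/username/hadoop
tar zxvf hadoop-2.7.3.tar.gz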
Configure Hadoop Env
- Edit the hadoop-env.sh file
cd /home/username/hadoop/hadoop-2.7.3/etc/hadoop/
chmod +x hadoop-env.sh
vim hadoop-env.sh
- Change
export JAVA_HOME=${JAVA_HOME}
to
export JAVA_HOME=/usr/local/java/jdk1.7.0_79
- Run hadoop-env.sh
./hadoop-env.sh
Configure core-site.xml
vim core-site.xml
Specify the address of the HDFS NameNode and the directory where Hadoop stores the files it generates at runtime. (The key fs.default.name is deprecated in Hadoop 2.x in favor of fs.defaultFS, but the legacy name still works.)
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://hadoop-0:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/usr/local/hadoop/tmp</value>
</property>
</configuration>
Configure hdfs-site.xml
vim hdfs-site.xml
Set the number of HDFS replicas, the local storage directories for the NameNode and DataNodes, and the secondary NameNode address.
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:/usr/local/hadoop/hdfs/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:/usr/local/hadoop/hdfs/data</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop-0:9001</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
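The local paths referenced above (/usr/local/hadoop/tmp, /usr/local/hadoop/hdfs/name, /usr/local/hadoop/hdfs/data) must exist and be writable by the Hadoop user on every node. A minimal sketch, with username:group as placeholders as elsewhere in this post:
sudo mkdir -p /usr/local/hadoop/tmp /usr/local/hadoop/hdfs/name /usr/local/hadoop/hdfs/data
sudo chown -R username:group /usr/local/hadoop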
Configure mapred-site.xml
1. Rename the template file
mv mapred-site.xml.template mapred-site.xml
vim mapred-site.xml
2. Configure as follows
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop-0:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop-0:19888</value>
</property>
</configuration>
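The two jobhistory addresses above are served by the MapReduce JobHistory server, which start-all.sh does not launch. If you want the history UI, it can be started separately with the stock daemon script in sbin:
./mr-jobhistory-daemon.sh start historyserver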
Configure yarn-site.xml
vim yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>hadoop-0:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hadoop-0:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hadoop-0:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>hadoop-0:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>hadoop-0:8088</value>
</property>
</configuration>
Configure masters
vim masters
# file contents
hadoop-0
Configure slaves
vim slaves
# file contents
hadoop-1
hadoop-2
hadoop-3
hadoop-4
hadoop-5
Copy the fully configured hadoop directory to every machine in the cluster (one way to do this is sketched below), then continue with the following steps.
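One way to distribute the directory, assuming the hadoop-1 through hadoop-5 host names used throughout this post and the same directory layout on each node:
for i in 1 2 3 4 5; do
  scp -r /home/username/hadoop/hadoop-2.7.3 username@hadoop-$i:/home/username/hadoop/
done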
Format the NameNode; do this only once. (In Hadoop 2.x, hadoop namenode -format still works but is deprecated in favor of hdfs namenode -format.)
hadoop namenode -format
cd /home/username/hadoop/hadoop-2.7.3/sbin
./start-all.sh
#Enter each password when prompted. If startup reports a JAVA_HOME error, the JAVA_HOME path in hadoop-env.sh was not changed; fix it and run hadoop-env.sh again
#console
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [hadoop-0]
[email protected]'s password:
hadoop-0: starting namenode, logging to /usr/local/hadoop/hadoop-2.7.3/logs/hadoop-intel-namenode-hadoop-0.out
hadoop-4: starting datanode, logging to /usr/local/hadoop/hadoop-2.7.3/logs/hadoop-intel-datanode-hadoop-4.out
hadoop-1: starting datanode, logging to /usr/local/hadoop/hadoop-2.7.3/logs/hadoop-intel-datanode-hadoop-1.out
hadoop-2: starting datanode, logging to /usr/local/hadoop/hadoop-2.7.3/logs/hadoop-intel-datanode-hadoop-2.out
hadoop-5: starting datanode, logging to /usr/local/hadoop/hadoop-2.7.3/logs/hadoop-intel-datanode-hadoop-5.out
hadoop-3: starting datanode, logging to /usr/local/hadoop/hadoop-2.7.3/logs/hadoop-intel-datanode-hadoop-3.out
Starting secondary namenodes [hadoop-0]
[email protected]'s password:
hadoop-0: starting secondarynamenode, logging to /usr/local/hadoop/hadoop-2.7.3/logs/hadoop-intel-secondarynamenode-hadoop-0.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/hadoop-2.7.3/logs/yarn-intel-resourcemanager-hadoop-0.out
hadoop-4: starting nodemanager, logging to /usr/local/hadoop/hadoop-2.7.3/logs/yarn-intel-nodemanager-hadoop-4.out
hadoop-5: starting nodemanager, logging to /usr/local/hadoop/hadoop-2.7.3/logs/yarn-intel-nodemanager-hadoop-5.out
hadoop-1: starting nodemanager, logging to /usr/local/hadoop/hadoop-2.7.3/logs/yarn-intel-nodemanager-hadoop-1.out
hadoop-2: starting nodemanager, logging to /usr/local/hadoop/hadoop-2.7.3/logs/yarn-intel-nodemanager-hadoop-2.out
hadoop-3: starting nodemanager, logging to /usr/local/hadoop/hadoop-2.7.3/logs/yarn-intel-nodemanager-hadoop-3.out
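The password prompts in the log above appear on every start. A common optional improvement, not part of the original write-up, is passwordless SSH from the master to itself and every slave:
ssh-keygen -t rsa #accept the defaults; creates ~/.ssh/id_rsa and id_rsa.pub
for i in 0 1 2 3 4 5; do ssh-copy-id username@hadoop-$i; done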
Test
1. Run the jps command
jps
2. Output like the following on the console indicates that the installation and configuration succeeded
# Output on the master
7398 NameNode
6898 ResourceManager
7732 SecondaryNameNode
8336 Jps
# Output on a slave
5505 Jps
5121 NodeManager
4992 DataNode
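Besides jps, the standard HDFS report command is a quick way to confirm that all DataNodes have registered with the NameNode (optional check):
hdfs dfsadmin -report #lists live DataNodes and their capacity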
Test the Web UI
Open http://localhost:8088/cluster in a browser; the YARN cluster page should load. Replace localhost with the master's hostname or IP for your environment; in my case it is 172.16.1.23.
Submit the first MapReduce job (wordcount) to the Hadoop cluster
hdfs dfs -mkdir -p /data/input #create a test directory /data/input in the distributed file system
hdfs dfs -put README.txt /data/input #copy README.txt from the current directory into the distributed file system
hdfs dfs -ls /data/input #check that the copied file exists in the file system
#console
Found 1 items
-rw-r--r-- 2 intel supergroup 44 2017-03-27 16:24 /data/input/readme.txt
Submit the word-count job to Hadoop
hadoop jar hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /data/input /data/output/result
#console
17/03/27 16:30:06 INFO client.RMProxy: Connecting to ResourceManager at hadoop-0/172.16.1.23:8032
17/03/27 16:30:07 INFO input.FileInputFormat: Total input paths to process : 1
.
.
.
#view the result
hdfs dfs -cat /data/output/result/part-r-00000
#console
! 1
Hi! 1
I 1
a 1
am 1
file 1
for 1
hadoop 1
mapreduce 1
test 1
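To copy the result back out of HDFS (optional; the local file name here is arbitrary):
hdfs dfs -get /data/output/result/part-r-00000 ./wordcount-result.txt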
Important!
The hadoop directory must be owned by the user that will run Hadoop:
sudo chown -R username:group hadoop/
#general form: sudo chown -R <user>:<group> <directory>