Configuring Hadoop
Prerequisites: the JDK and SSH have already been configured.
(How to configure the JDK: http://www.cnblogs.com/xxx0624/p/4164744.html)
(How to configure SSH: http://www.cnblogs.com/xxx0624/p/4165252.html)
1. Add a Hadoop user

sudo addgroup hadoop
sudo adduser --ingroup hadoop hadoop
sudo usermod -aG admin hadoop
2. Download the Hadoop tarball (here: Hadoop 1.2.1, which I put under /home/xxx0624/hadoop)

sudo tar -zxvf hadoop-1.2.1.tar.gz
sudo mv hadoop-1.2.1 /home/xxx0624/hadoop

Make sure all subsequent operations are done as the hadoop user:

sudo chown -R hadoop:hadoop /home/xxx0624/hadoop
3. Set the Hadoop and Java environment variables

sudo gedit /home/xxx0624/hadoop/conf/hadoop-env.sh

Append the following at the end of the opened file:

export JAVA_HOME=/usr/lib/jvm    # adjust to your own Java installation path
export HADOOP_HOME=/home/xxx0624/hadoop
export PATH=$PATH:/home/xxx0624/hadoop/bin

Make the variables take effect (they must be in effect every time you run a Hadoop command!):

source /home/xxx0624/hadoop/conf/hadoop-env.sh
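Before running any Hadoop command, it is worth confirming the variables are actually set. A minimal sketch, using the same example paths as above (adjust them to your install):

```shell
#!/bin/sh
# Sketch: the same exports hadoop-env.sh should contain, plus a sanity check.
# Paths are the example paths from this walkthrough.
export JAVA_HOME=/usr/lib/jvm
export HADOOP_HOME=/home/xxx0624/hadoop
export PATH="$PATH:$HADOOP_HOME/bin"

# Both variables must be non-empty before any hadoop command is run
if [ -n "$JAVA_HOME" ] && [ -n "$HADOOP_HOME" ]; then
  echo "environment OK"
else
  echo "environment NOT set" >&2
fi
```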
4. Pseudo-distributed mode configuration

core-site.xml: configuration for Hadoop Core, e.g. common I/O settings for HDFS and MapReduce.
hdfs-site.xml: configuration for the HDFS daemons: the namenode, secondary namenode, and datanodes.
mapred-site.xml: configuration for the MapReduce daemons: the jobtracker and tasktrackers.
4.1 First create these directories (all inside the hadoop folder)

mkdir tmp
mkdir hdfs
mkdir hdfs/name
mkdir hdfs/data
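The four mkdir calls can be collapsed into one with mkdir -p, which also creates missing parents. A sketch; HADOOP_DIR stands for the hadoop folder (/home/xxx0624/hadoop in this walkthrough) and defaults to a scratch directory here only so the sketch runs anywhere:

```shell
#!/bin/sh
# Sketch: create tmp/, hdfs/name and hdfs/data in one command.
# Set HADOOP_DIR=/home/xxx0624/hadoop for the real setup; the mktemp
# fallback is just for demonstration.
HADOOP_DIR="${HADOOP_DIR:-$(mktemp -d)}"
mkdir -p "$HADOOP_DIR/tmp" "$HADOOP_DIR/hdfs/name" "$HADOOP_DIR/hdfs/data"
ls "$HADOOP_DIR"
```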
4.2 Edit the files

core-site.xml:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/xxx0624/hadoop/tmp</value>
  </property>
</configuration>
hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/xxx0624/hadoop/hdfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/xxx0624/hadoop/hdfs/data</value>
  </property>
</configuration>
mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
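A typo in any of these files (for example an unclosed tag) will make Hadoop fail at startup with a confusing error, so a quick well-formedness check with xmllint, if it is installed, catches mistakes early. A self-contained sketch that validates an inline sample; in practice point xmllint at your conf/*.xml files directly:

```shell
#!/bin/sh
# Sketch: XML well-formedness check using xmllint (if available).
# A sample core-site.xml is written to a scratch dir so the sketch is
# self-contained; on a real install run: xmllint --noout conf/core-site.xml
tmp="$(mktemp -d)"
cat > "$tmp/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
if command -v xmllint >/dev/null 2>&1; then
  xmllint --noout "$tmp/core-site.xml" && echo "core-site.xml: well-formed"
else
  echo "xmllint not installed; skipping check"
fi
```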
5. Format HDFS
hadoop namenode -format
If you see an error like:
ERROR namenode.NameNode: java.io.IOException: Cannot create directory /home/xxx0624/hadoop/hdfs/name/current
then make the hadoop directory writable by the current user:

sudo chmod -R a+w /home/xxx0624/hadoop
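A quick way to confirm the directory is writable before re-running the format. HADOOP_DIR stands for the hadoop folder; the mktemp fallback only keeps the sketch runnable anywhere:

```shell
#!/bin/sh
# Sketch: check write access to the hadoop directory before formatting HDFS.
# Set HADOOP_DIR=/home/xxx0624/hadoop on a real install; the scratch-dir
# default is just for demonstration.
HADOOP_DIR="${HADOOP_DIR:-$(mktemp -d)}"
if [ -w "$HADOOP_DIR" ]; then
  echo "writable: $HADOOP_DIR"
else
  echo "not writable -- fix with: sudo chmod -R a+w $HADOOP_DIR" >&2
fi
```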
6. Start Hadoop

cd /home/xxx0624/hadoop/bin
./start-all.sh
The expected output looks like this:
Warning: $HADOOP_HOME is deprecated.
starting namenode, logging to /home/xxx0624/hadoop/logs/hadoop-xxx0624-namenode-xxx0624-ThinkPad-Edge.out
localhost: Warning: $HADOOP_HOME is deprecated.
localhost:
localhost: starting datanode, logging to /home/xxx0624/hadoop/logs/hadoop-xxx0624-datanode-xxx0624-ThinkPad-Edge.out
localhost: Warning: $HADOOP_HOME is deprecated.
localhost:
localhost: starting secondarynamenode, logging to /home/xxx0624/hadoop/logs/hadoop-xxx0624-secondarynamenode-xxx0624-ThinkPad-Edge.out
starting jobtracker, logging to /home/xxx0624/hadoop/logs/hadoop-xxx0624-jobtracker-xxx0624-ThinkPad-Edge.out
localhost: Warning: $HADOOP_HOME is deprecated.
localhost:
localhost: starting tasktracker, logging to /home/xxx0624/hadoop/logs/hadoop-xxx0624-tasktracker-xxx0624-ThinkPad-Edge.out
You can verify the startup with the jps command:

If all five daemons appear (NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker), everything is running correctly.
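That check can be scripted by grepping the jps output for the five daemon names. In this sketch JPS_OUT is pre-filled with a sample listing so it runs without a live cluster; on a real node set JPS_OUT="$(jps)" instead:

```shell
#!/bin/sh
# Sketch: verify all five Hadoop 1.x daemons show up in the jps output.
# JPS_OUT holds a sample listing for this demo; on a real node use
# JPS_OUT="$(jps)".
JPS_OUT="${JPS_OUT:-1234 NameNode
2345 DataNode
3456 SecondaryNameNode
4567 JobTracker
5678 TaskTracker}"
ok=1
for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
  # -w matches whole words, so "NameNode" does not match "SecondaryNameNode"
  if ! echo "$JPS_OUT" | grep -qw "$d"; then
    echo "missing daemon: $d" >&2
    ok=0
  fi
done
[ "$ok" -eq 1 ] && echo "all 5 daemons running"
```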
7. Check the running status

http://localhost:50030/ - Hadoop administration interface (JobTracker)
http://localhost:50060/ - Hadoop TaskTracker status
http://localhost:50070/ - Hadoop DFS status (NameNode)
8. Stop Hadoop
stop-all.sh