1. Environment preparation:
Install the CentOS 6.5 operating system.
Download Hadoop 2.7.1:
wget http://124.205.69.132/files/224400000162626A/mirrors.hust.edu.cn/apache/hadoop/common/stable/hadoop-2.7.1.tar.gz
Download JDK 8 (8u60):
wget http://download.oracle.com/otn-pub/java/jdk/8u60-b27/jdk-8u60-linux-x64.tar.gz?AuthParam=1443446776_174368b9ab1a6a92468aba5cd4d092d0
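Before going further, it is worth sanity-checking that both archives downloaded intact (note that wget may save the JDK file under a name that includes the ?AuthParam suffix; rename it to jdk-8u60-linux-x64.tar.gz first):
tar -tzf hadoop-2.7.1.tar.gz >/dev/null && echo "hadoop tarball OK"
tar -tzf jdk-8u60-linux-x64.tar.gz >/dev/null && echo "jdk tarball OK"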
2. Edit /etc/hosts and set up SSH trust between the nodes:
Add the following entries to /etc/hosts on every node:
192.168.1.61 host61
192.168.1.62 host62
192.168.1.63 host63
Set up passwordless SSH trust between all the servers, as sketched below.
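A minimal sketch of the trust setup, assuming it is done for the hadoop user created in step 3 (run on each of the three hosts; ssh-copy-id ships with CentOS 6.5's openssh-clients):
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
for h in host61 host62 host63; do ssh-copy-id hadoop@$h; done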
3. Add a user, unpack the archives, and configure environment variables:
useradd hadoop
passwd hadoop
tar -zxvf hadoop-2.7.1.tar.gz
mv hadoop-2.7.1 /usr/local
tar -zxvf jdk-8u60-linux-x64.tar.gz
mv jdk1.8.0_60 /usr/local
cd /usr/local
ln -s hadoop-2.7.1 hadoop
chown -R hadoop:hadoop hadoop-2.7.1
ln -s jdk1.8.0_60 jdk
chown -R root:root jdk1.8.0_60
echo 'export JAVA_HOME=/usr/local/jdk' >>/etc/profile
echo 'export PATH=/usr/local/jdk/bin:$PATH' >/etc/profile.d/java.sh
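A quick sanity check that the variables take effect in a new shell:
source /etc/profile.d/java.sh
java -version    # should report java version "1.8.0_60"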
4. Edit the Hadoop configuration files:
1) Edit hadoop-env.sh:
cd /usr/local/hadoop/etc/hadoop
sed -i 's%#export JAVA_HOME=${JAVA_HOME}%export JAVA_HOME=/usr/local/jdk%g' hadoop-env.sh
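A quick grep confirms the substitution took:
grep '^export JAVA_HOME' hadoop-env.sh
# expected output: export JAVA_HOME=/usr/local/jdk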
2) Edit core-site.xml, replacing the empty <configuration> element at the end of the file with the following (fs.defaultFS supersedes the deprecated fs.default.name):
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://host61:9000/</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/temp</value>
</property>
</configuration>
3) Edit hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
</configuration>
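Optionally, the NameNode and DataNode storage directories can be pinned explicitly rather than defaulting to subdirectories of hadoop.tmp.dir; this is not part of the original setup, and the paths below are only an assumption:
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/hadoop/dfs/data</value>
</property>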
4) Edit mapred-site.xml (see the note after this block):
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>host61:9001</value>
</property>
</configuration>
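Note that mapred.job.tracker is a Hadoop 1.x (MRv1) parameter that Hadoop 2.x ignores; with nothing else set, mapreduce.framework.name defaults to "local", which is why the example job in step 11 runs through LocalJobRunner rather than on the cluster. To submit jobs to YARN instead, a minimal sketch would be to put this in mapred-site.xml:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
and this in yarn-site.xml:
<property>
<name>yarn.resourcemanager.hostname</name>
<value>host61</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>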
5) Configure the masters file:
host61
6) Configure the slaves file:
host62
host63
5. Configure host62 and host63 in the same way; a sketch for pushing everything over is below.
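A sketch of distributing the software and configuration to the other two nodes (assumes root SSH access and that the hadoop user already exists there; host-specific files such as /etc/profile.d/java.sh also need to be copied):
for h in host62 host63; do
scp -r /usr/local/hadoop-2.7.1 /usr/local/jdk1.8.0_60 root@$h:/usr/local/
ssh root@$h 'cd /usr/local && ln -s hadoop-2.7.1 hadoop && ln -s jdk1.8.0_60 jdk && chown -R hadoop:hadoop hadoop-2.7.1'
done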
6. Format the distributed file system (note the dash; in 2.x, bin/hdfs namenode -format is the preferred form):
/usr/local/hadoop/bin/hadoop namenode -format
7. Replace the Hadoop native libraries:
mv /usr/local/hadoop/lib/native /usr/local/hadoop/lib/native_old
Then copy in the lib/native directory from a Hadoop build compiled for this platform.
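For example (a sketch; the source path is hypothetical and assumes Hadoop was built from source on a machine of the same architecture, where the native libraries land under hadoop-dist/target):
scp -r build-host:/path/to/hadoop-2.7.1-src/hadoop-dist/target/hadoop-2.7.1/lib/native /usr/local/hadoop/lib/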
8. Start Hadoop:
1)/usr/local/hadoop/sbin/start-dfs.sh
2)/usr/local/hadoop/sbin/start-yarn.sh
9. Check the daemons with jps on each node:
[root@host61 sbin]# jps
4532 ResourceManager
4197 NameNode
4793 Jps
4364 SecondaryNameNode
[root@host62 ~]# jps
32052 DataNode
32133 NodeManager
32265 Jps
[root@host63 local]# jps
6802 NodeManager
6963 Jps
6717 DataNode
10. Inspect Hadoop through the web UIs:
NameNode info:
http://192.168.1.61:50070/
SecondaryNameNode info:
http://192.168.1.61:50090/
DataNode info:
http://192.168.1.62:50075/
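The same endpoints can be probed from a shell if no browser is handy (any HTTP client works; curl shown as an example):
curl -s -o /dev/null -w '%{http_code}\n' http://192.168.1.61:50070/
# 200 means the NameNode web UI is up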
11. Test:
echo "this is the first file" >/tmp/mytest1.txt
echo "this is the second file" >/tmp/mytest2.txt
cd /usr/local/hadoop/bin;
[hadoop@host61 bin]$ ./hadoop fs -mkdir /in
[hadoop@host61 bin]$ ./hadoop fs -put /tmp/mytest*.txt /in
[hadoop@host61 bin]$ ./hadoop fs -ls /in
Found 2 items
-rw-r--r-- 3 hadoop supergroup 23 2015-10-02 18:45 /in/mytest1.txt
-rw-r--r-- 3 hadoop supergroup 24 2015-10-02 18:45 /in/mytest2.txt
[hadoop@host61 hadoop]$ ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /in /out
15/10/02 18:53:30 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
15/10/02 18:53:30 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
15/10/02 18:53:34 INFO input.FileInputFormat: Total input paths to process : 2
15/10/02 18:53:35 INFO mapreduce.JobSubmitter: number of splits:2
15/10/02 18:53:38 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1954603964_0001
15/10/02 18:53:40 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
15/10/02 18:53:40 INFO mapreduce.Job: Running job: job_local1954603964_0001
15/10/02 18:53:40 INFO mapred.LocalJobRunner: OutputCommitter set in config null
15/10/02 18:53:40 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
15/10/02 18:53:40 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
15/10/02 18:53:41 INFO mapred.LocalJobRunner: Waiting for map tasks
15/10/02 18:53:41 INFO mapred.LocalJobRunner: Starting task: attempt_local1954603964_0001_m_000000_0
15/10/02 18:53:41 INFO mapreduce.Job: Job job_local1954603964_0001 running in uber mode : false
15/10/02 18:53:41 INFO mapreduce.Job: map 0% reduce 0%
15/10/02 18:53:41 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
15/10/02 18:53:41 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
15/10/02 18:53:41 INFO mapred.MapTask: Processing split: hdfs://host61:9000/in/mytest2.txt:0+24
15/10/02 18:53:51 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
15/10/02 18:53:51 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
15/10/02 18:53:51 INFO mapred.MapTask: soft limit at 83886080
15/10/02 18:53:51 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
15/10/02 18:53:51 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
15/10/02 18:53:51 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
15/10/02 18:53:52 INFO mapred.LocalJobRunner:
15/10/02 18:53:52 INFO mapred.MapTask: Starting flush of map output
15/10/02 18:53:52 INFO mapred.MapTask: Spilling map output
15/10/02 18:53:52 INFO mapred.MapTask: bufstart = 0; bufend = 44; bufvoid = 104857600
15/10/02 18:53:52 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214380(104857520); length = 17/6553600
15/10/02 18:53:52 INFO mapred.MapTask: Finished spill 0
15/10/02 18:53:52 INFO mapred.Task: Task:attempt_local1954603964_0001_m_000000_0 is done. And is in the process of committing
15/10/02 18:53:53 INFO mapred.LocalJobRunner: map
15/10/02 18:53:53 INFO mapred.Task: Task 'attempt_local1954603964_0001_m_000000_0' done.
15/10/02 18:53:53 INFO mapred.LocalJobRunner: Finishing task: attempt_local1954603964_0001_m_000000_0
15/10/02 18:53:53 INFO mapred.LocalJobRunner: Starting task: attempt_local1954603964_0001_m_000001_0
15/10/02 18:53:53 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
15/10/02 18:53:53 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
15/10/02 18:53:53 INFO mapred.MapTask: Processing split: hdfs://host61:9000/in/mytest1.txt:0+23
15/10/02 18:53:53 INFO mapreduce.Job: map 100% reduce 0%
15/10/02 18:53:53 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
15/10/02 18:53:53 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
15/10/02 18:53:53 INFO mapred.MapTask: soft limit at 83886080
15/10/02 18:53:53 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
15/10/02 18:53:53 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
15/10/02 18:53:53 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
15/10/02 18:53:54 INFO mapred.LocalJobRunner:
15/10/02 18:53:54 INFO mapred.MapTask: Starting flush of map output
15/10/02 18:53:54 INFO mapred.MapTask: Spilling map output
15/10/02 18:53:54 INFO mapred.MapTask: bufstart = 0; bufend = 43; bufvoid = 104857600
15/10/02 18:53:54 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214380(104857520); length = 17/6553600
15/10/02 18:53:54 INFO mapred.MapTask: Finished spill 0
15/10/02 18:53:54 INFO mapred.Task: Task:attempt_local1954603964_0001_m_000001_0 is done. And is in the process of committing
15/10/02 18:53:54 INFO mapreduce.Job: map 50% reduce 0%
15/10/02 18:53:54 INFO mapred.LocalJobRunner: map
15/10/02 18:53:54 INFO mapred.Task: Task 'attempt_local1954603964_0001_m_000001_0' done.
15/10/02 18:53:54 INFO mapred.LocalJobRunner: Finishing task: attempt_local1954603964_0001_m_000001_0
15/10/02 18:53:54 INFO mapred.LocalJobRunner: map task executor complete.
15/10/02 18:53:54 INFO mapred.LocalJobRunner: Waiting for reduce tasks
15/10/02 18:53:54 INFO mapred.LocalJobRunner: Starting task: attempt_local1954603964_0001_r_000000_0
15/10/02 18:53:54 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
15/10/02 18:53:54 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
15/10/02 18:53:54 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@...
15/10/02 18:53:55 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=363285696, maxSingleShuffleLimit=90821424, mergeThreshold=239768576, ioSortFactor=10, memToMemMergeOutputsThreshold=10
15/10/02 18:53:55 INFO reduce.EventFetcher: attempt_local1954603964_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
15/10/02 18:53:55 INFO mapreduce.Job: map 100% reduce 0%
15/10/02 18:53:56 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local1954603964_0001_m_000001_0 decomp: 55 len: 59 to MEMORY
15/10/02 18:53:56 INFO reduce.InMemoryMapOutput: Read 55 bytes from map-output for attempt_local1954603964_0001_m_000001_0
15/10/02 18:53:56 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 55, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->55
15/10/02 18:53:56 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local1954603964_0001_m_000000_0 decomp: 56 len: 60 to MEMORY
15/10/02 18:53:56 INFO reduce.InMemoryMapOutput: Read 56 bytes from map-output for attempt_local1954603964_0001_m_000000_0
15/10/02 18:53:56 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 56, inMemoryMapOutputs.size() -> 2, commitMemory -> 55, usedMemory ->111
15/10/02 18:53:56 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
15/10/02 18:53:56 INFO mapred.LocalJobRunner: 2 / 2 copied.
15/10/02 18:53:56 INFO reduce.MergeManagerImpl: finalMerge called with 2 in-memory map-outputs and 0 on-disk map-outputs
15/10/02 18:53:57 INFO mapred.Merger: Merging 2 sorted segments
15/10/02 18:53:57 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 97 bytes
15/10/02 18:53:57 INFO reduce.MergeManagerImpl: Merged 2 segments, 111 bytes to disk to satisfy reduce memory limit
15/10/02 18:53:57 INFO reduce.MergeManagerImpl: Merging 1 files, 113 bytes from disk
15/10/02 18:53:57 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
15/10/02 18:53:57 INFO mapred.Merger: Merging 1 sorted segments
15/10/02 18:53:57 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 102 bytes
15/10/02 18:53:57 INFO mapred.LocalJobRunner: 2 / 2 copied.
15/10/02 18:53:57 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
15/10/02 18:53:59 INFO mapred.Task: Task:attempt_local1954603964_0001_r_000000_0 is done. And is in the process of committing
15/10/02 18:53:59 INFO mapred.LocalJobRunner: 2 / 2 copied.
15/10/02 18:53:59 INFO mapred.Task: Task attempt_local1954603964_0001_r_000000_0 is allowed to commit now
15/10/02 18:53:59 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1954603964_0001_r_000000_0' to hdfs://host61:9000/out/_temporary/0/task_local1954603964_0001_r_000000
15/10/02 18:53:59 INFO mapred.LocalJobRunner: reduce > reduce
15/10/02 18:53:59 INFO mapred.Task: Task 'attempt_local1954603964_0001_r_000000_0' done.
15/10/02 18:53:59 INFO mapred.LocalJobRunner: Finishing task: attempt_local1954603964_0001_r_000000_0
15/10/02 18:53:59 INFO mapred.LocalJobRunner: reduce task executor complete.
15/10/02 18:53:59 INFO mapreduce.Job: map 100% reduce 100%
15/10/02 18:53:59 INFO mapreduce.Job: Job job_local1954603964_0001 completed successfully
15/10/02 18:54:00 INFO mapreduce.Job: Counters: 35
	File System Counters
		FILE: Number of bytes read=821850
		FILE: Number of bytes written=1655956
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=118
		HDFS: Number of bytes written=42
		HDFS: Number of read operations=22
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=5
	Map-Reduce Framework
		Map input records=2
		Map output records=10
		Map output bytes=87
		Map output materialized bytes=119
		Input split bytes=196
		Combine input records=10
		Combine output records=10
		Reduce input groups=6
		Reduce shuffle bytes=119
		Reduce input records=10
		Reduce output records=6
		Spilled Records=20
		Shuffled Maps =2
		Failed Shuffles=0
		Merged Map outputs=2
		GC time elapsed (ms)=352
		Total committed heap usage (bytes)=457912320
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=47
	File Output Format Counters
		Bytes Written=42
[hadoop@host61 hadoop]$
[hadoop@host61 hadoop]$ ./bin/hadoop fs -ls /out
Found 2 items
-rw-r--r-- 3 hadoop supergroup 0 2015-10-02 18:53 /out/_SUCCESS
-rw-r--r-- 3 hadoop supergroup 42 2015-10-02 18:53 /out/part-r-00000
[hadoop@host61 hadoop]$ ./bin/hadoop fs -cat /out/_SUCCESS
[hadoop@host61 hadoop]$ ./bin/hadoop fs -cat /out/part-r-00000
file 2
first 1
is 2
second 1
the 2
this 2
[hadoop@host61 hadoop]$
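Note that rerunning the job with the same output path fails, because MapReduce refuses to overwrite an existing output directory; remove /out first:
./bin/hadoop fs -rm -r /out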
12. At this point, the Hadoop configuration and deployment is complete.