Recently, with the rise of big data, Hadoop, implemented in Java, has become the front-runner in the data field; HDFS, MapReduce, and Hive have all turned into buzzwords. Hadoop itself has spawned a whole big-data ecosystem that depends on it, and it has also become tied to cloud computing, currently as hot a topic as can be.
So, as a programmer, I have no choice but to keep learning new technologies and knowledge to protect my livelihood, which is genuinely hard work. And so I started getting my hands on Hadoop, and running into errors was all but inevitable.
Following the guidance of "Hadoop Beginner's Guide" to set up the environment, I found that Hadoop already ships as a .deb package, which spares a lot of extra work: just double-click it in the Ubuntu Software Center and it installs.
After the installation finished, I tried to run the first example program:

hadoop jar hadoop-examples-1.2.1.jar pi 4 1000

Unfortunately, it failed with the following error:
[email protected]:/usr/share/hadoop$ sudo hadoop jar hadoop-examples-1.2.1.jar pi 4 1
Number of Maps = 4
Samples per Map = 1
15/04/17 21:54:44 INFO util.NativeCodeLoader: Loaded the native-hadoop library
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Starting Job
15/04/17 21:54:44 INFO mapred.FileInputFormat: Total input paths to process : 4
15/04/17 21:54:44 INFO mapred.JobClient: Running job: job_local1032904958_0001
15/04/17 21:54:44 INFO mapred.LocalJobRunner: Waiting for map tasks
15/04/17 21:54:44 INFO mapred.LocalJobRunner: Starting task: attempt_local1032904958_0001_m_000000_0
15/04/17 21:54:44 INFO util.ProcessTree: setsid exited with exit code 0
15/04/17 21:54:44 INFO mapred.Task: Using ResourceCalculatorPlugin : [email protected]
15/04/17 21:54:44 INFO mapred.MapTask: Processing split: file:/usr/share/hadoop/PiEstimator_TMP_3_141592654/in/part2:0+118
15/04/17 21:54:44 INFO mapred.MapTask: numReduceTasks: 1
15/04/17 21:54:45 INFO mapred.MapTask: io.sort.mb = 100
15/04/17 21:54:45 INFO mapred.LocalJobRunner: Starting task: attempt_local1032904958_0001_m_000001_0
15/04/17 21:54:45 INFO mapred.Task: Using ResourceCalculatorPlugin : [email protected]
15/04/17 21:54:45 INFO mapred.MapTask: Processing split: file:/usr/share/hadoop/PiEstimator_TMP_3_141592654/in/part1:0+118
15/04/17 21:54:45 INFO mapred.MapTask: numReduceTasks: 1
15/04/17 21:54:45 INFO mapred.MapTask: io.sort.mb = 100
15/04/17 21:54:45 INFO mapred.LocalJobRunner: Starting task: attempt_local1032904958_0001_m_000002_0
15/04/17 21:54:45 INFO mapred.Task: Using ResourceCalculatorPlugin : [email protected]
15/04/17 21:54:45 INFO mapred.MapTask: Processing split: file:/usr/share/hadoop/PiEstimator_TMP_3_141592654/in/part0:0+118
15/04/17 21:54:45 INFO mapred.MapTask: numReduceTasks: 1
15/04/17 21:54:45 INFO mapred.MapTask: io.sort.mb = 100
15/04/17 21:54:45 INFO mapred.LocalJobRunner: Starting task: attempt_local1032904958_0001_m_000003_0
15/04/17 21:54:45 INFO mapred.Task: Using ResourceCalculatorPlugin : [email protected]
15/04/17 21:54:45 INFO mapred.MapTask: Processing split: file:/usr/share/hadoop/PiEstimator_TMP_3_141592654/in/part3:0+118
15/04/17 21:54:45 INFO mapred.MapTask: numReduceTasks: 1
15/04/17 21:54:45 INFO mapred.MapTask: io.sort.mb = 100
15/04/17 21:54:45 INFO mapred.LocalJobRunner: Map task executor complete.
15/04/17 21:54:45 WARN mapred.LocalJobRunner: job_local1032904958_0001
java.lang.Exception: java.lang.OutOfMemoryError: Java heap space
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.OutOfMemoryError: Java heap space
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:954)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:422)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:744)
15/04/17 21:54:45 INFO mapred.JobClient: map 0% reduce 0%
15/04/17 21:54:45 INFO mapred.JobClient: Job complete: job_local1032904958_0001
15/04/17 21:54:45 INFO mapred.JobClient: Counters: 0
15/04/17 21:54:45 INFO mapred.JobClient: Job Failed: NA
java.io.IOException: Job failed!
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
	at org.apache.hadoop.examples.PiEstimator.estimate(PiEstimator.java:297)
	at org.apache.hadoop.examples.PiEstimator.run(PiEstimator.java:342)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.hadoop.examples.PiEstimator.main(PiEstimator.java:351)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:483)
	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
	at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:483)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
[email protected]:/usr/share/hadoop$ sudo hadoop jar hadoop-examples-1.2.1.jar pi 1 1
Number of Maps = 1
Samples per Map = 1
15/04/17 21:54:51 INFO util.NativeCodeLoader: Loaded the native-hadoop library
Wrote input for Map #0
Starting Job
15/04/17 21:54:51 INFO mapred.FileInputFormat: Total input paths to process : 1
15/04/17 21:54:51 INFO mapred.JobClient: Running job: job_local406287877_0001
15/04/17 21:54:52 INFO mapred.LocalJobRunner: Waiting for map tasks
15/04/17 21:54:52 INFO mapred.LocalJobRunner: Starting task: attempt_local406287877_0001_m_000000_0
15/04/17 21:54:52 INFO util.ProcessTree: setsid exited with exit code 0
15/04/17 21:54:52 INFO mapred.Task: Using ResourceCalculatorPlugin : [email protected]
15/04/17 21:54:52 INFO mapred.MapTask: Processing split: file:/usr/share/hadoop/PiEstimator_TMP_3_141592654/in/part0:0+118
15/04/17 21:54:52 INFO mapred.MapTask: numReduceTasks: 1
15/04/17 21:54:52 INFO mapred.MapTask: io.sort.mb = 100
15/04/17 21:54:52 INFO mapred.LocalJobRunner: Map task executor complete.
15/04/17 21:54:52 WARN mapred.LocalJobRunner: job_local406287877_0001
java.lang.Exception: java.lang.OutOfMemoryError: Java heap space
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.OutOfMemoryError: Java heap space
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:954)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:422)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:744)
15/04/17 21:54:52 INFO mapred.JobClient: map 0% reduce 0%
15/04/17 21:54:52 INFO mapred.JobClient: Job complete: job_local406287877_0001
15/04/17 21:54:52 INFO mapred.JobClient: Counters: 0
15/04/17 21:54:52 INFO mapred.JobClient: Job Failed: NA
java.io.IOException: Job failed!
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
	at org.apache.hadoop.examples.PiEstimator.estimate(PiEstimator.java:297)
	at org.apache.hadoop.examples.PiEstimator.run(PiEstimator.java:342)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.hadoop.examples.PiEstimator.main(PiEstimator.java:351)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:483)
	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
	at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:483)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
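One way to read the trace: the failure occurs in MapOutputBuffer.<init>, right after the log reports io.sort.mb = 100. Under the LocalJobRunner everything runs inside the single client JVM, and each map task reserves an io.sort.mb-sized sort buffer up front, so a small client heap overflows even with a single map. As a quick experiment (a sketch, not the fix adopted below; the jar name and value are illustrative), the buffer can be shrunk for one run through the generic -D option that the ToolRunner-based example driver accepts:

```shell
# Sketch: shrink the per-task sort buffer (default 100 MB in Hadoop 1.x)
# for a single run, without editing any configuration file.
hadoop jar hadoop-examples-1.2.1.jar pi -D io.sort.mb=10 4 1000
```

This only sidesteps the symptom; the permanent change to the heap settings follows.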
I consulted countless resources to no avail. After a great deal of searching and debugging, I found the key: the memory allocated in /etc/hadoop/hadoop-env.sh was insufficient, which caused the out-of-memory error. The modified file is pasted below, with the changed lines marked:
# Set Hadoop-specific environment variables here.

# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The java implementation to use.
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0                      # (modified)

export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"}

# The maximum amount of heap to use, in MB. Default is 1000.
export HADOOP_HEAPSIZE=100                                  # (modified)
#export HADOOP_NAMENODE_INIT_HEAPSIZE=""

# Extra Java runtime options. Empty by default.
export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true $HADOOP_CLIENT_OPTS"

# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT $HADOOP_NAMENODE_OPTS"
HADOOP_JOBTRACKER_OPTS="-Dhadoop.security.logger=INFO,DRFAS -Dmapred.audit.logger=INFO,MRAUDIT -Dhadoop.mapreduce.jobsummary.logger=INFO,JSA $HADOOP_JOBTRACKER_OPTS"
HADOOP_TASKTRACKER_OPTS="-Dhadoop.security.logger=ERROR,console -Dmapred.audit.logger=ERROR,console $HADOOP_TASKTRACKER_OPTS"
HADOOP_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,DRFAS $HADOOP_DATANODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT $HADOOP_SECONDARYNAMENODE_OPTS"

# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
export HADOOP_CLIENT_OPTS="-Xmx200m $HADOOP_CLIENT_OPTS"    # (modified)
#HADOOP_JAVA_PLATFORM_OPTS="-XX:-UsePerfData $HADOOP_JAVA_PLATFORM_OPTS"

# On secure datanodes, user to run the datanode as after dropping privileges
export HADOOP_SECURE_DN_USER=

# Where log files are stored. $HADOOP_HOME/logs by default.
export HADOOP_LOG_DIR=/var/log/hadoop/$USER

# Where log files are stored in the secure data environment.
export HADOOP_SECURE_DN_LOG_DIR=/var/log/hadoop/

# The directory where pid files are stored. /tmp by default.
export HADOOP_PID_DIR=/var/run/hadoop
export HADOOP_SECURE_DN_PID_DIR=/var/run/hadoop

# A string representing this instance of hadoop. $USER by default.
export HADOOP_IDENT_STRING=$USER
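As a rough sanity check on these numbers (a sketch; the values are taken from the file and the job log above, and the threshold logic is only illustrative): the -Xmx in HADOOP_CLIENT_OPTS bounds the client JVM that the LocalJobRunner uses, and each local map task allocates an io.sort.mb-sized buffer inside it.

```shell
# Illustrative check: the client heap (-Xmx200m in HADOOP_CLIENT_OPTS above)
# must exceed the sort buffer each local map task reserves (io.sort.mb).
client_xmx_mb=200   # from "-Xmx200m" in HADOOP_CLIENT_OPTS
io_sort_mb=100      # from the "io.sort.mb = 100" lines in the job log
if [ "$client_xmx_mb" -gt "$io_sort_mb" ]; then
  echo "sort buffer fits in the client heap"
else
  echo "expect OutOfMemoryError in MapOutputBuffer"
fi
```

With the previous, smaller client heap the comparison goes the other way, matching the OutOfMemoryError seen earlier.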
Once the changes are made, run source hadoop-env.sh to have them take effect immediately in the current shell (the hadoop launcher script also re-reads this file on each invocation).
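Putting it together, applying the change and re-testing might look like this (a sketch; the paths assume the Ubuntu .deb layout from the session above):

```shell
# Assumes hadoop-env.sh lives in /etc/hadoop and the examples jar in
# /usr/share/hadoop, as in the earlier terminal session.
source /etc/hadoop/hadoop-env.sh
echo "$HADOOP_CLIENT_OPTS"        # should now include -Xmx200m
cd /usr/share/hadoop
hadoop jar hadoop-examples-1.2.1.jar pi 4 1000
```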
Date: 2024-10-19 12:43:33