Detailed steps for running a MapReduce program (running MR locally vs. running MR on the server)

An MR program can run in two environments: the local test environment and the server environment.

1. Steps to run an MR program in the local environment:

(1) Configure the Hadoop environment variables on Windows

(2) Copy the debug tool (winutils.exe) into HADOOP_HOME/bin (a driver-side sketch covering steps (1) and (2) is shown after this list)

(3) Copy org.apache.hadoop.io.nativeio.NativeIO.java from the Hadoop source into our MR project's src directory and modify it (a modified copy can be downloaded from http://download.csdn.net/detail/u013226462/9516657). Note: make sure the project's lib points to the lib of the locally installed JDK.

(4) The code that loads the MR configuration must change:

a. src must not contain the server's Hadoop configuration files

b. Instead, set the configuration in code:

    Configuration config = new  Configuration();
    config.set("fs.defaultFS", "hdfs://lida1:8020");
    config.set("yarn.resourcemanager.hostname", "lida1");

The following describes in detail how to invoke and run the MR program directly from the local machine.

Note: the input file, output path, and project structure are shown in the figures below.

1) Write the MR code

/** April 29, 2016
 * author: lida
 * wordCount
 * com.dada.wordCount
 * RunTest.java
 */
package com.dada.mr;

/**
 * @author lida
 *
 */
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WC {
	public static Text k = new Text();
	public static IntWritable v = new IntWritable(1);
	public static void main(String[] args) {
		Configuration config = new Configuration();
		config.set("fs.defaultFS", "hdfs://lida1:8020");
		config.set("yarn.resourcemanager.hostname", "lida1");
		try {
			FileSystem fs = FileSystem.get(config);
			Job job = Job.getInstance(config);
			job.setJarByClass(WC.class);

			job.setJobName("wc");
			job.setMapperClass(WordCountMapper.class);
			job.setReducerClass(WordCountReducer.class);
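			// The reducer is reused as a combiner: per-word counts are partial sums, so they can be pre-aggregated on the map side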
			job.setCombinerClass(WordCountReducer.class);
			job.setMapOutputKeyClass(Text.class);
			job.setMapOutputValueClass(IntWritable.class);

			// Input files for the MR job
			FileInputFormat.addInputPath(job, new Path("/lida-hdfs_1/input_files/wordCount"));
			// This directory will hold the MR output; it must not already exist
			Path outputPath = new Path("/lida-hdfs_1/output_files");
			// Note: the absolute path of the input file is /lida-hdfs_1/input_files/wordCount/wordCount.txt,
			// but only the input directory needs to be specified.

			if (fs.exists(outputPath)) {
				fs.delete(outputPath, true);
			}
			FileOutputFormat.setOutputPath(job, outputPath);
			boolean f = job.waitForCompletion(true);
			if (f) {
				System.out.println("job completed successfully");
			}
		} catch (Exception e) {
			e.printStackTrace();
		}
	}

	public static class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
		// Called by the framework for each input record; by default map() is invoked once per line
		protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
			String line = value.toString();
			StringTokenizer st = new StringTokenizer(line);
			while (st.hasMoreTokens()) {
				String word = st.nextToken();
				k.set(word);
				context.write(k, v);
			}
		}
	}

	public static class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
		protected void reduce(Text arg0, Iterable<IntWritable> arg1, Context arg2)
				throws IOException, InterruptedException {
			int sum = 0;
			for (IntWritable i : arg1) {
				sum = sum + i.get();
			}
			v.set(sum);
			arg2.write(arg0, v);
		}
	}
}

2) Configure. Copy org.apache.hadoop.io.nativeio.NativeIO.java from the Hadoop source into our MR project's src directory, then modify NativeIO.java.

Note: make sure the project's lib is the lib of the locally installed JDK.
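
The exact change depends on the Hadoop version, but a commonly used local-testing workaround (assuming a Hadoop 2.x source tree; treat this as a sketch, not the official patch) is to make the Windows file-access check in the copied NativeIO.java always succeed, since the native winutils-based check is what fails on a developer machine:

    // In the copied NativeIO.java, inner class Windows:
    public static boolean access(String path, AccessRight desiredAccess)
        throws IOException {
      // The original implementation delegates to the native access0() call;
      // returning true here skips the check for local runs only.
      return true;
    }

Do not ship the patched class to a cluster.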

3) Execute. Run it locally with "Run on Hadoop" in Eclipse.

If the Eclipse console shows no errors, the MR program works. The MR execution process can be seen in the figure below.

After the MR job finishes, the results can be seen in the two figures below.

The output of the MR program is then visible in the DFS Location view, as shown below:

2. Steps to run an MR program in the server environment:

2.1 Invoke MR from the local machine, with execution on the server (the real enterprise runtime environment)

(1) Place the server's Hadoop configuration files under src

(2) Copy org.apache.hadoop.mapred.YARNRunner.java from the Hadoop source into our MR project's src directory and modify YARNRunner.java. Note: make sure the project's lib points to the lib of the locally installed JDK.

(3) Package the MR program as a jar and keep it on the local machine

(4) Run the main method locally; a servlet can invoke the MR job.

The following describes in detail how to invoke MR from the local machine while executing it on the server.

Note: the input file, output path, and project structure are shown in the figures below.

1) Write the code

/** April 29, 2016
 * author: lida
 * wordCount
 * com.dada.wordCount
 * RunTest.java
 */
package com.dada.mr;

/**
 * @author lida
 *
 */
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WC {
	public static Text k = new Text();
	public static IntWritable v = new IntWritable(1);

	public static void main(String[] args) {
		Configuration config = new Configuration();
		try {
			FileSystem fs = FileSystem.get(config);
			Job job = Job.getInstance(config);
			job.setJar("C:\\Users\\lida\\Desktop\\wc.jar"); //此处的写法和其他人的写法(job.setJarByClass(XX.class) )可能不同,原因是这样写会报错,下面的错误总结会讲到。
			job.setJobName("wc");
			job.setMapperClass(WordCountMapper.class);
			job.setReducerClass(WordCountReducer.class);
			job.setCombinerClass(WordCountReducer.class);
			job.setMapOutputKeyClass(Text.class);
			job.setMapOutputValueClass(IntWritable.class);

			// Input files for the MR job
			FileInputFormat.addInputPath(job, new Path("/lida-hdfs_1/input_files/wordCount"));
			// This directory will hold the MR output; it must not already exist
			Path outputPath = new Path("/lida-hdfs_1/output_files");
			// Note: the absolute path of the input file is /lida-hdfs_1/input_files/wordCount/wordCount.txt,
			// but only the input directory needs to be specified.

			if (fs.exists(outputPath)) {
				fs.delete(outputPath, true);
			}
			FileOutputFormat.setOutputPath(job, outputPath);
			boolean f = job.waitForCompletion(true);
			if (f) {
				System.out.println("job completed successfully");
			}
		} catch (Exception e) {
			e.printStackTrace();
		}
	}

	public static class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
		// Called by the framework for each input record; by default map() is invoked once per line
		protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
			String line = value.toString();
			StringTokenizer st = new StringTokenizer(line);
			while (st.hasMoreTokens()) {
				String word = st.nextToken();
				k.set(word);
				context.write(k, v);
			}
		}
	}

	public static class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
		protected void reduce(Text arg0, Iterable<IntWritable> arg1, Context arg2)
				throws IOException, InterruptedException {
			int sum = 0;
			for (IntWritable i : arg1) {
				sum = sum + i.get();
			}
			v.set(sum);
			arg2.write(arg0, v);
		}
	}
}

2) Configuration

(1) Place the server's Hadoop configuration files under src: core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml.

(2) Copy org.apache.hadoop.mapred.YARNRunner.java from the Hadoop source into our MR project's src directory, then modify YARNRunner.java.

For the exact code changes, see http://www.360doc.com/content/14/0728/11/597197_397616444.shtml

Note: make sure the project's lib is the lib of the locally installed JDK.
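
On newer Hadoop 2.x releases (roughly 2.4 and later) there is a configuration-only alternative to patching YARNRunner, assuming your cluster version supports it: ask the client to generate platform-independent container launch commands when submitting from Windows.

    Configuration config = new Configuration();
    // Generate generic (non-Windows) classpath and launch commands for the AM and tasks.
    config.set("mapreduce.app-submission.cross-platform", "true");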

3) Package. Package the MR program as a jar and keep it on the local machine; packaging works the same as for an ordinary Java program. See the figure below:

4) Execute. Run the main method locally; a servlet can invoke the MR job.

5) Monitor. Monitoring works the same way as above.

2.2 Invoke the MR program directly on the server via the command line; execution also happens on the server

(1) Package the MR program as a jar and upload it to the server

(2) Run it with: hadoop jar <jar path> <fully qualified main class>, for example: hadoop jar /examples/wc.jar com.dada.mr.WordCount (wc.jar is the packaged jar; com.dada.mr is the package containing the main method; WordCount is the class containing the main method).

The following describes in detail how to run the MR program on the server.

Note: the input file, output path, and project structure are shown in the figures below.

1) Write the MR code

/**
 *
 */
package com.dada.mr;

/**
 * @author 李达
 *
 *  May 5, 2016
 *   wordCount
 *   com.dada.wc
 */

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WC {
	public static Text k = new Text();
	public static IntWritable v = new IntWritable(1);

	public static void main(String[] args) {
		Configuration config = new Configuration();
		try {
			FileSystem fs = FileSystem.get(config);
			Job job = Job.getInstance(config);
			job.setJarByClass(WC.class);
			job.setJobName("wc");
			job.setMapperClass(WordCountMapper.class);
			job.setReducerClass(WordCountReducer.class);
			job.setCombinerClass(WordCountReducer.class);
			job.setMapOutputKeyClass(Text.class);
			job.setMapOutputValueClass(IntWritable.class);

			// There are two ways to supply the input and output paths:
			// 1. Hard-code them:
			// FileInputFormat.addInputPath(job, new Path("/lida-hdfs_1/input_files/wordCount"));
			// The output directory holds the MR results and must not already exist:
			// Path outputPath = new Path("/lida-hdfs_1/output_files");
			// 2. Do not hard-code them; pass them as arguments on the command line
			// (this is what the official examples do).
			FileInputFormat.addInputPath(job, new Path(args[0]));
			Path outputPath = new Path(args[1]);
			// Command-line invocation:
			// ./hadoop jar /examplesByLiDa/wc.jar com.dada.mr.WC /lida-hdfs_1/input_files/wordCount /lida-hdfs_1/output_files
			// Note: the absolute path of the input file is /lida-hdfs_1/input_files/wordCount/wordCount.txt,
			// but only the input directory needs to be given on the command line.

			if (fs.exists(outputPath)) {
				fs.delete(outputPath, true);
			}
			FileOutputFormat.setOutputPath(job, outputPath);
			boolean f = job.waitForCompletion(true);
			if (f) {
				System.out.println("job completed successfully");
			}
		} catch (Exception e) {
			e.printStackTrace();
		}
	}

	static class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
		// Called by the framework for each input record; by default map() is invoked once per line
		protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
			String line = value.toString();
			StringTokenizer st = new StringTokenizer(line);
			while (st.hasMoreTokens()) {
				String word = st.nextToken();
				k.set(word);
				context.write(k, v);
			}
		}
	}

	static class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
		protected void reduce(Text arg0, Iterable<IntWritable> arg1, Context arg2)
				throws IOException, InterruptedException {
			int sum = 0;
			for (IntWritable i : arg1) {
				sum = sum + i.get();
			}
			v.set(sum);
			arg2.write(arg0, v);
		}
	}
}

2) Package. Packaging works the same as for an ordinary Java program.

3) Execute

(1) Upload the packaged jar to the server where the active HDFS NameNode runs

(2) Go to hadoop/bin/ and run the script:

./hadoop jar /examplesByLiDa/wc.jar com.dada.mr.WC /lida-hdfs_1/input_files/wordCount /lida-hdfs_1/output_files

4) Monitor. Monitoring works the same way as above.

That's the end of the walkthrough.

Appendix: problems encountered during testing and how to solve them

Problem 1:

org.apache.hadoop.security.AccessControlException: Permission denied: user=aolei, access=EXECUTE, inode="/tmp":root:supergroup:drwx------
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:271)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:257)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:208)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:171)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5904)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5886)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOwner(FSNamesystem.java:5842)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setPermissionInt(FSNamesystem.java:1626)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setPermission(FSNamesystem.java:1607)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setPermission(NameNodeRpcServer.java:579)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setPermission(ClientNamenodeProtocolServerSideTranslatorPB.java:416)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
at org.apache.hadoop.hdfs.DFSClient.setPermission(DFSClient.java:2165)
at org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1236)
at org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1232)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.setPermission(DistributedFileSystem.java:1232)
at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:597)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:179)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:301)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:389)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Unknown Source)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
at com.dada.mr.WC.main(WC.java:66)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=aolei, access=EXECUTE, inode="/tmp":root:supergroup:drwx------
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:271)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:257)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:208)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:171)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5904)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5886)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOwner(FSNamesystem.java:5842)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setPermissionInt(FSNamesystem.java:1626)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setPermission(FSNamesystem.java:1607)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setPermission(NameNodeRpcServer.java:579)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setPermission(ClientNamenodeProtocolServerSideTranslatorPB.java:416)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)

at org.apache.hadoop.ipc.Client.call(Client.java:1411)
at org.apache.hadoop.ipc.Client.call(Client.java:1364)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy14.setPermission(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.setPermission(ClientNamenodeProtocolTranslatorPB.java:314)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy15.setPermission(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.setPermission(DFSClient.java:2163)
... 16 more

Solution: disable HDFS permission checking by adding the following to hdfs-site.xml (the property is dfs.permissions; on Hadoop 2.x it is also known as dfs.permissions.enabled) and refreshing:

<property>
      <name>dfs.permissions</name>
       <value>false</value>
</property>
hadoop dfsadmin -refreshNodes

or open up the permissions on the directory instead:

hadoop fs -chmod 777 /tmp
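
A third option that avoids loosening permissions, assuming the cluster uses simple (non-Kerberos) authentication, is to submit the job as the user that owns the HDFS directories (root in the trace above) by setting the Hadoop user name before any FileSystem or Job object is created:

    // Must run before FileSystem.get() / Job.getInstance(); "root" is whatever
    // user owns the relevant HDFS paths in your cluster.
    System.setProperty("HADOOP_USER_NAME", "root");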

Problem 2:

ExitCodeException exitCode=1: /bin/bash: line 0: fg: no job control
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
    at org.apache.hadoop.util.Shell.run(Shell.java:455)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
.Failing this attempt.. Failing the application.
2016-05-09 16:43:11,531 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1380)) - Counters: 0

Solution: see http://www.360doc.com/content/14/0728/11/597197_397616444.shtml — the same YARNRunner changes referenced in section 2.1, step (2).

Problem 3:

2016-05-09 17:15:03,121 WARN  [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2016-05-09 17:15:06,883 WARN  [main] mapreduce.JobSubmitter (JobSubmitter.java:copyAndConfigureFiles(150)) - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2016-05-09 17:15:06,909 WARN  [main] mapreduce.JobSubmitter (JobSubmitter.java:copyAndConfigureFiles(259)) - No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
2016-05-09 17:15:06,947 INFO  [main] input.FileInputFormat (FileInputFormat.java:listStatus(281)) - Total input paths to process : 1
2016-05-09 17:15:07,237 INFO  [main] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(396)) - number of splits:1
2016-05-09 17:15:07,419 INFO  [main] mapreduce.JobSubmitter (JobSubmitter.java:printTokens(479)) - Submitting tokens for job: job_1462791130009_0014
2016-05-09 17:15:07,535 INFO  [main] mapred.YARNRunner (YARNRunner.java:createApplicationSubmissionContext(374)) - Job jar is not present. Not adding any jar to the list of resources.
2016-05-09 17:15:07,575 INFO  [main] impl.YarnClientImpl (YarnClientImpl.java:submitApplication(236)) - Submitted application application_1462791130009_0014
2016-05-09 17:15:07,605 INFO  [main] mapreduce.Job (Job.java:submit(1289)) - The url to track the job: http://lida1:8088/proxy/application_1462791130009_0014/
2016-05-09 17:15:07,606 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1334)) - Running job: job_1462791130009_0014
2016-05-09 17:15:11,665 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1355)) - Job job_1462791130009_0014 running in uber mode : false
2016-05-09 17:15:11,668 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1362)) -  map 0% reduce 0%
2016-05-09 17:15:11,700 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1375)) - Job job_1462791130009_0014 failed with state FAILED due to: Application application_1462791130009_0014 failed 2 times due to AM Container for appattempt_1462791130009_0014_000002 exited with  exitCode: 1 due to: Exception from container-launch: ExitCodeException exitCode=1:
ExitCodeException exitCode=1:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
    at org.apache.hadoop.util.Shell.run(Shell.java:455)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
.Failing this attempt.. Failing the application.
2016-05-09 17:15:11,723 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1380)) - Counters: 0

In addition, the web UI shows another error message, as shown in the figure below:

Solution: see http://www.cnblogs.com/gaoxing/p/4466924.html — add the Hadoop jar paths to mapred-site.xml.


Problem 4:

2016-05-09 17:49:30,572 INFO  [main] mapreduce.Job (Job.java:printTaskEvents(1441)) - Task Id : attempt_1462791130009_0020_m_000000_2, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.dada.mr.WC$WordCountMapper not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1905)
    at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:722)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.ClassNotFoundException: Class com.dada.mr.WC$WordCountMapper not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1811)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1903)
    ... 8 more
2016-05-09 17:49:37,635 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1362)) -  map 100% reduce 100%
2016-05-09 17:49:37,670 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1375)) - Job job_1462791130009_0020 failed with state FAILED due to: Task failed task_1462791130009_0020_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
2016-05-09 17:49:37,761 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1380)) - Counters: 9
    Job Counters
        Failed map tasks=4
        Launched map tasks=4
        Other local map tasks=3
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=16275
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=16275
        Total vcore-seconds taken by all map tasks=16275
        Total megabyte-seconds taken by all map tasks=16665600

Reference: http://stackoverflow.com/questions/21373550/class-not-found-exception-in-mapreduce-wordcount-job

Solution: set the job jar explicitly. When the job is submitted from the IDE there is no job jar on the classpath (note the "No job jar file set" warning in Problem 3), so the task JVMs on the cluster cannot load com.dada.mr.WC$WordCountMapper; setJar points them at the packaged jar:

job.setJar("wordcount.jar");
