Hadoop notes: top-k

[email protected]:~/hadoop-1.0.1/bin$ ./hadoop jar ~/hadoop-1.0.1/to.jar top.Top input output

14/05/12 03:44:37 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.

****hdfs://localhost:9000/user/lk/input

14/05/12 03:44:41 INFO input.FileInputFormat: Total input paths to process : 4

14/05/12 03:44:48 INFO mapred.JobClient: Running job: job_201405120333_0001

14/05/12 03:44:49 INFO mapred.JobClient:  map 0% reduce 0%

14/05/12 03:46:36 INFO mapred.JobClient:  map 50% reduce 0%

14/05/12 03:47:28 INFO mapred.JobClient:  map 0% reduce 0%

14/05/12 03:47:29 INFO mapred.JobClient: Task Id : attempt_201405120333_0001_m_000000_0, Status : FAILED

attempt_201405120333_0001_m_000000_0: log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapred.Task).

attempt_201405120333_0001_m_000000_0: log4j:WARN Please initialize the log4j system properly.

14/05/12 03:47:44 INFO mapred.JobClient: Task Id : attempt_201405120333_0001_m_000001_0, Status : FAILED

attempt_201405120333_0001_m_000001_0: log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapred.Task).

attempt_201405120333_0001_m_000001_0: log4j:WARN Please initialize the log4j system properly.

14/05/12 03:49:37 INFO mapred.JobClient:  map 50% reduce 0%

14/05/12 03:50:13 INFO mapred.JobClient: Task Id : attempt_201405120333_0001_m_000000_1, Status : FAILED

java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text

at top.Top$topMap.map(Top.java:1)

at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)

at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)

at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)

at org.apache.hadoop.mapred.Child$4.run(Child.java:255)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:416)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)

at org.apache.hadoop.mapred.Child.main(Child.java:249)

attempt_201405120333_0001_m_000000_1: log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapred.Task).

attempt_201405120333_0001_m_000000_1: log4j:WARN Please initialize the log4j system properly.

14/05/12 03:50:17 INFO mapred.JobClient:  map 0% reduce 0%

14/05/12 03:50:17 INFO mapred.JobClient: Task Id : attempt_201405120333_0001_m_000001_1, Status : FAILED

attempt_201405120333_0001_m_000001_1: log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapred.Task).

attempt_201405120333_0001_m_000001_1: log4j:WARN Please initialize the log4j system properly.

14/05/12 03:52:36 INFO mapred.JobClient:  map 50% reduce 0%

14/05/12 03:52:57 INFO mapred.JobClient:  map 25% reduce 0%

14/05/12 03:53:01 INFO mapred.JobClient: Task Id : attempt_201405120333_0001_m_000001_2, Status : FAILED

java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text

at top.Top$topMap.map(Top.java:1)

at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)

at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)

at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)

at org.apache.hadoop.mapred.Child$4.run(Child.java:255)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:416)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)

at org.apache.hadoop.mapred.Child.main(Child.java:249)

attempt_201405120333_0001_m_000001_2: log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapred.Task).

attempt_201405120333_0001_m_000001_2: log4j:WARN Please initialize the log4j system properly.

14/05/12 03:53:09 INFO mapred.JobClient:  map 0% reduce 0%

14/05/12 03:53:37 INFO mapred.JobClient: Task Id : attempt_201405120333_0001_m_000000_2, Status : FAILED

java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text

at top.Top$topMap.map(Top.java:1)

at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)

at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)

at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)

at org.apache.hadoop.mapred.Child$4.run(Child.java:255)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:416)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)

at org.apache.hadoop.mapred.Child.main(Child.java:249)

attempt_201405120333_0001_m_000000_2: log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapred.Task).

attempt_201405120333_0001_m_000000_2: log4j:WARN Please initialize the log4j system properly.

14/05/12 03:54:03 INFO mapred.JobClient: Job complete: job_201405120333_0001

14/05/12 03:54:06 INFO mapred.JobClient: Counters: 7

14/05/12 03:54:06 INFO mapred.JobClient:   Job Counters

14/05/12 03:54:06 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=830699

14/05/12 03:54:06 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0

14/05/12 03:54:06 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0

14/05/12 03:54:06 INFO mapred.JobClient:     Launched map tasks=8

14/05/12 03:54:06 INFO mapred.JobClient:     Data-local map tasks=8

14/05/12 03:54:06 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0

14/05/12 03:54:06 INFO mapred.JobClient:     Failed map tasks=1

[email protected]:~/hadoop-1.0.1/bin$ ./hadoop dfs -ls output

Found 1 items

drwxr-xr-x   - lk supergroup          0 2014-05-12 03:44 /user/lk/output/_logs

[email protected]:~/hadoop-1.0.1/bin$
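
Every map attempt dies with the same ClassCastException. The driver never sets an input format, so Hadoop falls back to TextInputFormat, which hands each mapper a LongWritable byte offset as the key and the line contents as a Text value; the mapper below declares its input key as Text, and the cast from LongWritable to Text fails as soon as the framework calls map(). The failing tasks are retried until the attempt limit is reached (four by default in Hadoop 1.x), the job is then killed, and output/ is left holding only the _logs directory. The source that produced this run: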

package top;

import java.io.IOException;
import java.util.Comparator;
import java.util.TreeMap;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Top {

	// Mapper: keep only the largest (count, name) pair seen in this split.
	// BUG: the driver never sets an input format, so the default
	// TextInputFormat delivers LongWritable byte offsets as keys; declaring
	// the input key as Text is what throws the ClassCastException above.
	public static class topMap extends Mapper<Text, Text, IntWritable, Text> {

		private TreeMap<Integer, String> topMap = new TreeMap<Integer, String>();
		private int topNum = 1;

		public void map(Text key, Text value, Context context) {
			topMap.put(Integer.parseInt(value.toString()), key.toString());
			// natural ordering: firstKey() is the smallest, so the largest survives
			while (topMap.size() > topNum)
				topMap.remove(topMap.firstKey());
		}

		protected void cleanup(Context context) throws IOException, InterruptedException {
			for (Integer entry : topMap.keySet()) {
				context.write(new IntWritable(entry), new Text(topMap.get(entry)));
			}
		}
	}

	// Comparator for descending order.
	private static class descendComparator implements Comparator<Integer> {
		@Override
		public int compare(Integer a, Integer b) {
			return -a.compareTo(b);
		}
	}

	public static class topReduce extends Reducer<IntWritable, Text, IntWritable, Text> {

		private TreeMap<Integer, String> topMap = new TreeMap<Integer, String>(new descendComparator());
		private int topNum = 1;

		public void reduce(IntWritable key, Iterable<Text> values, Context context) {
			for (Text text : values)
				topMap.put(key.get(), text.toString());
			// BUG: under the descending comparator firstKey() is the LARGEST key,
			// so this evicts the top value; it should remove lastKey() instead.
			while (topMap.size() > topNum) {
				topMap.remove(topMap.firstKey());
			}
		}

		protected void cleanup(Context context) throws IOException, InterruptedException {
			for (Integer integer : topMap.keySet()) {
				context.write(new IntWritable(integer), new Text(topMap.get(integer)));
			}
		}
	}

	public static void main(String[] args) throws Exception {
		Configuration conf = new Configuration();
		Job job = new Job(conf, "Top");

		job.setJarByClass(Top.class);
		job.setMapperClass(topMap.class);
		job.setReducerClass(topReduce.class);

		job.setOutputKeyClass(IntWritable.class);
		job.setOutputValueClass(Text.class);

		FileInputFormat.addInputPath(job, new Path(args[0]));
		FileOutputFormat.setOutputPath(job, new Path(args[1]));

		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}
}
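
The immediate fix is to make the mapper accept what TextInputFormat actually delivers: a LongWritable offset and the whole line as Text, parsed by hand. A minimal sketch, assuming each input line is a tab-separated "name<TAB>count" pair (the post never shows the input files, so the split on a tab below is an assumption to adjust):

// Replacement for topMap above. Needs one extra import:
// import org.apache.hadoop.io.LongWritable;
public static class topMap extends Mapper<LongWritable, Text, IntWritable, Text> {

	private TreeMap<Integer, String> topMap = new TreeMap<Integer, String>();
	private int topNum = 1;

	@Override
	public void map(LongWritable offset, Text line, Context context) {
		// ASSUMPTION: lines look like "name<TAB>count"; adjust to the real data.
		String[] fields = line.toString().split("\t");
		if (fields.length < 2)
			return; // skip malformed lines
		topMap.put(Integer.parseInt(fields[1].trim()), fields[0]);
		while (topMap.size() > topNum)
			topMap.remove(topMap.firstKey()); // evict the smallest, keep the largest
	}

	@Override
	protected void cleanup(Context context) throws IOException, InterruptedException {
		for (Integer count : topMap.keySet())
			context.write(new IntWritable(count), new Text(topMap.get(count)));
	}
}

Switching the reducer's eviction to topMap.remove(topMap.lastKey()) completes the fix. KeyValueTextInputFormat would avoid the manual parsing, but its new-API version is not shipped in every 1.0.x release, so splitting inside the mapper is the safer route here. Also clear the old output directory before rerunning (./hadoop dfs -rmr output); FileOutputFormat refuses to write into an existing path.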
