TopK run log

[email protected]:~/hadoop-1.0.1/bin$ ./hadoop dfs -rmr output

Deleted hdfs://localhost:9000/user/lk/output

[email protected]:~/hadoop-1.0.1/bin$ ./hadoop jar ~/mytopk.jar top.Top  input output

****hdfs://localhost:9000/user/lk/input

14/05/12 05:14:03 INFO input.FileInputFormat: Total input paths to process : 4

14/05/12 05:14:18 INFO mapred.JobClient: Running job: job_201405120333_0004

14/05/12 05:14:20 INFO mapred.JobClient:  map 0% reduce 0%

14/05/12 05:17:32 INFO mapred.JobClient:  map 50% reduce 0%

14/05/12 05:17:36 INFO mapred.JobClient: Task Id : attempt_201405120333_0004_m_000001_0, Status : FAILED

java.lang.RuntimeException: Error while running command to get file permissions : java.io.IOException: Cannot run program "/bin/ls": java.io.IOException: error=12, Cannot allocate memory

at java.lang.ProcessBuilder.start(ProcessBuilder.java:488)

at org.apache.hadoop.util.Shell.runCommand(Shell.java:200)

at org.apache.hadoop.util.Shell.run(Shell.java:182)

at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:375)

at org.apache.hadoop.util.Shell.execCommand(Shell.java:461)

at org.apache.hadoop.util.Shell.execCommand(Shell.java:444)

at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:703)

at org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:443)

at org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.getOwner(RawLocalFileSystem.java:426)

at org.apache.hadoop.mapred.TaskLog.obtainLogDirOwner(TaskLog.java:251)

at org.apache.hadoop.mapred.TaskLogsTruncater.truncateLogs(TaskLogsTruncater.java:124)

at org.apache.hadoop.mapred.Child$4.run(Child.java:260)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:416)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)

at org.apache.hadoop.mapred.Child.main(Child.java:249)

Caused by: java.io.IOException: java.io.IOException: error=12, Cannot allocate memory

at java.lang.UNIXProcess.<init>(UNIXProcess.java:164)

at java.lang.ProcessImpl.start(ProcessImpl.java:81)

at java.lang.ProcessBuilder.start(ProcessBuilder.java:470)

... 15 more

at org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:468)

at org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.getOwner(RawLocalFileSystem.java:426)

at org.apache.hadoop.mapred.TaskLog.obtainLogDirOwner(TaskLog.java:251)

at org.apache.hadoop.mapred.TaskLogsTruncater.truncateLogs(TaskLogsTruncater.java:124)

at org.apache.hadoop.mapred.Child$4.run(Child.java:260)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:416)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)

at org.apache.hadoop.mapred.Child.main(Child.java:249)

14/05/12 05:17:58 INFO mapred.JobClient:  map 0% reduce 0%

14/05/12 05:18:10 INFO mapred.JobClient: Task Id : attempt_201405120333_0004_m_000000_0, Status : FAILED

java.lang.RuntimeException: Error while running command to get file permissions : java.io.IOException: Cannot run program "/bin/ls": java.io.IOException: error=12, Cannot allocate memory

at java.lang.ProcessBuilder.start(ProcessBuilder.java:488)

at org.apache.hadoop.util.Shell.runCommand(Shell.java:200)

at org.apache.hadoop.util.Shell.run(Shell.java:182)

at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:375)

at org.apache.hadoop.util.Shell.execCommand(Shell.java:461)

at org.apache.hadoop.util.Shell.execCommand(Shell.java:444)

at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:703)

at org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:443)

at org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.getOwner(RawLocalFileSystem.java:426)

at org.apache.hadoop.mapred.TaskLog.obtainLogDirOwner(TaskLog.java:251)

at org.apache.hadoop.mapred.TaskLogsTruncater.truncateLogs(TaskLogsTruncater.java:124)

at org.apache.hadoop.mapred.Child$4.run(Child.java:260)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:416)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)

at org.apache.hadoop.mapred.Child.main(Child.java:249)

Caused by: java.io.IOException: java.io.IOException: error=12, Cannot allocate memory

at java.lang.UNIXProcess.<init>(UNIXProcess.java:164)

at java.lang.ProcessImpl.start(ProcessImpl.java:81)

at java.lang.ProcessBuilder.start(ProcessBuilder.java:470)

... 15 more

at org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:468)

at org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.getOwner(RawLocalFileSystem.java:426)

at org.apache.hadoop.mapred.TaskLog.obtainLogDirOwner(TaskLog.java:251)

at org.apache.hadoop.mapred.TaskLogsTruncater.truncateLogs(TaskLogsTruncater.java:124)

at org.apache.hadoop.mapred.Child$4.run(Child.java:260)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:416)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)

at org.apache.hadoop.mapred.Child.main(Child.java:249)
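The two identical failures above are not a bug in TaskLogsTruncater itself: `error=12` is ENOMEM, meaning the child JVM could not fork `/bin/ls` because the kernel refused to duplicate its address space. This typically happens on a small (often pseudo-distributed VM) node where the task heap is set larger than available memory allows. One possible mitigation is to shrink the child heap and limit concurrent task JVMs in mapred-site.xml. This is a sketch assuming Hadoop 1.x defaults; the values below are illustrative, not taken from this cluster:

```xml
<!-- mapred-site.xml: illustrative values, not from this cluster -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx128m</value> <!-- smaller child heap so fork() can succeed -->
</property>
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>1</value> <!-- fewer concurrent map JVMs on a low-memory node -->
</property>
```

At the OS level, enabling memory overcommit (`sysctl vm.overcommit_memory=1`) or adding swap are common workarounds for the same fork failure.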

14/05/12 05:18:14 INFO mapred.JobClient: Task Id : attempt_201405120333_0004_m_000001_1, Status : FAILED

14/05/12 05:20:01 INFO mapred.JobClient:  map 25% reduce 0%

14/05/12 05:20:27 INFO mapred.JobClient: Task Id : attempt_201405120333_0004_m_000000_1, Status : FAILED

attempt_201405120333_0004_m_000000_1: log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapred.Task).

attempt_201405120333_0004_m_000000_1: log4j:WARN Please initialize the log4j system properly.

14/05/12 05:20:41 INFO mapred.JobClient:  map 0% reduce 0%

14/05/12 05:20:43 INFO mapred.JobClient: Task Id : attempt_201405120333_0004_m_000001_2, Status : FAILED

java.lang.NumberFormatException: For input string: ""

at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)

at java.lang.Integer.parseInt(Integer.java:493)

at java.lang.Integer.parseInt(Integer.java:514)

at top.Top$TopKMapper.map(Top.java:28)

at top.Top$TopKMapper.map(Top.java:1)

at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)

at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)

at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)

at org.apache.hadoop.mapred.Child$4.run(Child.java:255)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:416)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)

at org.apache.hadoop.mapred.Child.main(Child.java:249)
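Once the memory problem clears, the task hits a real bug in the job code: `Integer.parseInt` at `Top.java:28` is fed an empty string, most likely a blank line in one of the four input files. The original mapper source is not shown in this log; the guard below is a hypothetical sketch of the parse step only, showing how to skip blank or malformed records instead of failing the whole task attempt:

```java
// Hypothetical guard for the parse at Top.java:28 (original source not shown).
// Illustrates skipping blank/malformed lines rather than letting
// Integer.parseInt("") throw NumberFormatException and fail the attempt.
public class ParseGuard {

    // Returns the parsed value, or null when the line should be skipped.
    static Integer tryParse(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return null; // blank input line: the exact case in the trace above
        }
        try {
            return Integer.parseInt(trimmed);
        } catch (NumberFormatException e) {
            return null; // malformed record: skip instead of killing the task
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParse(""));      // null
        System.out.println(tryParse("  42 ")); // 42
        System.out.println(tryParse("abc"));   // null
    }
}
```

Inside the real `map()`, a `null` return would simply mean `return` without writing anything to the context; optionally a counter can be incremented so skipped records remain visible in the job counters.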

14/05/12 05:23:55 INFO mapred.JobClient:  map 25% reduce 0%

14/05/12 05:24:06 INFO mapred.JobClient:  map 50% reduce 0%

14/05/12 05:24:19 INFO mapred.JobClient: Task Id : attempt_201405120333_0004_m_000000_2, Status : FAILED

attempt_201405120333_0004_m_000000_2: log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapred.Task).

attempt_201405120333_0004_m_000000_2: log4j:WARN Please initialize the log4j system properly.
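The `log4j:WARN No appenders` lines are a secondary symptom: the child JVM started without a usable log4j configuration, so the task's own diagnostics were lost. If the stock `conf/log4j.properties` is missing from the task classpath, a minimal configuration like the following (an illustrative log4j 1.x fragment, not the file from this installation) restores console output:

```
# Minimal log4j.properties for child task JVMs (illustrative)
log4j.rootLogger=INFO,console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
```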

14/05/12 05:24:33 INFO mapred.JobClient:  map 0% reduce 0%

14/05/12 05:24:58 INFO mapred.JobClient: Job complete: job_201405120333_0004

14/05/12 05:25:01 INFO mapred.JobClient: Counters: 7

14/05/12 05:25:01 INFO mapred.JobClient:   Job Counters

14/05/12 05:25:01 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=979673

14/05/12 05:25:01 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0

14/05/12 05:25:01 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0

14/05/12 05:25:01 INFO mapred.JobClient:     Launched map tasks=7

14/05/12 05:25:01 INFO mapred.JobClient:     Data-local map tasks=7

14/05/12 05:25:01 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0

14/05/12 05:25:01 INFO mapred.JobClient:     Failed map tasks=1

[email protected]:~/hadoop-1.0.1/bin$
