《Hive编程指南》 (Programming Hive), Section 14.3: Analyzing Why the Projection TRANSFORM Example Failed

While working through the projection-transformation example in Section 14.3, I executed `SELECT TRANSFORM(col1, col2) USING '/bin/cut -f1' AS newA, newB FROM a;` at the `hive (default)>` prompt and hit this error:

    Ended Job = job_local1231989520_0004 with errors
    Error during job, obtaining debugging information...
    FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

This did not match the author's output.

At first I ignored the error, skipped over the problem, and kept reading. ==Reflection 1: I was dodging the difficulty instead of digging into it.==

But the next statement, `SELECT TRANSFORM(col1, col2) USING '/bin/cut -f1' AS newA FROM a;`, failed the same way:

    Ended Job = job_local1771018374_0006 with errors
    Error during job, obtaining debugging information...
    FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

There was no explicit error message, so I first opened another terminal and experimented:

    $ /bin/cut
    bash: /bin/cut: No such file or directory
    $ echo "4 5" | /bin/cut -f1
    bash: /bin/cut: No such file or directory
    $ echo "4 5" | cut -f1
    4 5
    $ echo "4 5" | cut -f1
    4 5

I failed to find the cause on this first pass. ==Reflection 2: the output of the very first command here already pointed to the cause; I was not alert to error messages and did not think them through.==
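
A side note on why `echo "4 5" | cut -f1` prints the whole line: cut splits fields on TAB by default, and "4 5" contains only a space, so field 1 is the entire line. Hive's TRANSFORM, on the other hand, feeds each row to the child process as TAB-separated column values, which is why `cut -f1` does isolate col1 once the command can actually be found. A quick demonstration in any POSIX shell:

    $ echo "4 5" | cut -f1        # no TAB in the input, so field 1 is the whole line
    4 5
    $ printf '4\t5\n' | cut -f1   # TAB-separated input
    4
    $ echo "4 5" | cut -d' ' -f1  # or override the delimiter explicitly
    4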

Experimenting further, I found that `SELECT TRANSFORM(col1, col2) USING 'cut -f1' AS newA, newB FROM a;` (without the /bin/ prefix) returned the expected result, while `ls /bin | grep "cut"` showed no cut binary under /bin.

So the statements failed because on this Ubuntu system the cut program does not live under /bin/.

Running `which cut` confirmed it: cut is installed under /usr/bin/.
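
Before hardcoding an absolute path into a TRANSFORM clause, it is worth letting the shell locate the command first; a quick check might look like the sketch below (plain shell built-ins, nothing Hive-specific). Note that on newer Ubuntu releases /bin is merged into /usr/bin via a symlink, so the book's /bin/cut path may happen to work there.

    $ which cut        # the path the shell would execute
    /usr/bin/cut
    $ type -a cut      # every match found on $PATH
    cut is /usr/bin/cut
    $ command -v cut   # POSIX-portable alternative to which
    /usr/bin/cut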

Transcript of the relevant commands

  1. Hive terminal

    hive (default)> SELECT TRANSFORM(col1, col2) USING '/bin/cut -f1' AS newA, newB FROM a;
    Ended Job = job_local1231989520_0004 with errors
    Error during job, obtaining debugging information...
    FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
    hive (default)> SELECT TRANSFORM(col1, col2) USING '/bin/cut -f1' AS newA, newB FROM a;
    Ended Job = job_local1383279613_0005 with errors
    Error during job, obtaining debugging information...
    FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
    hive (default)> SELECT TRANSFORM(col1, col2) USING '/bin/cut -f1' AS newA FROM a;
    Ended Job = job_local1771018374_0006 with errors
    Error during job, obtaining debugging information...
    FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
    hive (default)> SELECT TRANSFORM(col1, col2) USING '/bin/cut -f1' AS newA FROM a;
    Ended Job = job_local81582517_0007 with errors
    Error during job, obtaining debugging information...
    FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
    hive (default)> SELECT TRANSFORM(col1, col2) USING '/bin/sed s/4/10' AS newA, newB AS a;
    NoViableAltException(…@[])
    at org.apache.hadoop.hive.ql.parse.HiveParser.rowFormat(HiveParser.java:34626)
    at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectTrfmClause(HiveParser_SelectClauseParser.java:2021)
    at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1216)
    at org.apache.hadoop.hive.ql.parse.HiveParser.selectClause(HiveParser.java:51850)
    at org.apache.hadoop.hive.ql.parse.HiveParser.selectStatement(HiveParser.java:45661)
    at org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:45568)
    at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:44584)
    at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:44454)
    at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1696)
    at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1178)
    at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:204)
    at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:444)
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1242)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1384)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1171)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1161)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
    FAILED: ParseException line 1:67 cannot recognize input near 'AS' 'a' '<EOF>' in serde specification
    hive (default)> SELECT TRANSFORM(col1, col2) USING '/bin/sed s/4/10' AS newA, newB FROM a;
    /bin/sed: -e expression #1, char 6: unterminated `s' command
    org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20003]: An error occurred when trying to close the Operator running your custom script.
    at org.apache.hadoop.hive.ql.exec.ScriptOperator.close(ScriptOperator.java:585)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:697)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:697)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:697)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:189)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20003]: An error occurred when trying to close the Operator running your custom script.
    at org.apache.hadoop.hive.ql.exec.ScriptOperator.close(ScriptOperator.java:585)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:697)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:697)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:697)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:189)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20003]: An error occurred when trying to close the Operator running your custom script.
    at org.apache.hadoop.hive.ql.exec.ScriptOperator.close(ScriptOperator.java:585)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:697)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:697)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:697)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:189)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    Ended Job = job_local1180910273_0008 with errors
    Error during job, obtaining debugging information...
    FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
    hive (default)> SELECT TRANSFORM(col1, col2) USING 'cut -f1' AS newA, newB FROM a;
    newA    newB
    4   NULL
    3   NULL
    hive (default)> SELECT TRANSFORM(col1, col2) USING 'cut -f1' AS newA FROM a;
    newA
    4
    3

  2. Terminal 1

    $ /bin/cut
    bash: /bin/cut: No such file or directory
    $ echo "4 5" | /bin/cut -f1
    bash: /bin/cut: No such file or directory
    $ echo "4 5" | cut -f1
    4 5
    $ echo "4 5" | cut -f1
    4 5
    $ echo "4 5" | sed s/4/10
    sed: -e expression #1, char 6: unterminated `s' command
    $ sed s/4/10
    sed: -e expression #1, char 6: unterminated `s' command

  3. Terminal 2

    $ ls /bin | grep "cut"
    $ ls /bin
    bash          bzmore          dd             fgrep       kbd_mode    ls          nc                ntfsfallocate  ps         sh                    systemd-inhibit                 uname           zfgrep
    bunzip2       cat             df             findmnt     kill        lsblk       nc.openbsd        ntfsfix        pwd        sh.distrib            systemd-machine-id-setup        uncompress      zforce
    busybox       cgroups-mount   dir            fuser       kmod        lsmod       netcat            ntfsinfo       rbash      sleep                 systemd-notify                  unicode_start   zgrep
    bzcat         cgroups-umount  dmesg          fusermount  less        mkdir       netstat           ntfsls         readlink   ss                    systemd-tmpfiles                vdir            zless
    bzcmp         chacl           dnsdomainname  getfacl     lessecho    mknod       networkctl        ntfsmove       red        static-sh             systemd-tty-ask-password-agent  vmmouse_detect  zmore
    bzdiff        chgrp           domainname     grep        lessfile    mktemp      nisdomainname     ntfstruncate   rm         stty                  tailf                           wdctl           znew
    bzegrep       chmod           dumpkeys       gunzip      lesskey     more        ntfs-3g           ntfswipe       rmdir      su                    tar                             which
    bzexe         chown           echo           gzexe       lesspipe    mount       ntfs-3g.probe     open           rnano      sync                  tempfile                        whiptail
    bzfgrep       chvt            ed             gzip        ln          mountpoint  ntfs-3g.secaudit  openvt         run-parts  systemctl             touch                           ypdomainname
    bzgrep        cp              efibootmgr     hciconfig   loadkeys    mt          ntfs-3g.usermap   pidof          sed        systemd               true                            zcat
    bzip2         cpio            egrep          hostname    login       mt-gnu      ntfscat           ping           setfacl    systemd-ask-password  udevadm                         zcmp
    bzip2recover  dash            false          ip          loginctl    mv          ntfscluster       ping6          setfont    systemd-escape        ulockmgr_server                 zdiff
    bzless        date            fgconsole      journalctl  lowntfs-3g  nano        ntfscmp           plymouth       setupcon   systemd-hwdb          umount                          zegrep
    $ which cut
    /usr/bin/cut
    $ /usr/bin/cut
    /usr/bin/cut: you must specify a list of bytes, characters, or fields
    Try '/usr/bin/cut --help' for more information.
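
For the record, two of the failures in the Hive session above are unrelated to the cut path: the ParseException comes from writing `AS a` where `FROM a` was intended, and the sed error comes from the substitution expression missing its closing delimiter; sed requires the full `s/old/new/` form:

    $ echo "4 5" | sed s/4/10     # missing the closing slash
    sed: -e expression #1, char 6: unterminated `s' command
    $ echo "4 5" | sed 's/4/10/'
    10 5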

Addendum: while writing this post I noticed that running /bin/cut in the terminal had already exposed the root cause: `bash: /bin/cut: No such file or directory` plainly says that /bin/cut does not exist, whereas /usr/bin/cut merely complains about missing arguments.
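
To wrap up, both of the following should work: dropping the directory so that the shell Hive spawns resolves cut through $PATH (verified in the transcript above), or pinning the absolute path that `which cut` reported (not retried here, but it is the path that actually exists on this machine):

    hive (default)> SELECT TRANSFORM(col1, col2) USING 'cut -f1' AS newA FROM a;
    hive (default)> SELECT TRANSFORM(col1, col2) USING '/usr/bin/cut -f1' AS newA FROM a;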

Original post: https://www.cnblogs.com/DataNerd/p/8987990.html
