Hadoop Hive Operations on HDFS Files, Day 1

This Hive exercise works through the following goals:
1. Load a plain text file into a Hive table so it can be queried.
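
For reference, the local file /home/dyq/Documents/dept used below is assumed to be a plain comma-separated text file along these lines (values reconstructed from the query output further down):

10,ACCOUNTING,NEW YORK
20,RESEARCH,DALLAS
30,SALES,CHICAGO
40,OPERATIONS,BOSTON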

create table dept(deptID int,deptName string,address string);

load data local inpath '/home/dyq/Documents/dept' overwrite into table dept;

select * from dept;

hive> select * from dept;

OK

NULL NULL NULL

NULL NULL NULL

NULL NULL NULL

NULL NULL NULL

Time taken: 0.316 seconds, Fetched: 4 row(s)

The load appears to succeed, but every column comes back as NULL. The reason is that no delimiters were specified: the dept table falls back to Hive's defaults (the '\001' control character between fields, '\n' between lines), which do not match the comma-separated data file. Hive is schema-on-read, so LOAD DATA copies the file in without validation, and the mismatch only surfaces as NULLs at query time.
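
To confirm which delimiters a table actually ended up with, its DDL can be inspected from the Hive shell, for example:

hive> show create table dept;
hive> describe formatted dept;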
The correct CREATE syntax is:

create table dept1(deptID int,deptName string,address string) row format delimited

fields terminated by ',' lines terminated by '\n' stored as textfile;
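
As an aside, the existing dept table could also be repaired in place instead of creating dept1, by pointing its SerDe at the right field delimiter (a sketch, valid for plain text tables backed by LazySimpleSerDe):

alter table dept set serdeproperties ('field.delim' = ',');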

The LOAD DATA statement stays the same:

load data local inpath '/home/dyq/Documents/dept' overwrite into table dept1;

hive> select * from dept1;

OK

10 ACCOUNTING NEW YORK

20 RESEARCH DALLAS

30 SALES CHICAGO

40 OPERATIONS BOSTON

Time taken: 0.153 seconds, Fetched: 4 row(s)

This time it works.
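
The loaded file now sits under the table's warehouse directory in HDFS. Assuming the default warehouse location (hive.metastore.warehouse.dir), it can be listed like this:

hdfs dfs -ls /user/hive/warehouse/dept1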

2. Export the query results

Export the table data from Hive and save it as plain text. First run the query to verify what will be exported:

hive> select deptID,deptName from dept1;

OK

10 ACCOUNTING

20 RESEARCH

30 SALES

40 OPERATIONS

Time taken: 4.735 seconds, Fetched: 4 row(s)

Write the query results to a local directory:

insert overwrite local directory '/home/dyq/Documents/outputdept' select a.deptID,a.deptName from dept1 a;

hive> insert overwrite local directory '/home/dyq/Documents/outputdept' select a.deptID,a.deptName from dept1 a;

WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.

Query ID = dyq_20160828100454_07848914-19c7-47a7-9dd6-7fe50a1f9a82

Total jobs = 1

Launching Job 1 out of 1

Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_1472347049416_0003, Tracking URL = http://ubuntu:8088/proxy/application_1472347049416_0003/

Kill Command = /opt/hadoop-2.6.2/bin/hadoop job  -kill job_1472347049416_0003

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0

2016-08-28 10:05:14,004 Stage-1 map = 0%,  reduce = 0%

2016-08-28 10:05:26,132 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.23 sec

MapReduce Total cumulative CPU time: 2 seconds 230 msec

Ended Job = job_1472347049416_0003

Moving data to local directory /home/dyq/Documents/outputdept

MapReduce Jobs Launched:

Stage-Stage-1: Map: 1   Cumulative CPU: 2.23 sec   HDFS Read: 3564 HDFS Write: 49 SUCCESS

Total MapReduce CPU Time Spent: 2 seconds 230 msec

OK

Time taken: 34.259 seconds
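
For small result sets, a simpler alternative is to run the query non-interactively and redirect its output to a file; the target path here is only an example:

hive -e "select deptID, deptName from dept1" > /home/dyq/Documents/dept_export.txt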

3. View the exported results

An outputdept directory now appears under /home/dyq/Documents, containing a file named 000000_0. Opening it shows:

10ACCOUNTING

20RESEARCH

30SALES

40OPERATIONS

And there it is. The columns only look glued together because INSERT OVERWRITE LOCAL DIRECTORY writes Hive's default '\001' field separator, which is invisible in most editors.
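
If a human-readable separator is wanted in the exported file, Hive 0.11 and later also accept a ROW FORMAT clause on the export itself; a sketch with an example output path:

insert overwrite local directory '/home/dyq/Documents/outputdept_csv'
row format delimited fields terminated by ','
select deptID, deptName from dept1;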
