Hive - Create Table&Drop Table & ALTER Table(上)

写在前面:本来想着把表的创建,删除,以及修改一篇搞定的。结果看了一下,东西还是蛮多的,而且也是很多经常使用的操作。所以,就暂且分开处理吧。

特别提醒:在日常不管是创建库、表还是修改字段,删除等操作,建议都加上 [IF NOT EXISTS] | [IF EXISTS] 选项;虽然是可选项,但是还是小心为上,万一你在操作时没有加库名,又操作错了,那你哭都找不到地方。

This chapter explains how to create a table and how to insert data into it. The conventions of creating a table in HIVE is quite similar to creating a table using SQL.

Create Table Statement

Create Table is a statement used to create a table in Hive. The syntax and example are as follows:

Syntax

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.] table_name

[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[ROW FORMAT row_format]
[STORED AS file_format]

译注:hive中stored的 file_format目前有:(参考http://blog.csdn.net/yfkiss/article/details/7787742)

  • TEXTFILE:默认格式,数据不做压缩,磁盘开销大,数据解析开销大。可结合Gzip、Bzip2使用(系统自动检查,执行查询时自动解压),但使用这种方式,hive不会对数据进行切分,从而无法对数据进行并行操作。
  • SEQUENCEFILE:SequenceFile是Hadoop API提供的一种二进制文件支持,其具有使用方便、可分割、可压缩的特点。SequenceFile支持三种压缩选择:NONE, RECORD, BLOCK。 Record压缩率低,一般建议使用BLOCK压缩。
  • RCFILE:RCFILE是一种行列存储相结合的存储方式。首先,其将数据按行分块,保证同一个record在一个块上,避免读一个记录需要读取多个block。其次,块数据列式存储,有利于数据压缩和快速的列存取。RCFILE文件示例:

  • 自定义格式:当用户的数据文件格式不能被当前 Hive 所识别的时候,可以自定义文件格式。
    用户可以通过实现inputformat和outputformat来自定义输入输出格式,参考代码:
    .\hive-0.8.1\src\contrib\src\java\org\apache\hadoop\hive\contrib\fileformat\base64

Example

Let us assume you need to create a table named employee using CREATE TABLE statement. The following table lists the fields and their data types in employee table:

Sr.No Field Name Data Type
1 Eid int
2 Name String
3 Salary Float
4 Designation string

The following data is a Comment, Row formatted fields such as Field terminator, Lines terminator, and Stored File type.

COMMENT ‘Employee details’
FIELDS TERMINATED BY ‘\t’
LINES TERMINATED BY ‘\n’
STORED IN TEXT FILE

The following query creates a table named employee using the above data.

hive> CREATE TABLE IF NOT EXISTS employee ( eid int, name String,
salary String, destination String)
COMMENT ‘Employee details’
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ‘\t’
LINES TERMINATED BY ‘\n’
STORED AS TEXTFILE;

译注:目前我使用的脚本样例如下:按照dt 字段进行分区,这个后续有一篇专门讲到了分区,你可以先去看看英文版https://www.tutorialspoint.com/hive/hive_partitioning.htm

CREATE TABLE IF NOT EXISTS  `snapshot_task_sub` (
  `task_sub_id` INT COMMENT ‘任务扩展子表ID‘,
  `task_id` INT COMMENT ‘任务ID‘,
  `car_series` INT COMMENT ‘车系ID‘,
  `series_name` STRING COMMENT ‘车系名称‘,
  `purchase_amount` INT COMMENT ‘购买数量‘,
  `price` DOUBLE COMMENT ‘最新投放单价‘,
  `published_price` DOUBLE COMMENT ‘刊例价‘,
  `state` TINYINT COMMENT ‘状态 0正常 2删除‘,
  `create_time` STRING COMMENT ‘创建时间‘,
  `edit_time` STRING  COMMENT ‘修改时间‘,
  `snap_time` STRING COMMENT ‘快照时间‘
)
COMMENT ‘任务子表天快照表‘
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ‘\t‘ STORED AS TEXTFILE ;

If you add the option IF NOT EXISTS, Hive ignores the statement in case the table already exists.

On successful creation of table, you get to see the following response:

OK
Time taken: 5.905 seconds
hive>

JDBC Program

The JDBC program to create a table is given example.

import java.sql.SQLException;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.sql.DriverManager;

public class HiveCreateTable {
   private static String driverName = "org.apache.hadoop.hive.jdbc.HiveDriver";

   public static void main(String[] args) throws SQLException {

      // Register driver and create driver instance
      Class.forName(driverName);

      // get connection
      Connection con = DriverManager.getConnection("jdbc:hive://localhost:10000/userdb", "", "");

      // create statement
      Statement stmt = con.createStatement();

      // execute statement
      stmt.executeQuery("CREATE TABLE IF NOT EXISTS "
         +" employee ( eid int, name String, "
         +" salary String, destignation String)"
         +" COMMENT ‘Employee details’"
         +" ROW FORMAT DELIMITED"
         +" FIELDS TERMINATED BY ‘\t’"
         +" LINES TERMINATED BY ‘\n’"
         +" STORED AS TEXTFILE;");

      System.out.println(“ Table employee created.”);
      con.close();
   }
}

Save the program in a file named HiveCreateDb.java. The following commands are used to compile and execute this program.

$ javac HiveCreateDb.java
$ java HiveCreateDb

Output

Table employee created.

Load Data Statement

Generally, after creating a table in SQL, we can insert data using the Insert statement. But in Hive, we can insert data using the LOAD DATA statement.

While inserting data into Hive, it is better to use LOAD DATA to store bulk records. There are two ways to load data: one is from local file system and second is from Hadoop file system.

通常,在SQL中创建表之后,我们可以使用Insert语句插入数据。 但在Hive中,我们可以使用LOAD DATA语句插入数据。

在将数据插入Hive时,最好使用LOAD DATA来存储批量记录。 有两种方式加载数据:一种来自本地文件系统,另一种来自Hadoop文件系统。

Syntax

The syntax for load data is as follows:

LOAD DATA [LOCAL] INPATH ‘filepath‘ [OVERWRITE] INTO TABLE tablename
[PARTITION (partcol1=val1, partcol2=val2 ...)]
  • LOCAL is identifier to specify the local path. It is optional. LOCAL是用于指定本地路径的标识符,可选参数
  • OVERWRITE is optional to overwrite the data in the table. -- 如果指定了OVERWRITE,那么会覆盖表内所有数据,慎重
  • PARTITION is optional.可选参数

Example

We will insert the following data into the table. It is a text file named sample.txt in /home/user directory.

1201  Gopal       45000    Technical manager
1202  Manisha     45000    Proof reader
1203  Masthanvali 40000    Technical writer
1204  Kiran       40000    Hr Admin
1205  Kranthi     30000    Op Admin

The following query loads the given text into the table.

hive> LOAD DATA LOCAL INPATH ‘/home/user/sample.txt‘
OVERWRITE INTO TABLE employee;

On successful download, you get to see the following response:

OK
Time taken: 15.905 seconds
hive>

JDBC Program

Given below is the JDBC program to load given data into the table.

import java.sql.SQLException;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.sql.DriverManager;

public class HiveLoadData {

   private static String driverName = "org.apache.hadoop.hive.jdbc.HiveDriver";

   public static void main(String[] args) throws SQLException {

      // Register driver and create driver instance
      Class.forName(driverName);

      // get connection
      Connection con = DriverManager.getConnection("jdbc:hive://localhost:10000/userdb", "", "");

      // create statement
      Statement stmt = con.createStatement();

      // execute statement
      stmt.executeQuery("LOAD DATA LOCAL INPATH ‘/home/user/sample.txt‘" + "OVERWRITE INTO TABLE employee;");
      System.out.println("Load Data into employee successful");

      con.close();
   }
}

Save the program in a file named HiveLoadData.java. Use the following commands to compile and execute this program.

$ javac HiveLoadData.java
$ java HiveLoadData

Output:

Load Data into employee successful

-------------英文文章地址:https://www.tutorialspoint.com/hive/hive_create_table.htm
时间: 2024-10-16 13:03:40

Hive - Create Table&Drop Table & ALTER Table(上)的相关文章

数据定义: CREATE、DROP、ALTER

1 CREATE DATABASE 句法 2 3 CREATE DATABASE [IF NOT EXISTS] db_name 4 5 CREATE DATABASE 以给定名字创建一个数据库.允许的数据库名规则在章节 6.1.2 数据库.表.索引.列和别名 中被给出. 如果数据库已经存在,并且你没有指定 IF NOT EXISTS,这时会产生一个错误. 6 7 在 MySQL 中,数据库以包含数据库表对应文件的目录实现的.因为数据库在初始创建时没有表,所以 CREATE DATABASE 语

[Hive - LanguageManual] Alter Table/Partition/Column

Alter Table/Partition/Column Alter Table Rename Table Alter Table Properties Alter Table Comment Add SerDe Properties Alter Table Storage Properties Additional Alter Table Statements Alter Partition Add Partitions Dynamic Partitions Rename Partition

[转]ALTER TABLE的用法 增多列、删除列、改列名、改列约束、改表名

[转]ALTER TABLE的用法 增加列.删除列.改列名.改列约束.改表名 ALTER TABLE 名称ALTER TABLE - 更改表属性语法ALTER TABLE table [ * ]      ADD [ COLUMN ] column typeALTER TABLE table [ * ]      ALTER [ COLUMN ] column { SET DEFAULT value | DROP DEFAULT }ALTER TABLE table [ * ]      REN

ALTER TABLE - 修改表的定义

SYNOPSIS ALTER TABLE [ ONLY ] name [ * ] ADD [ COLUMN ] column type [ column_constraint [ ... ] ] ALTER TABLE [ ONLY ] name [ * ] DROP [ COLUMN ] column [ RESTRICT | CASCADE ] ALTER TABLE [ ONLY ] name [ * ] ALTER [ COLUMN ] column { SET DEFAULT expr

alter table table_name enable row movement

Row movement 从字面意思解释为行移动.默认情况下,oracle数据块中的一行其生命周期内是不会发生移动的,即其rowid不会发生改变. 但是在某些情景下,我们希望行的rowid可以发生变化,这时候我们就需要启动表的row movement特性. 启用row movement特性,使用如下语句: Alter table table_name enable row movement; 通查在三种情景下,需要启用row movement. 1:分区表 当我们允许分区表的分区键是可更新的时候

改表 alter table

ALTER TABLE [ IF EXISTS ] [ ONLY ] name [ * ] action [, ... ] ALTER TABLE [ IF EXISTS ] [ ONLY ] name [ * ] RENAME [ COLUMN ] column_name TO new_column_name ALTER TABLE [ IF EXISTS ] [ ONLY ] name [ * ] RENAME CONSTRAINT constraint_name TO new_constr

SQL高级应用(DROP、ALTER)

SQL 撤销索引.表以及数据库 通过使用 DROP 语句,可以轻松的删除索引.表和数据库 SQL DROP INDEX  使用DROP INDEX 命令删除表格中的索引 用于 Microsoft SQLJet(以及 Microsoft Access)的语法 DROP INDEX index_name ON table_name 用于 MS SQL Server 的语法 DROP INDEX table_name.index_name 用于 IBM DB2 和 Oracle的语法 DROP IND

Hive - Create Table&Drop Table & ALTER Table(中)

译注:书接上篇,了解过创建表以及load data后,假如发现需要更改表字段类型或者添加表字段,怎么办?这篇文章将进一步了解具体细节. This chapter explains how to alter the attributes of a table such as changing its table name, changing column names, adding columns, and deleting or replacing columns. Alter Table St

create index 与 alter table add index 区别

众所周知,MySQL创建索引有两种语法,即:ALTER TABLE HeadOfState ADD INDEX (LastName, FirstName);CREATE INDEX index_name HeadOfState (LastName, FirstName);那么,这两种语法有什么区别呢? 在网上找了一下,在一个英文网站上,总结了下面几个区别,我翻译出来,如下:1.CREATE INDEX必须提供索引名,对于ALTER TABLE,将会自动创建,如果你不提供:2.CREATE IND