[Hive - LanguageManual] Transform [Didn't understand]

Transform/Map-Reduce Syntax

Users can also plug their own custom mappers and reducers into the data stream by using features natively supported in the Hive 2.0 language. For example, to run a custom mapper script (map_script) and a custom reducer script (reduce_script), the user can issue a command that uses the TRANSFORM clause to embed the mapper and reducer scripts, as in the TRANSFORM Examples below.

By default, columns are converted to STRING and delimited by TAB before being fed to the user script; similarly, all NULL values are converted to the literal string \N to distinguish them from empty strings. The standard output of the user script is treated as TAB-separated STRING columns: any cell containing only \N is re-interpreted as NULL, and each resulting STRING column is then cast to the data type specified in the table declaration in the usual way. User scripts can write debug information to standard error, which is shown on the task detail page in Hadoop. These defaults can be overridden with a ROW FORMAT clause.
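To make the default contract above concrete, here is a minimal Python sketch of the I/O handling inside a TRANSFORM script, assuming the default TAB delimiter and \N NULL marker (no ROW FORMAT override). The function names parse_row, format_row and identity_transform are hypothetical, chosen for illustration.

```python
NULL_MARKER = "\\N"  # the two characters backslash and N: Hive's default NULL text


def parse_row(line):
    """Split one line written by Hive into a list of column values.

    Columns arrive as TAB-separated strings; a cell that is exactly \\N is NULL.
    """
    fields = line.rstrip("\n").split("\t")
    return [None if field == NULL_MARKER else field for field in fields]


def format_row(values):
    """Render output values as one TAB-separated line for Hive to read back.

    None is written as \\N so that Hive re-interprets the cell as NULL,
    while an empty string stays an empty string.
    """
    cells = [NULL_MARKER if value is None else str(value) for value in values]
    return "\t".join(cells) + "\n"


def identity_transform(lines):
    """Hypothetical transform that passes every row through unchanged."""
    for line in lines:
        yield format_row(parse_row(line))
```

In a real script you would import sys and finish with something like: if __name__ == "__main__": sys.stdout.writelines(identity_transform(sys.stdin)). Any debugging output should go to sys.stderr so that it appears on the task detail page rather than in the query result.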

On Windows, use "cmd /c your_script" instead of just "your_script".

Warning

It is your responsibility to sanitize any STRING columns before transformation. If a STRING column contains tabs, an identity transformer will not give you back what you started with! To guard against this, use REGEXP_REPLACE to replace the tabs with some other character on their way into the TRANSFORM() call.
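The tab problem can be reproduced with a small Python simulation. The sketch below mimics Hive's default TAB-joined encoding and an identity script that splits on TAB, then shows that replacing embedded tabs first (the job REGEXP_REPLACE does in HiveQL) restores the round trip. All three functions are hypothetical stand-ins, not Hive code, and using a space as the replacement character is an assumption.

```python
def serialize(row):
    """Mimic Hive's default encoding of a row: the cells joined by TAB."""
    return "\t".join(row)


def identity_script(line):
    """An identity transformer: split the line on TAB and return the cells."""
    return line.split("\t")


def sanitize(cell, replacement=" "):
    """Replace embedded tabs before the value is handed to TRANSFORM.

    In HiveQL this step would be done with REGEXP_REPLACE on the column.
    """
    return cell.replace("\t", replacement)
```

For example, identity_script(serialize(["a\tb", "c"])) yields three cells, ["a", "b", "c"], instead of the two you started with; sanitizing each cell first yields ["a b", "c"].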

Warning

Formally, MAP ... and REDUCE ... are syntactic transformations of SELECT TRANSFORM ( ... ). In other words, they serve only as comments or notes to the reader of the query. BEWARE: using these keywords can be misleading, because typing "REDUCE" does not force a reduce phase to occur, and typing "MAP" does not force a new map phase!

Please also see Sort By / Cluster By / Distribute By and Larry Ogrodnek's blog post.

clusterBy: CLUSTER BY colName (',' colName)*
distributeBy: DISTRIBUTE BY colName (',' colName)*
sortBy: SORT BY colName (ASC | DESC)? (',' colName (ASC | DESC)?)*

rowFormat
  : ROW FORMAT
    (DELIMITED [FIELDS TERMINATED BY char]
               [COLLECTION ITEMS TERMINATED BY char]
               [MAP KEYS TERMINATED BY char]
               [ESCAPED BY char]
               [LINES TERMINATED BY char]
     |
     SERDE serde_name [WITH SERDEPROPERTIES
                            property_name=property_value,
                            property_name=property_value, ...])

outRowFormat : rowFormat
inRowFormat : rowFormat
outRecordReader : RECORDREADER className

query:
  FROM (
    FROM src
    MAP expression (',' expression)*
    (inRowFormat)?
    USING 'my_map_script'
    ( AS colName (',' colName)* )?
    (outRowFormat)? (outRecordReader)?
    ( clusterBy? | distributeBy? sortBy? ) src_alias
  )
  REDUCE expression (',' expression)*
    (inRowFormat)?
    USING 'my_reduce_script'
    ( AS colName (',' colName)* )?
    (outRowFormat)? (outRecordReader)?

  FROM (
    FROM src
    SELECT TRANSFORM '(' expression (',' expression)* ')'
    (inRowFormat)?
    USING 'my_map_script'
    ( AS colName (',' colName)* )?
    (outRowFormat)? (outRecordReader)?
    ( clusterBy? | distributeBy? sortBy? ) src_alias
  )
  SELECT TRANSFORM '(' expression (',' expression)* ')'
    (inRowFormat)?
    USING 'my_reduce_script'
    ( AS colName (',' colName)* )?
    (outRowFormat)? (outRecordReader)?
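The clusterBy, distributeBy and sortBy options control what the reduce script receives. The Hive manual describes CLUSTER BY as a shortcut for DISTRIBUTE BY plus SORT BY on the same columns: rows with equal keys go to the same reducer, and each reducer sees its rows sorted by key. The Python sketch below only simulates that contract; using CRC32 of the key text to pick a reducer is an assumption for illustration, not Hive's actual partitioning hash.

```python
import zlib
from collections import defaultdict


def cluster_by(rows, key_index, num_reducers):
    """Simulate CLUSTER BY: DISTRIBUTE BY the key, then SORT BY it per reducer.

    rows is a list of tuples; key_index selects the clustering column.
    Partitioning uses CRC32 of the key text (an assumption, not Hive's hash).
    """
    partitions = defaultdict(list)
    for row in rows:
        key = str(row[key_index])
        partitions[zlib.crc32(key.encode("utf-8")) % num_reducers].append(row)
    # Within each reducer, rows are sorted by key, so equal keys are adjacent.
    return {
        reducer: sorted(part, key=lambda r: str(r[key_index]))
        for reducer, part in partitions.items()
    }
```

The adjacency of equal keys inside each reducer is what lets a reduce script process one key group at a time in a single streaming pass.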

SQL Standard Based Authorization Disallows TRANSFORM

The TRANSFORM clause is disallowed when SQL standard based authorization is configured in Hive 0.13.0 and later releases (HIVE-6415).

TRANSFORM Examples

Example #1:

FROM (
  FROM pv_users
  MAP pv_users.userid, pv_users.date
  USING 'map_script'
  AS dt, uid
  CLUSTER BY dt) map_output
INSERT OVERWRITE TABLE pv_users_reduced
  REDUCE map_output.dt, map_output.uid
  USING 'reduce_script'
  AS date, count;

FROM (
  FROM pv_users
  SELECT TRANSFORM(pv_users.userid, pv_users.date)
  USING 'map_script'
  AS dt, uid
  CLUSTER BY dt) map_output
INSERT OVERWRITE TABLE pv_users_reduced
  SELECT TRANSFORM(map_output.dt, map_output.uid)
  USING 'reduce_script'
  AS date, count;
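Example #1 does not show map_script or reduce_script themselves. As a hypothetical illustration only, assuming map_script swaps its two input columns and reduce_script counts rows per dt (the AS date, count output suggests a count, but the scripts' real logic is not given), the pair might look like this in Python:

```python
from itertools import groupby


def map_rows(lines):
    """Hypothetical map_script: read "userid<TAB>date" and emit "date<TAB>userid"."""
    for line in lines:
        userid, date = line.rstrip("\n").split("\t")
        yield f"{date}\t{userid}\n"


def reduce_rows(lines):
    """Hypothetical reduce_script: emit "dt<TAB>count" with the number of rows per dt.

    CLUSTER BY dt guarantees that all rows for one dt reach the same reducer and
    arrive next to each other, so one pass with groupby is enough.
    """
    rows = (line.rstrip("\n").split("\t", 1) for line in lines)
    for dt, group in groupby(rows, key=lambda row: row[0]):
        yield f"{dt}\t{sum(1 for _ in group)}\n"
```

Each function would be wrapped as a standalone script that reads sys.stdin and writes sys.stdout. The unpacking in map_rows assumes exactly two columns per line and does no \N handling.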

Example #2:

FROM (
  FROM src
  SELECT TRANSFORM(src.key, src.value) ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.TypedBytesSerDe'
  USING '/bin/cat'
  AS (tkey, tvalue) ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.TypedBytesSerDe'
  RECORDREADER 'org.apache.hadoop.hive.ql.exec.TypedBytesRecordReader'
) tmap
INSERT OVERWRITE TABLE dest1 SELECT tkey, tvalue;

Schema-less Map-reduce Scripts

If there is no AS clause after USING my_script, Hive assumes that the script's output has two parts: a key, which is everything before the first tab, and a value, which is everything after the first tab. This differs from specifying AS key, value: in that case, if a line contains multiple tabs, value holds only the portion between the first and second tabs.

Note that we can CLUSTER BY key directly without specifying the output schema of the script.

FROM (
  FROM pv_users
  MAP pv_users.userid, pv_users.date
  USING 'map_script'
  CLUSTER BY key) map_output
INSERT OVERWRITE TABLE pv_users_reduced
  REDUCE map_output.key, map_output.value
  USING 'reduce_script'
  AS date, count;
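The difference between the schema-less split and AS key, value can be expressed directly in Python. In this sketch (function names hypothetical), a line with no TAB at all gives an empty value in schemaless_split and None in as_key_value_split; those edge-case choices are assumptions, not documented Hive behaviour.

```python
def schemaless_split(line):
    """No AS clause: key is the text before the first TAB, value is everything after."""
    key, _, value = line.rstrip("\n").partition("\t")
    return key, value


def as_key_value_split(line):
    """AS key, value: split on every TAB and keep only the first two fields.

    A missing second field is returned as None (assumed to mean NULL).
    """
    fields = line.rstrip("\n").split("\t")
    return fields[0], (fields[1] if len(fields) > 1 else None)
```

For the line "k<TAB>v1<TAB>v2", schemaless_split keeps "v1<TAB>v2" as the value, while as_key_value_split keeps only "v1".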

Typing the output of TRANSFORM

The output fields from a script are typed as strings by default; for example, in

SELECT TRANSFORM(stuff)
USING 'script'
AS thing1, thing2

they can be cast immediately with the syntax:

SELECT TRANSFORM(stuff)
USING 'script'
AS (thing1 INT, thing2 INT)
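Since the script only ever writes text, the typed AS clause means Hive casts each STRING cell to the declared type. The Python sketch below approximates that step for INT columns. Treating unparseable text as NULL mirrors Hive's usual string-to-INT cast but is an assumption here; Python's int() is also more lenient than Hive in places (for example it accepts surrounding whitespace), and the sketch does not check the 32-bit INT range, so it is not an exact model.

```python
NULL_MARKER = "\\N"  # Hive's default text for NULL


def cast_int(cell):
    """Approximate how a STRING cell from the script becomes an INT column.

    \\N maps to NULL (None). Text that does not parse as an integer is also
    treated as NULL here, which is an assumption of this sketch.
    """
    if cell == NULL_MARKER:
        return None
    try:
        return int(cell)
    except ValueError:
        return None


def type_output(line, casts):
    """Apply one cast function per column to a TAB-separated output line."""
    cells = line.rstrip("\n").split("\t")
    return tuple(cast(cell) for cast, cell in zip(casts, cells))
```

With AS (thing1 INT, thing2 INT), the script's line "7<TAB>\N" would correspond to type_output("7\t\\N\n", [cast_int, cast_int]), giving (7, None).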

Time: 2024-10-13 01:43:40
