[Hive - LanguageManual] Explain

EXPLAIN Syntax

Hive provides an EXPLAIN command that shows the execution plan for a query. The syntax for this statement is as follows:

EXPLAIN [EXTENDED|DEPENDENCY|AUTHORIZATION] query

AUTHORIZATION is supported as of Hive 0.14.0 via HIVE-5961.

The use of EXTENDED in the EXPLAIN statement produces extra information about the operators in the plan. This is typically physical information like file names.

A Hive query is converted into a sequence of stages (more precisely, a Directed Acyclic Graph of stages). These stages may be map/reduce stages, or they may be stages that perform metastore or file system operations such as move and rename. The explain output has three parts:

  • The Abstract Syntax Tree for the query
  • The dependencies between the different stages of the plan
  • The description of each of the stages

The description of the stages itself shows a sequence of operators with the metadata associated with the operators. The metadata may include things like the filter expression for the FilterOperator, the select expressions for the SelectOperator, or the output file names for the FileSinkOperator.

As an example, consider the following EXPLAIN query:

EXPLAIN

FROM src INSERT OVERWRITE TABLE dest_g1 SELECT src.key, sum(substr(src.value,4)) GROUP BY src.key;

The output of this statement contains the following parts:

  • The Abstract Syntax Tree

    ABSTRACT SYNTAX TREE:

      (TOK_QUERY (TOK_FROM (TOK_TABREF src)) (TOK_INSERT (TOK_DESTINATION (TOK_TAB dest_g1)) (TOK_SELECT (TOK_SELEXPR (TOK_COLREF src key)) (TOK_SELEXPR (TOK_FUNCTION sum (TOK_FUNCTION substr (TOK_COLREF src value) 4)))) (TOK_GROUPBY (TOK_COLREF src key))))

  • The Dependency Graph

    STAGE DEPENDENCIES:

      Stage-1 is a root stage

      Stage-2 depends on stages: Stage-1

      Stage-0 depends on stages: Stage-2

    This shows that Stage-1 is the root stage, Stage-2 is executed after Stage-1 is done and Stage-0 is executed after Stage-2 is done.

  • The plans of each Stage

    STAGE PLANS:

      Stage: Stage-1

        Map Reduce

          Alias -> Map Operator Tree:

            src

                Reduce Output Operator

                  key expressions:

                        expr: key

                        type: string

                  sort order: +

                  Map-reduce partition columns:

                        expr: rand()

                        type: double

                  tag: -1

                  value expressions:

                        expr: substr(value, 4)

                        type: string

          Reduce Operator Tree:

            Group By Operator

              aggregations:

                    expr: sum(UDFToDouble(VALUE.0))

              keys:

                    expr: KEY.0

                    type: string

              mode: partial1

              File Output Operator

                compressed: false

                table:

                    input format: org.apache.hadoop.mapred.SequenceFileInputFormat

                    output format: org.apache.hadoop.mapred.SequenceFileOutputFormat

                    name: binary_table

      Stage: Stage-2

        Map Reduce

          Alias -> Map Operator Tree:

            /tmp/hive-zshao/67494501/106593589.10001

              Reduce Output Operator

                key expressions:

                      expr: 0

                      type: string

                sort order: +

                Map-reduce partition columns:

                      expr: 0

                      type: string

                tag: -1

                value expressions:

                      expr: 1

                      type: double

          Reduce Operator Tree:

            Group By Operator

              aggregations:

                    expr: sum(VALUE.0)

              keys:

                    expr: KEY.0

                    type: string

              mode: final

              Select Operator

                expressions:

                      expr: 0

                      type: string

                      expr: 1

                      type: double

                Select Operator

                  expressions:

                        expr: UDFToInteger(0)

                        type: int

                        expr: 1

                        type: double

                  File Output Operator

                    compressed: false

                    table:

                        input format: org.apache.hadoop.mapred.TextInputFormat

                        output format: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat

                        serde: org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe

                        name: dest_g1

      Stage: Stage-0

        Move Operator

          tables:

                replace: true

                table:

                    input format: org.apache.hadoop.mapred.TextInputFormat

                    output format: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat

                    serde: org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe

                    name: dest_g1

    In this example there are two map/reduce stages (Stage-1 and Stage-2) and one file system related stage (Stage-0). Stage-0 simply moves the results from a temporary directory to the directory corresponding to the table dest_g1.
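The STAGE DEPENDENCIES section above describes a small directed acyclic graph. As a rough illustration of how such dependencies determine a valid execution order (this is a sketch, not Hive's actual scheduler; the stage names are taken from the output above):

```python
# Sketch: derive an execution order from stage dependencies of the form
# shown in STAGE DEPENDENCIES ("Stage-1 is a root stage",
# "Stage-2 depends on stages: Stage-1", ...). Illustrative only.

def execution_order(deps):
    """deps maps each stage to the stages it depends on; returns an order
    in which every stage runs after all of its dependencies."""
    order, done = [], set()

    def visit(stage):
        if stage in done:
            return
        for parent in deps.get(stage, []):
            visit(parent)
        done.add(stage)
        order.append(stage)

    for stage in deps:
        visit(stage)
    return order

deps = {"Stage-1": [], "Stage-2": ["Stage-1"], "Stage-0": ["Stage-2"]}
print(execution_order(deps))  # ['Stage-1', 'Stage-2', 'Stage-0']
```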

A map/reduce stage itself has two parts:

  • A mapping from table alias to Map Operator Tree - This mapping tells the mappers which operator tree to call in order to process the rows from a particular table or the result of a previous map/reduce stage. In Stage-1 in the above example, the rows from the src table are processed by the operator tree rooted at a Reduce Output Operator. Similarly, in Stage-2 the rows of the result of Stage-1 are processed by another operator tree rooted at another Reduce Output Operator. Each of these Reduce Output Operators partitions the data to the reducers according to the criteria shown in the metadata.
  • A Reduce Operator Tree - This is the operator tree that processes all the rows on the reducer of the map/reduce job. In Stage-1, for example, the Reduce Operator Tree carries out a partial aggregation, whereas the Reduce Operator Tree in Stage-2 computes the final aggregation from the partial aggregates computed in Stage-1.
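The partial/final aggregation split described above can be illustrated outside Hive. The sketch below simulates how per-key partial sums (mode: partial1, as in Stage-1) are merged into final sums (mode: final, as in Stage-2); the input rows and the split into two partitions are made-up examples, not Hive internals:

```python
from collections import defaultdict

# Illustrative two-phase aggregation: Stage-1 computes partial sums per key,
# Stage-2 merges the partial sums into the final result.

def partial_sums(rows):
    """Stage-1 analogue: aggregate a subset of (key, value) rows into partial sums."""
    acc = defaultdict(float)
    for key, value in rows:
        acc[key] += value
    return dict(acc)

def final_sums(partials):
    """Stage-2 analogue: merge partial sums into the final aggregation."""
    acc = defaultdict(float)
    for partial in partials:
        for key, value in partial.items():
            acc[key] += value
    return dict(acc)

part1 = partial_sums([("a", 1.0), ("b", 2.0)])  # one map/reduce partition
part2 = partial_sums([("a", 3.0)])              # another partition
print(final_sums([part1, part2]))  # {'a': 4.0, 'b': 2.0}
```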

The use of DEPENDENCY in the EXPLAIN statement produces extra information about the inputs in the plan. It shows various attributes for the inputs. For example, for a query like:

EXPLAIN DEPENDENCY

  SELECT key, count(1) FROM srcpart WHERE ds IS NOT NULL GROUP BY key

the following output is produced:

{"input_partitions":[{"partitionName":"default@srcpart@ds=2008-04-08/hr=11"},{"partitionName":"default@srcpart@ds=2008-04-08/hr=12"},{"partitionName":"default@srcpart@ds=2008-04-09/hr=11"},{"partitionName":"default@srcpart@ds=2008-04-09/hr=12"}],"input_tables":[{"tablename":"default@srcpart","tabletype":"MANAGED_TABLE"}]}

The inputs contain both the tables and the partitions. Note that the table is present even if none of the partitions is accessed in the query.
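Since EXPLAIN DEPENDENCY emits JSON, it can be post-processed with any JSON library. A minimal Python sketch (the JSON string is an abbreviated example modeled on the output above; the key names input_tables and input_partitions are as shown there):

```python
import json

# Parse the JSON emitted by EXPLAIN DEPENDENCY (abbreviated example) and
# list the tables and partitions the query touches.
output = '''{"input_partitions":[{"partitionName":"default@srcpart@ds=2008-04-08/hr=11"}],
"input_tables":[{"tablename":"default@srcpart","tabletype":"MANAGED_TABLE"}]}'''

plan = json.loads(output)
tables = [t["tablename"] for t in plan["input_tables"]]
partitions = [p["partitionName"] for p in plan["input_partitions"]]
print(tables)      # ['default@srcpart']
print(partitions)  # ['default@srcpart@ds=2008-04-08/hr=11']
```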

The dependencies show the parents in case a table is accessed via a view. Consider the following queries:

CREATE VIEW V1 AS SELECT key, value from src;

EXPLAIN DEPENDENCY SELECT * FROM V1;

The following output is produced:

{"input_partitions":[],"input_tables":[{"tablename":"default@v1","tabletype":"VIRTUAL_VIEW"},{"tablename":"default@src","tabletype":"MANAGED_TABLE","tableParents":"[default@v1]"}]}

As above, the inputs contain the view V1 and the table 'src' that the view V1 refers to.

All the outputs are shown if a table is being accessed via multiple parents.

CREATE VIEW V2 AS SELECT ds, key, value FROM srcpart WHERE ds IS NOT NULL;

CREATE VIEW V4 AS

  SELECT src1.key, src2.value as value1, src3.value as value2

  FROM V1 src1 JOIN V2 src2 on src1.key = src2.key JOIN src src3 ON src2.key = src3.key;

EXPLAIN DEPENDENCY SELECT * FROM V4;

The following output is produced.

{"input_partitions":[{"partitionParents":"[default@v2]","partitionName":"default@srcpart@ds=2008-04-08/hr=11"},{"partitionParents":"[default@v2]","partitionName":"default@srcpart@ds=2008-04-08/hr=12"},{"partitionParents":"[default@v2]","partitionName":"default@srcpart@ds=2008-04-09/hr=11"},{"partitionParents":"[default@v2]","partitionName":"default@srcpart@ds=2008-04-09/hr=12"}],"input_tables":[{"tablename":"default@v4","tabletype":"VIRTUAL_VIEW"},{"tablename":"default@v2","tabletype":"VIRTUAL_VIEW","tableParents":"[default@v4]"},{"tablename":"default@v1","tabletype":"VIRTUAL_VIEW","tableParents":"[default@v4]"},{"tablename":"default@src","tabletype":"MANAGED_TABLE","tableParents":"[default@v1, default@v4]"},{"tablename":"default@srcpart","tabletype":"MANAGED_TABLE","tableParents":"[default@v2]"}]}

As can be seen, src is being accessed via parents v1 and v4.
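A short sketch of how the tableParents attribute could be consumed, assuming the bracketed parent lists are parsed by stripping the brackets and splitting on commas (a simplification sufficient for the format shown above; the JSON string is an abbreviated example):

```python
import json

# Collect each table's parents from EXPLAIN DEPENDENCY output (abbreviated
# example). The bracketed "tableParents" strings are parsed with a simple
# strip/split, which suffices for this format.
output = '''{"input_tables":[
  {"tablename":"default@v1","tabletype":"VIRTUAL_VIEW","tableParents":"[default@v4]"},
  {"tablename":"default@src","tabletype":"MANAGED_TABLE","tableParents":"[default@v1, default@v4]"}]}'''

def parents_of(plan):
    """Map each input table name to the list of parents that reference it."""
    result = {}
    for table in plan["input_tables"]:
        raw = table.get("tableParents", "[]")
        result[table["tablename"]] = [p.strip() for p in raw.strip("[]").split(",") if p.strip()]
    return result

print(parents_of(json.loads(output)))
# {'default@v1': ['default@v4'], 'default@src': ['default@v1', 'default@v4']}
```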

The use of AUTHORIZATION in the EXPLAIN statement shows all the entities that need to be authorized to execute the query, as well as any authorization failures. For example, for a query like:

EXPLAIN AUTHORIZATION

  SELECT * FROM src JOIN srcpart;

the following output is produced:

INPUTS:

  default@srcpart

  default@src

  default@srcpart@ds=2008-04-08/hr=11

  default@srcpart@ds=2008-04-08/hr=12

  default@srcpart@ds=2008-04-09/hr=11

  default@srcpart@ds=2008-04-09/hr=12

OUTPUTS:

  hdfs://localhost:9000/tmp/.../-mr-10000

CURRENT_USER:

  navis

OPERATION:

  QUERY

AUTHORIZATION_FAILURES:

  Permission denied: Principal [name=navis, type=USER] does not have following privileges for operation QUERY [[SELECT] on Object [type=TABLE_OR_VIEW, name=default.src], [SELECT] on Object [type=TABLE_OR_VIEW, name=default.srcpart]]

With the FORMATTED keyword, the output is returned in JSON format:

{"OUTPUTS":["hdfs://localhost:9000/tmp/.../-mr-10000"],"INPUTS":["default@srcpart","default@src","default@srcpart@ds=2008-04-08/hr=11","default@srcpart@ds=2008-04-08/hr=12","default@srcpart@ds=2008-04-09/hr=11","default@srcpart@ds=2008-04-09/hr=12"],"OPERATION":"QUERY","CURRENT_USER":"navis","AUTHORIZATION_FAILURES":["Permission denied: Principal [name=navis, type=USER] does not have following privileges for operation QUERY [[SELECT] on Object [type=TABLE_OR_VIEW, name=default.src], [SELECT] on Object [type=TABLE_OR_VIEW, name=default.srcpart]]"]}
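The JSON form can likewise be inspected programmatically. A minimal sketch (the JSON string is an abbreviated example modeled on the output above, with a shortened output path and an empty failure list):

```python
import json

# Inspect the JSON emitted by EXPLAIN FORMATTED AUTHORIZATION
# (abbreviated example; hdfs://localhost:9000/tmp/out is a made-up path).
output = '''{"OUTPUTS":["hdfs://localhost:9000/tmp/out"],
"INPUTS":["default@src","default@srcpart"],
"OPERATION":"QUERY","CURRENT_USER":"navis","AUTHORIZATION_FAILURES":[]}'''

auth = json.loads(output)
print(auth["CURRENT_USER"])                 # navis
print(sorted(auth["INPUTS"]))               # ['default@src', 'default@srcpart']
print(len(auth["AUTHORIZATION_FAILURES"]))  # 0
```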

Date: 2024-11-03 22:02:06
