[Hive - LanguageManual] XPathUDF

Documentation for Built-In User-Defined Functions Related To XPath

UDFs

xpath, xpath_string, xpath_boolean, xpath_short, xpath_int, xpath_long, xpath_float, xpath_double, xpath_number

  • Functions for parsing XML data using XPath expressions.
  • Since version: 0.6.0

    Overview

The xpath family of UDFs are wrappers around the Java XPath library javax.xml.xpath provided by the JDK. The library is based on the XPath 1.0 specification. Please refer to http://java.sun.com/javase/6/docs/api/javax/xml/xpath/package-summary.html for detailed information on the Java XPath library.

All functions follow the form xpath_*(xml_string, xpath_expression_string). The XPath expression string is compiled and cached: if the expression in the next input row matches the previous one, the compiled form is reused; otherwise it is recompiled. The XML string is therefore parsed for every input row, while the XPath expression is precompiled and reused in the vast majority of use cases.
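
The compile-and-reuse behavior described above can be sketched directly against javax.xml.xpath. This is a minimal illustration under stated assumptions, not Hive's actual source: the class name CachedXPathSketch and the compileCount field are hypothetical and exist only to make the caching visible.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

/** Hypothetical helper: recompiles only when the expression text changes between rows. */
class CachedXPathSketch {
    private final XPath xpath = XPathFactory.newInstance().newXPath();
    private String lastExpression;      // expression text seen on the previous call
    private XPathExpression compiled;   // compiled form of lastExpression
    int compileCount = 0;               // illustration only: counts compilations

    String evalString(String xml, String expression) {
        try {
            if (!expression.equals(lastExpression)) {
                compiled = xpath.compile(expression);
                lastExpression = expression;
                compileCount++;
            }
            // The XML string is parsed again on every call.
            return (String) compiled.evaluate(
                    new InputSource(new StringReader(xml)), XPathConstants.STRING);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("XPath evaluation failed", e);
        }
    }
}
```

Calling evalString twice with the same expression compiles it once; a different expression on the next call triggers a recompile.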

Backward axes are supported. For example:

> select xpath ('<a><b id="1"><c/></b><b id="2"><c/></b></a>','/descendant::c/ancestor::b/@id') from t1 limit 1 ;

["1","2"]

Each function returns a specific Hive type given the XPath expression:

  • xpath returns a Hive array of strings.
  • xpath_string returns a string.
  • xpath_boolean returns a boolean.
  • xpath_short returns a short integer.
  • xpath_int returns an integer.
  • xpath_long returns a long integer.
  • xpath_float returns a floating point number.
  • xpath_double and xpath_number return a double-precision floating point number (xpath_number is an alias for xpath_double).

The UDFs are schema agnostic: no XML validation is performed. However, malformed XML (e.g., <a><b>1</b></aa>) results in a runtime exception.
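
As a rough illustration of the malformed-XML case, the underlying Java library rejects input that is not well formed before any expression is evaluated. The sketch below uses a hypothetical helper, MalformedXmlSketch.evaluates, and shows only the javax.xml.xpath behavior; how Hive surfaces the resulting error is not shown here. The JDK parser may also print a diagnostic line to standard error.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

/** Hypothetical helper: reports whether an expression can be evaluated against the XML. */
class MalformedXmlSketch {
    static boolean evaluates(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            xpath.evaluate(expression,
                    new InputSource(new StringReader(xml)), XPathConstants.STRING);
            return true;
        } catch (XPathExpressionException e) {
            // Raised when the XML cannot be parsed (or the expression is invalid).
            return false;
        }
    }
}
```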

Following are specifics on each xpath UDF variant.

xpath

The xpath() function always returns a Hive array of strings. If the expression selects non-text values (e.g., element nodes), the function returns an empty array. There are two primary uses for this function: getting a list of node text values and getting a list of attribute values.

Examples:

Expression that selects element nodes rather than text, which yields an empty array:

> select xpath('<a><b>b1</b><b>b2</b></a>','a/*') from src limit 1 ;

[]

Get a list of node text values:

> select xpath('<a><b>b1</b><b>b2</b></a>','a/*/text()') from src limit 1 ;

["b1","b2"]

Get a list of values for attribute 'id':

> select xpath('<a><b id="foo">b1</b><b id="bar">b2</b></a>','//@id') from src limit 1 ;

["foo","bar"]

Get a list of node texts for nodes where the 'class' attribute equals 'bb':

> SELECT xpath ('<a><b class="bb">b1</b><b>b2</b><b>b3</b><c class="bb">c1</c><c>c2</c></a>', 'a/*[@class="bb"]/text()') FROM src LIMIT 1 ;

["b1","c1"]
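
The results above are consistent with evaluating the expression as a node set and keeping each node's DOM node value: text and attribute nodes carry a value, while element nodes report null and so contribute nothing. The following is a sketch of that idea against javax.xml.xpath, assuming this is roughly how the array is built; the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: collects the node values of every node the expression selects. */
class NodeValuesSketch {
    static List<String> nodeValues(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression,
                    new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // Element nodes have a null node value; text and attribute nodes do not.
                String value = nodes.item(i).getNodeValue();
                if (value != null) {
                    values.add(value);
                }
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("XPath evaluation failed", e);
        }
    }
}
```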

xpath_string

The xpath_string() function returns the text of the first matching node.

Get the text for node 'a/b':

> SELECT xpath_string ('<a><b>bb</b><c>cc</c></a>', 'a/b') FROM src LIMIT 1 ;

bb

Get the text for node 'a'. Because 'a' has child nodes with text, the result is the concatenation of the children's text.

> SELECT xpath_string ('<a><b>bb</b><c>cc</c></a>', 'a') FROM src LIMIT 1 ;

bbcc

Non-matching expression returns an empty string:

> SELECT xpath_string ('<a><b>bb</b><c>cc</c></a>', 'a/d') FROM src LIMIT 1 ;

Gets the text of the first node that matches '//b':

> SELECT xpath_string ('<a><b>b1</b><b>b2</b></a>', '//b') FROM src LIMIT 1 ;

b1

Gets the second matching node:

> SELECT xpath_string ('<a><b>b1</b><b>b2</b></a>', 'a/b[2]') FROM src LIMIT 1 ;

b2

Gets the text from the first node that has an attribute 'id' with value 'b_2':

> SELECT xpath_string ('<a><b>b1</b><b id="b_2">b2</b></a>', 'a/b[@id="b_2"]') FROM src LIMIT 1 ;

b2
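
These results follow the XPath 1.0 string conversion: the string-value of the first selected node in document order, or an empty string when nothing is selected. A minimal sketch against javax.xml.xpath, using a hypothetical helper name, reproduces the examples above:

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

/** Hypothetical helper: evaluates an expression and converts the result to a string. */
class StringValueSketch {
    static String stringValue(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // STRING applies the XPath 1.0 string() conversion to the result.
            return (String) xpath.evaluate(expression,
                    new InputSource(new StringReader(xml)), XPathConstants.STRING);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("XPath evaluation failed", e);
        }
    }
}
```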

xpath_boolean

Returns true if the XPath expression evaluates to true, or if a matching node is found.

Match found:

> SELECT xpath_boolean ('<a><b>b</b></a>', 'a/b') FROM src LIMIT 1 ;

true

No match found:

> SELECT xpath_boolean ('<a><b>b</b></a>', 'a/c') FROM src LIMIT 1 ;

false

Comparison that evaluates to true:

> SELECT xpath_boolean ('<a><b>b</b></a>', 'a/b = "b"') FROM src LIMIT 1 ;

true

Comparison that evaluates to false:

> SELECT xpath_boolean ('<a><b>10</b></a>', 'a/b < 10') FROM src LIMIT 1 ;

false
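
The four cases above correspond to the XPath 1.0 boolean conversion: a non-empty node set is true, an empty one is false, and comparison expressions yield their own truth value. A short sketch against javax.xml.xpath, with a hypothetical helper name:

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

/** Hypothetical helper: evaluates an expression and converts the result to a boolean. */
class BooleanValueSketch {
    static boolean matches(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // BOOLEAN applies the XPath 1.0 boolean() conversion to the result.
            return (Boolean) xpath.evaluate(expression,
                    new InputSource(new StringReader(xml)), XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("XPath evaluation failed", e);
        }
    }
}
```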

xpath_short, xpath_int, xpath_long

These functions return an integer value. They return zero if no match is found, or if a match is found but its value is non-numeric. Mathematical operations are supported. If the value overflows the return type, the maximum value for the type is returned.

No match:

> SELECT xpath_int ('<a>b</a>', 'a = 10') FROM src LIMIT 1 ;

0

Non-numeric match:

> SELECT xpath_int ('<a>this is not a number</a>', 'a') FROM src LIMIT 1 ;

0

> SELECT xpath_int ('<a>this 2 is not a number</a>', 'a') FROM src LIMIT 1 ;

0

Adding values:

> SELECT xpath_int ('<a><b class="odd">1</b><b class="even">2</b><b class="odd">4</b><c>8</c></a>', 'sum(a/*)') FROM src LIMIT 1 ;

15

> SELECT xpath_int ('<a><b class="odd">1</b><b class="even">2</b><b class="odd">4</b><c>8</c></a>', 'sum(a/b)') FROM src LIMIT 1 ;

7

> SELECT xpath_int ('<a><b class="odd">1</b><b class="even">2</b><b class="odd">4</b><c>8</c></a>', 'sum(a/b[@class="odd"])') FROM src LIMIT 1 ;

5

Overflow:

> SELECT xpath_int ('<a><b>2000000000</b><c>40000000000</c></a>', 'a/b * a/c') FROM src LIMIT 1 ;

2147483647
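
The xpath_int results above match what one gets by evaluating the expression as an XPath number (a double) and narrowing it to a Java int: NaN narrows to 0, and values above the int range narrow to Integer.MAX_VALUE. The sketch below assumes that conversion path and covers only the int case; the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

/** Hypothetical helper: evaluates an expression as a number and narrows it to int. */
class IntValueSketch {
    static int intValue(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            Double number = (Double) xpath.evaluate(expression,
                    new InputSource(new StringReader(xml)), XPathConstants.NUMBER);
            // Java narrowing: NaN becomes 0, too-large values become Integer.MAX_VALUE.
            return number.intValue();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("XPath evaluation failed", e);
        }
    }
}
```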

xpath_float, xpath_double, xpath_number

Similar to xpath_short, xpath_int and xpath_long but with floating point semantics. Non-matches result in zero. However, non-numeric matches result in NaN. Note that xpath_number() is an alias for xpath_double().

No match:

> SELECT xpath_double ('<a>b</a>', 'a = 10') FROM src LIMIT 1 ;

0.0

Non-numeric match:

> SELECT xpath_double ('<a>this is not a number</a>', 'a') FROM src LIMIT 1 ;

NaN

A very large number:

> SELECT xpath_double ('<a><b>2000000000</b><c>40000000000</c></a>', 'a/b * a/c') FROM src LIMIT 1 ;

8.0E19
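
The floating point cases above can be reproduced with the XPath 1.0 number conversion in javax.xml.xpath, which returns a Java Double. This is a sketch with a hypothetical helper name, not a description of Hive's internals.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

/** Hypothetical helper: evaluates an expression and converts the result to a double. */
class DoubleValueSketch {
    static double doubleValue(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // NUMBER applies the XPath 1.0 number() conversion; non-numeric text gives NaN.
            return (Double) xpath.evaluate(expression,
                    new InputSource(new StringReader(xml)), XPathConstants.NUMBER);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("XPath evaluation failed", e);
        }
    }
}
```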

Time: 2024-10-07 14:39:09
