Source: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
Complex Type Constructors
The following functions construct instances of complex types.
Constructor Function | Operands | Description |
---|---|---|
map | (key1, value1, key2, value2, ...) | Creates a map with the given key/value pairs |
struct | (val1, val2, val3, ...) | Creates a struct with the given field values. Struct field names will |
named_struct | (name1, val1, name2, val2, ...) | Creates a struct with the given field names and values. (as of Hive 0.8.0) |
array | (val1, val2, ...) | Creates an array with the given elements |
create_union | (tag, val1, val2, ...) | Creates a union type with the value that is being pointed to by the tag |
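
As a minimal sketch of the constructors above, the following query builds one value of each complex type. It assumes a Hive release that accepts SELECT without a FROM clause (0.13.0 or later); on older releases, add a FROM clause against any one-row table. The column aliases are illustrative and not part of Hive.

```sql
-- Illustrative only: the aliases m, s, ns, a and u are hypothetical names.
SELECT
  map('a', 1, 'b', 2)               AS m,   -- map<string,int>
  struct(1, 'x')                    AS s,   -- struct with default field names (col1, col2)
  named_struct('id', 1, 'tag', 'x') AS ns,  -- struct<id:int,tag:string>
  array(1, 2, 3)                    AS a,   -- array<int>
  create_union(0, 1, 'x')           AS u;   -- tag 0 points at the first value (1)
```

Values built this way are read back with the usual accessors: m['a'] for a map key, a[0] for an array element, and ns.id for a struct field.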
Date Functions
The following built-in date functions are supported in Hive:
Return Type | Name(Signature) | Description |
---|---|---|
string | from_unixtime(bigint unixtime[, string format]) | Converts the number of seconds from Unix epoch (1970-01-01 00:00:00 |
bigint | unix_timestamp() | Gets the current Unix timestamp in seconds |
bigint | unix_timestamp(string date) | Converts time string in format |
bigint | unix_timestamp(string date, string pattern) | Converts a time string with the given pattern (see [http://java.sun.com/j2se/1.4.2/docs/api/java/text/SimpleDateFormat.html]) |
string | to_date(string timestamp) | Returns the date part of a timestamp string: to_date("1970-01-01 |
int | year(string date) | Returns the year part of a date or a timestamp string: year("1970-01-01 |
int | month(string date) | Returns the month part of a date or a timestamp string: |
int | day(string date) dayofmonth(date) | Returns the day part of a date or a timestamp string: day("1970-11-01 |
int | hour(string date) | Returns the hour of the timestamp: hour('2009-07-30 12:58:59') = 12, |
int | minute(string date) | Returns the minute of the timestamp |
int | second(string date) | Returns the second of the timestamp |
int | weekofyear(string date) | Returns the week number of a timestamp string: weekofyear("1970-11-01 |
int | datediff(string enddate, string startdate) | Returns the number of days from startdate to enddate: |
string | date_add(string startdate, int days) | Adds a number of days to startdate: date_add('2008-12-31', 1) = '2009-01-01' |
string | date_sub(string startdate, int days) | Subtracts a number of days from startdate: date_sub('2008-12-31', 1) = '2008-12-30' |
timestamp | from_utc_timestamp(timestamp, string timezone) | Assumes given timestamp is UTC and converts to given timezone (as of |
timestamp | to_utc_timestamp(timestamp, string timezone) | Assumes given timestamp is in given timezone and converts to UTC (as of |
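
The date functions above can be exercised with a query like the following sketch. It assumes a Hive release that accepts SELECT without a FROM clause (0.13.0 or later) and, for the last two columns, 0.8.0 or later. The column aliases are illustrative. Results that depend on the session time zone are left without an expected value.

```sql
-- Illustrative only: all column aliases are hypothetical names.
SELECT
  from_unixtime(0)                                 AS epoch_str,   -- depends on the session time zone
  unix_timestamp('2009-03-20', 'yyyy-MM-dd')       AS ts_seconds,  -- depends on the session time zone
  to_date('1970-01-01 00:00:00')                   AS d,           -- '1970-01-01'
  year('1970-01-01 00:00:00')                      AS y,           -- 1970
  month('1970-11-01 00:00:00')                     AS mo,          -- 11
  day('1970-11-01 00:00:00')                       AS dd,          -- 1
  hour('2009-07-30 12:58:59')                      AS hh,          -- 12
  minute('2009-07-30 12:58:59')                    AS mi,          -- 58
  second('2009-07-30 12:58:59')                    AS ss,          -- 59
  weekofyear('1970-11-01 00:00:00')                AS wk,          -- 44
  datediff('2009-03-01', '2009-02-27')             AS diff_days,   -- 2
  date_add('2008-12-31', 1)                        AS next_day,    -- '2009-01-01'
  date_sub('2008-12-31', 1)                        AS prev_day,    -- '2008-12-30'
  from_utc_timestamp('1970-01-01 08:00:00', 'PST') AS local_ts,    -- 1970-01-01 00:00:00
  to_utc_timestamp('1970-01-01 00:00:00', 'PST')   AS utc_ts;      -- 1970-01-01 08:00:00
```

Note the argument order of datediff: the end date comes first, so swapping the two arguments flips the sign of the result.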
String Functions
The following built-in string functions are supported in Hive:
Return Type | Name(Signature) | Description |
---|---|---|
int | ascii(string str) | Returns the numeric value of the first character of str |
string | base64(binary bin) | Converts the argument from binary to a base 64 string (as of Hive 0.12.0) |
string | concat(string\|binary A, string\|binary B...) | Returns the string or bytes resulting from concatenating the strings or |
array<struct<string,double>> | context_ngrams(array<array<string>>, array<string>, | Returns the top-k contextual N-grams from a set of tokenized sentences, |
string | concat_ws(string SEP, string A, string B...) | Like concat() above, but with custom separator SEP. |
string | concat_ws(string SEP, array<string>) | Like concat_ws() above, but taking an array of strings. (as of Hive 0.9.0) |
string | decode(binary bin, string charset) | Decodes the first argument into a String using the provided character |
binary | encode(string src, string charset) | Encodes the first argument into a BINARY using the provided character |
int | find_in_set(string str, string strList) | Returns the first occurrence of str in strList where strList is a |
string | format_number(number x, int d) | Formats the number X to a format like '#,###,###.##', rounded to D |
string | get_json_object(string json_string, string path) | Extracts a JSON object from a JSON string based on the JSON path specified, |
boolean | in_file(string str, string filename) | Returns true if the string str appears as an entire line in |
int | instr(string str, string substr) | Returns the position of the first occurrence of |
int | length(string A) | Returns the length of the string |
int | locate(string substr, string str[, int pos]) | Returns the position of the first occurrence of substr in str after |
string | lower(string A) lcase(string A) | Returns the string resulting from converting all characters of A to lower case |
string | lpad(string str, int len, string pad) | Returns str, left-padded with pad to a length of len |
string | ltrim(string A) | Returns the string resulting from trimming spaces from the |
array<struct<string,double>> | ngrams(array<array<string>>, int N, int K, int pf) | Returns the top-k N-grams from a set of tokenized sentences, such as |
string | parse_url(string urlString, string partToExtract [, string | Returns the specified part from the URL. Valid values for partToExtract |
string | printf(String format, Obj... args) | Returns the input formatted according to printf-style format strings |
string | regexp_extract(string subject, string pattern, int index) | Returns the string extracted using the pattern. e.g. |
string | regexp_replace(string INITIAL_STRING, string PATTERN, string | Returns the string resulting from replacing all substrings in |
string | repeat(string str, int n) | Repeats str n times |
string | reverse(string A) | Returns the reversed string |
string | rpad(string str, int len, string pad) | Returns str, right-padded with pad to a length of len |
string | rtrim(string A) | Returns the string resulting from trimming spaces from the end (right |
array<array<string>> | sentences(string str, string lang, string locale) | Tokenizes a string of natural language text into words and sentences, |
string | space(int n) | Returns a string of n spaces |
array | split(string str, string pat) | Splits str around pat (pat is a regular expression) |
map<string,string> | str_to_map(text[, delimiter1, delimiter2]) | Splits text into key-value pairs using two delimiters. Delimiter1 |
string | substr(string\|binary A, int start) substring(string\|binary A, int | Returns the substring or slice of the byte array of A starting from |
string | substr(string\|binary A, int start, int len) substring(string\|binary A, | Returns the substring or slice of the byte array of A starting from |
string | translate(string input, string from, string to) | Translates the input string by replacing the characters present in the |
string | trim(string A) | Returns the string resulting from trimming spaces from both ends of A |
binary | unbase64(string str) | Converts the argument from a base 64 string to BINARY (as of Hive 0.12.0) |
string | upper(string A) ucase(string A) | Returns the string resulting from converting all characters of A to upper case |
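
A few of the string functions above are sketched in the query below. It assumes a Hive release that accepts SELECT without a FROM clause (0.13.0 or later), which also covers the 0.12.0 requirement of encode and base64. The column aliases and sample literals are illustrative only.

```sql
-- Illustrative only: all column aliases and input literals are hypothetical.
SELECT
  ascii('A')                                       AS code,       -- 65
  concat('foo', 'bar')                             AS joined,     -- 'foobar'
  concat_ws(',', 'a', 'b', 'c')                    AS csv,        -- 'a,b,c'
  find_in_set('ab', 'abc,b,ab,c,def')              AS set_pos,    -- 3
  instr('foobar', 'bar')                           AS pos,        -- 4 (positions are 1-based)
  locate('bar', 'foobarbar', 5)                    AS pos_from5,  -- 7
  length('foobar')                                 AS len,        -- 6
  lower('fOoBaR')                                  AS lc,         -- 'foobar'
  upper('fOoBaR')                                  AS uc,         -- 'FOOBAR'
  lpad('hi', 5, '?')                               AS left_pad,   -- '???hi'
  rpad('hi', 5, '?')                               AS right_pad,  -- 'hi???'
  trim(' foobar ')                                 AS trimmed,    -- 'foobar'
  repeat('ab', 3)                                  AS rep,        -- 'ababab'
  reverse('abc')                                   AS rev,        -- 'cba'
  substr('foobar', 4)                              AS tail,       -- 'bar'
  substr('foobar', 4, 1)                           AS one_char,   -- 'b'
  split('a,b,c', ',')                              AS parts,      -- ["a","b","c"]
  str_to_map('a:1,b:2', ',', ':')                  AS kv,         -- {"a":"1","b":"2"}
  regexp_extract('foothebar', 'foo(.*?)(bar)', 2)  AS grp2,       -- 'bar'
  regexp_replace('foobar', 'oo|ar', '')            AS replaced,   -- 'fb'
  base64(encode('hi', 'UTF-8'))                    AS b64;        -- 'aGk='
```

Because split, regexp_extract and regexp_replace all take Java regular expressions, characters such as '.' or '|' in the pattern need escaping when they are meant literally.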