Hive Advanced Notes

Review:

Advantages of Hive
1. SQL-like syntax close to relational databases; user-defined functions add extensibility; easy to develop, and it lowers the cost of learning MapReduce
2. Hive translates SQL statements into MapReduce programs, with MapReduce as the underlying execution engine (see the explain sketch below)
3. Hive stores its data on Hadoop's HDFS, so Hive's storage scales along with HDFS
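Point 2 can be observed directly: explain prints the plan Hive compiles for a statement. A quick sketch against the emp sample table used throughout these notes:
explain select deptno, count(*) from emp group by deptno; # prints the stage plan of the compiled MapReduce job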

Hive installation and deployment

1. Unpack the installation archive
2. Go into the conf directory, copy (back up) the relevant configuration files, then edit:
hive-env.sh
--> HADOOP_HOME=/opt/cdh-5.6.3/hadoop-2.5.0-cdh5.3.6
--> export HIVE_CONF_DIR=/opt/cdh-5.6.3/hive-0.13.1-cdh5.3.6/conf
hive-log4j.properties
--> Create a logs directory under the Hive root to hold Hive's runtime logs
--> hive.log.threshold=ALL
hive.root.logger=INFO,DRFA
hive.log.dir=/opt/cdh-5.6.3/hive-0.13.1-cdh5.3.6/logs
hive.log.file=hive.log
hive-site.xml
--> javax.jdo.option.ConnectionURL --- jdbc:mysql://hadoop09-linux-01.ibeifeng.com:3306/chd_metastore?createDatabaseIfNotExist=true
--> javax.jdo.option.ConnectionDriverName --- com.mysql.jdbc.Driver
--> javax.jdo.option.ConnectionUserName --- root
--> javax.jdo.option.ConnectionPassword --- root
--> hive.cli.print.header --- true # whether to print column headers in query results (optional)
--> hive.cli.print.current.db --- true # whether to show the current database name in the prompt (optional)
--> hive.fetch.task.conversion --- more # whether simple queries can bypass MapReduce; valid values are minimal/more, covered in the Fetch Task section below (optional)
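For reference, a minimal hive-site.xml sketch holding the properties above (values are the ones from this setup; the optional CLI properties follow the same name/value pattern):
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hadoop09-linux-01.ibeifeng.com:3306/chd_metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>root</value>
  </property>
</configuration>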

Hive architecture

1. Metastore
--> Stored in Derby (the default): derby files and a metastore_db directory are created in the directory Hive is started from; drawback: starting a second Hive session from the same directory fails
--> Metadata stored in a local MySQL database
--> Metadata stored in a remote MySQL database
2. Client
--> CLI / JDBC / Driver / SQL Parser / Query Optimizer / Physical Plan / Execution

Ways to create a table in Hive

1. Plain CREATE TABLE
create table if not exists tablename(...)
row format delimited fields terminated by '\t'
stored as textfile;
2. Subquery (CTAS)
create table if not exists tablename as select * from tablename2;
3. LIKE
create table if not exists tablename like tablename2; # copies only tablename2's table structure, not its data

Table types

1. Managed table (the default)
2. External table (external; lets multiple users share the same underlying data)
3. Partitioned table (partition; speeds up analytic queries by pruning data)
--> List partitions: show partitions tablename; # show the partitions of tablename
Adding a partition by hand (a worked sketch follows this list):
1. Create the partition directory and put data into it
hive (workdb)> dfs -mkdir /user/hive/warehouse/workdb.db/emp_part/date=20161029
dfs -put /home/liuwl/opt/datas/emp.txt /user/hive/warehouse/workdb.db/emp_part/date=20161029
--- Problem: show partitions emp_part; does not list the partition just added by hand (the metastore knows nothing about it)
2. Fix: alter table emp_part add partition (date='20161029');
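For contrast, a sketch of the same table loaded through Hive itself, which registers the partition in the metastore directly (the column list is an assumption based on the emp sample data):
create table if not exists emp_part(
empno int, ename string, job string, mgr int,
hiredate string, sal double, comm double, deptno int)
partitioned by (date string)
row format delimited fields terminated by '\t';
load data local inpath '/home/liuwl/opt/datas/emp.txt' into table emp_part partition (date='20161029');
show partitions emp_part; # the partition now shows up immediately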

Analytic functions and window functions (key topic)

1. Analytic functions
All employees of department 20, ordered by salary descending:
select * from emp where emp.deptno='20' order by sal desc;
All departments partitioned, ordered by salary descending within each:
select empno,ename,deptno,sal,max(sal) over (partition by deptno order by sal desc) as maxsal from emp;
Result:
empno	ename	deptno	sal	maxsal
7839	KING	10	5000.0	5000.0
7782	CLARK	10	2450.0	5000.0
7934	MILLER	10	1300.0	5000.0
7788	SCOTT	20	3000.0	3000.0
7902	FORD	20	3000.0	3000.0
7566	JONES	20	2975.0	3000.0
7876	ADAMS	20	1100.0	3000.0
7369	SMITH	20	800.0	3000.0
7698	BLAKE	30	2850.0	2850.0
7499	ALLEN	30	1600.0	2850.0
7844	TURNER	30	1500.0	2850.0
7654	MARTIN	30	1250.0	2850.0
7521	WARD	30	1250.0	2850.0
7900	JAMES	30	950.0	2850.0
Generating row numbers:
select empno,ename,deptno,sal,row_number() over (partition by deptno order by sal desc) as rownum from emp;
empno	ename	deptno	sal	rownum
7839	KING	10	5000.0	1
7782	CLARK	10	2450.0	2
7934	MILLER	10	1300.0	3
7788	SCOTT	20	3000.0	1
7902	FORD	20	3000.0	2
7566	JONES	20	2975.0	3
7876	ADAMS	20	1100.0	4
7369	SMITH	20	800.0	5
7698	BLAKE	30	2850.0	1
7499	ALLEN	30	1600.0	2
7844	TURNER	30	1500.0	3
7654	MARTIN	30	1250.0	4
7521	WARD	30	1250.0	5
7900	JAMES	30	950.0	6
ROW_NUMBER(): assigns a row number within each partition
Get the two highest-paid employees per department:
select * from (select empno,ename,deptno,sal,row_number() over (partition by deptno order by sal desc) as rownum from emp) t where t.rownum < 3;
t.empno	t.ename	t.deptno	t.sal	t.rownum
7839	KING	10	5000.0	1
7782	CLARK	10	2450.0	2
7788	SCOTT	20	3000.0	1
7902	FORD	20	3000.0	2
7698	BLAKE	30	2850.0	1
7499	ALLEN	30	1600.0	2
RANK(): ranking with gaps (rows tied for first both get 1, the next row gets 3)
DENSE_RANK(): ranking without gaps (rows tied for first both get 1, the next row gets 2)
Partition by department, order by salary descending, and rank:
select empno,ename,deptno,sal,rank() over (partition by deptno order by sal desc) ranksal from emp;
empno	ename	deptno	sal	ranksal
7839	KING	10	5000.0	1
7782	CLARK	10	2450.0	2
7934	MILLER 10	1300.0	3
7788	SCOTT	20	3000.0	1
7902	FORD	20	3000.0	1
7566	JONES	20	2975.0	3
7876	ADAMS	20	1100.0	4
7369	SMITH	20	800.0	5
7698	BLAKE	30	2850.0	1
7499	ALLEN	30	1600.0	2
7844	TURNER 30	1500.0	3
7654	MARTIN 30	1250.0	4
7521	WARD	30	1250.0	4
7900	JAMES	30	950.0	6
select empno,ename,deptno,sal,dense_rank() over (partition by deptno order by sal desc) ranksal from emp;
empno	ename	deptno	sal	ranksal
7839	KING	10	5000.0	1
7782	CLARK	10	2450.0	2
7934	MILLER	10	1300.0	3
7788	SCOTT	20	3000.0	1
7902	FORD	20	3000.0	1
7566	JONES	20	2975.0	2
7876	ADAMS	20	1100.0	3
7369	SMITH	20	800.0	4
7698	BLAKE	30	2850.0	1
7499	ALLEN	30	1600.0	2
7844	TURNER	30	1500.0	3
7654	MARTIN	30	1250.0	4
7521	WARD	30	1250.0	4
7900	JAMES	30	950.0	5
NTILE(n): bucketing; splits the ordered rows into n roughly equal buckets and returns each row's bucket number
Example: find the employees in the top third by salary
select empno,ename,sal,ntile(3) over (order by sal desc) ntile from emp;
(to keep only the top third, wrap this in a subquery and filter where ntile = 1)
empno	ename	sal	ntile
7839	KING	5000.0	1
7902	FORD	3000.0	1
7788	SCOTT	3000.0	1
7566	JONES	2975.0	1
7698	BLAKE	2850.0	1
7782	CLARK	2450.0	2
7499	ALLEN	1600.0	2
7844	TURNER 1500.0	2
7934	MILLER 1300.0	2
7654	MARTIN 1250.0	2
7521	WARD	1250.0	3
7876	ADAMS	1100.0	3
7900	JAMES	950.0	3
7369	SMITH	800.0	3
2. Window functions: LAG (value from a preceding row) and LEAD (value from a following row); both take (column, offset, default)
select empno,ename,sal,lag(ename,4,0) over (order by sal desc) lagvalue from emp;
Result:
empno	ename	sal	lagvalue
7839	KING	5000.0	0
7902	FORD	3000.0	0
7788	SCOTT	3000.0	0
7566	JONES	2975.0	0
7698	BLAKE	2850.0	KING
7782	CLARK	2450.0	FORD
7499	ALLEN	1600.0	SCOTT
7844	TURNER	1500.0	JONES
7934	MILLER	1300.0	BLAKE
7654	MARTIN	1250.0	CLARK
7521	WARD	1250.0	ALLEN
7876	ADAMS	1100.0	TURNER
7900	JAMES	950.0	MILLER
7369	SMITH	800.0	MARTIN
select empno,ename,sal,lead(ename,4,0) over (order by sal desc) leadvalue from emp;
Result:
empno	ename	sal	leadvalue
7839	KING	5000.0	BLAKE
7902	FORD	3000.0	CLARK
7788	SCOTT	3000.0	ALLEN
7566	JONES	2975.0	TURNER
7698	BLAKE	2850.0	MILLER
7782	CLARK	2450.0	MARTIN
7499	ALLEN	1600.0	WARD
7844	TURNER	1500.0	ADAMS
7934	MILLER	1300.0	JAMES
7654	MARTIN	1250.0	SMITH
7521	WARD	1250.0	0
7876	ADAMS	1100.0	0
7900	JAMES	950.0	0
7369	SMITH	800.0	0

Importing data into Hive (key topic)

1. From the local filesystem
load data local inpath 'filepath' into table tbname;
2. From HDFS
load data inpath 'hdfs_filepath' into table tbname;
3. Load with overwrite
load data local inpath 'filepath' overwrite into table tbname;
load data inpath 'hdfs_filepath' overwrite into table tbname;
4. Subquery (CTAS)
create table tb2 as select * from tb1; # default field delimiter is ^A
5. insert into table ... select ...; # uses the delimiter defined on the target table
--> create table emp_insert like emp;
--> insert into table emp_insert select * from emp;
6. LOCATION clause (see the sketch after this list)
create table if not exists tbname(...) location 'hdfs_path'; # the table reads whatever data already sits in that HDFS directory
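A hedged sketch of the LOCATION method, assuming an emp-style data file has already been uploaded to HDFS (paths and columns are illustrative):
dfs -mkdir -p /data/emp_ext;
dfs -put /home/liuwl/opt/datas/emp.txt /data/emp_ext;
create table if not exists emp_ext(
empno int, ename string, job string, mgr int,
hiredate string, sal double, comm double, deptno int)
row format delimited fields terminated by '\t'
location '/data/emp_ext';
select * from emp_ext limit 5; # rows come straight from the files already in /data/emp_ext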

Exporting data from Hive (key topic)

1. INSERT ... DIRECTORY (note, when exporting to a local directory: 1. the directory needs the right permissions; 2. export into a dedicated directory, because everything under the target directory is overwritten)
--> insert overwrite [local] directory 'path' select ...;
Example: insert overwrite local directory '/tmp' row format delimited fields terminated by '\t' select * from emp;
2. bin/hdfs dfs -get
3. Running HQL from the Linux command line (see the sketch after this list):
-> -e
-> -f
-> output redirection
4. Sqoop: for import/export between HDFS and relational databases
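A minimal sketch of the three command-line styles above (the script and output paths are illustrative):
bin/hive -e "select * from emp;"
bin/hive -f /home/liuwl/opt/datas/emp.sql
bin/hive -e "select * from emp;" > /home/liuwl/opt/datas/emp_result.txt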

Hive EXPORT and IMPORT (the paths involved must be HDFS paths)

-> export
export table tb_name to 'hdfs_path';
Example: export table emp to '/export_emp';
-> import
import table tb_name from 'hdfs_path';
Example: import table emp_im from '/export_emp';

HQL in Hive

1. Selecting columns
-> select empno,ename from emp;
2. where, limit, distinct
-> select * from emp where sal > 3000;
-> select * from emp limit 5;
-> select distinct deptno from emp;
3. between and,>,<,=,is null,is not null,in
-> select * from emp where sal between 2000 and 3000;
-> select * from emp where comm is null;
-> select * from emp where sal in (2000,3000,4000);
4. count(),sum(),avg(),max(),min()
-> select count(1) from emp;
-> select sum(sal) from emp;
-> select avg(sal) from emp;
5. group by,having
-> select deptno,avg(sal) from emp group by deptno;
-> select deptno,avg(sal) avgsal from emp group by deptno having avgsal >= 3000;
6. join
-> Equi-join (returns only the matching rows)
select e.empno,e.deptno,e.ename,e.sal,e.mgr from emp e join dept d on e.deptno=d.deptno;
-> Left join (keeps every row from the left table)
select e.empno,e.deptno,e.ename,e.sal,e.mgr from emp e left join dept d on e.deptno=d.deptno;
-> Right join (keeps every row from the right table)
select e.empno,e.deptno,e.ename,e.sal,e.mgr from emp e right join dept d on e.deptno=d.deptno;
-> Full join (keeps rows from both sides)
select e.empno,e.deptno,e.ename,e.sal,e.mgr from emp e full join dept d on e.deptno=d.deptno;

MapReduce-related settings in Hive (key topic)

1. Data volume handled per reducer
-> set hive.exec.reducers.bytes.per.reducer;
Default 1 GB: with 10 GB of input the job launches 10 reducers, each handling 1 GB
2. Maximum number of reducers allowed
-> set hive.exec.reducers.max;
Default maximum is 999
3. Actual number of reducers for the job (see the sketch below)
-> set mapreduce.job.reduces;
Hive prints -1 (meaning it decides at runtime); the Hadoop default is 1
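A short sketch of forcing a reducer count for one session and watching it take effect (values are illustrative):
set mapreduce.job.reduces=3;
select deptno, count(*) from emp group by deptno; # the job now runs with 3 reduce tasks
set mapreduce.job.reduces=-1; # back to letting Hive decide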

Sorting in Hive (key topic)

1. order by (global sort: no matter how many reduce tasks are configured, a single reducer produces one fully sorted output file)
-> select * from emp order by sal desc;
2. sort by (sorts within each output file; note the target directory must be an absolute path)
-> set mapreduce.job.reduces=3;
-> insert overwrite local directory '/home/liuwl/opt/datas/sortData' row format delimited fields terminated by '\t' select * from emp sort by sal;
Note: with order by, regardless of the reduce task setting, only one file is generated, and it is sorted
3. distribute by (maps to MapReduce partitioning; usually used together with sort by)
-> insert overwrite local directory '/home/liuwl/opt/datas/sortData' row format delimited fields terminated by '\t' select * from emp distribute by deptno sort by sal;
4. cluster by (shorthand for distribute by xx sort by xx) # xx is the same column
-> insert overwrite local directory '/home/liuwl/opt/datas/sortData' row format delimited fields terminated by '\t' select * from emp cluster by sal;

UDFs in Hive (user-defined functions let users extend HiveQL) (key topic)

1. UDF: one row in, one row out, e.g. upper/lower/day
2. UDAF: many rows in, one row out, e.g. count/max/min
3. UDTF: one row in, many rows out, e.g. lateral view explode
UDF programming steps:
--> extend org.apache.hadoop.hive.ql.exec.UDF
--> implement the evaluate function; evaluate supports overloading
Note: a UDF must have a return type; it may return NULL, but the return type cannot be void
UDFs usually work with Text/LongWritable and other Hadoop writable types; plain Java types are not recommended
--> Code
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

public class UDFTest extends UDF {
  // flag 0 -> lowercase (the default), 1 -> uppercase
  public Text evaluate(Text str){
    return this.evaluate(str, new IntWritable(0));
  }
  public Text evaluate(Text str, IntWritable flag){
    if(str != null){
      if(flag.get() == 0){
        return new Text(str.toString().toLowerCase());
      }else if(flag.get() == 1){
        return new Text(str.toString().toUpperCase());
      }else return null;
    }else return null;
  }
}
--> Package it into a jar
--> Register it in Hive
---> add the jar
add jar jar_path;
---> create the function
create temporary function tolower as 'com.hive.udf.UDFTest';
---> test
select ename, tolower(ename) lowername, tolower(ename,1) uppername from emp;
ename	lowername  uppername
SMITH	smith    SMITH
ALLEN	allen    ALLEN
WARD	ward     WARD
JONES	jones    JONES
MARTIN	martin    MARTIN
BLAKE	blake    BLAKE
CLARK	clark    CLARK
SCOTT	scott    SCOTT
KING	king     KING
TURNER	turner    TURNER
ADAMS	adams    ADAMS
JAMES	james    JAMES
FORD	ford     FORD
MILLER	miller    MILLER
--> Case 2: strip all double quotes
--> Code
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class RMQuotes extends UDF {
  public Text evaluate(Text str){
    if(str != null){
      return new Text(str.toString().replaceAll("\"", ""));
    }else return null;
  }
}
--> Add the jar
add jar jar_path;
--> Create the function
create temporary function rmquotes as 'com.hive.udf.RMQuotes';
--> Test
select dname,rmquotes(dname) rmquotes from dept_quotes;
dname	rmquotes
"ACCOUNTING"	ACCOUNTING
"RESEARCH"	RESEARCH
"SALES"	SALES
"OPERATIONS"	OPERATIONS
--> Case 3: strip all quotes, convert the timestamp to yyyyMMddHHmm, and for request fields keep only the path after GET
e.g. the line: "116.216.17.0"	"31/Aug/2015:00:19:42 +0800"	"GET /theme/styles.php/bootstrap/1427679483/all HTTP/1.1"
becomes: 116.216.17.0	201508310019	/theme/styles.php/bootstrap/1427679483/all
--> Prepare the data:
moodle.log
"116.216.17.0"	"31/Aug/2015:00:19:42 +0800"	"GET /theme/styles.php/bootstrap/1427679483/all HTTP/1.1"
"116.216.17.0"	"31/Aug/2015:00:19:42 +0800"	"GET /theme/image.php/bootstrap/theme/1427679483/fp/logo HTTP/1.1"
"116.216.17.0"	"31/Aug/2015:00:19:42 +0800"	"GET /theme/image.php/bootstrap/core/1427679483/t/expanded HTTP/1.1"
"116.216.17.0"	"31/Aug/2015:00:19:42 +0800"	"GET /theme/image.php/bootstrap/theme/1427679483/fp/search_btn HTTP/1.1"
"116.216.17.0"	"31/Aug/2015:00:19:42 +0800"	"GET /theme/image.php/bootstrap/core/1427679483/t/collapsed HTTP/1.1"
"116.216.17.0"	"31/Aug/2015:00:19:42 +0800"	"GET /theme/image.php/bootstrap/theme/1427679483/fp/footerbg HTTP/1.1"
"116.216.17.0"	"31/Aug/2015:00:19:42 +0800"	"GET /theme/yui_combo.php?m/1427679483/theme_bootstrap/bootstrap/bootstrap-min.js HTTP/1.1"
"116.216.17.0"	"31/Aug/2015:00:19:42 +0800"	"GET /theme/yui_combo.php?m/1427679483/block_navigation/navigation/navigation-min.js HTTP/1.1"
"116.216.17.0"	"31/Aug/2015:00:19:43 +0800"	"GET /theme/yui_combo.php?m/1427679483/theme_bootstrap/zoom/zoom-min.js HTTP/1.1"
"116.216.17.0"	"31/Aug/2015:00:19:43 +0800"	"GET /theme/yui_combo.php?3.17.2/cssbutton/cssbutton-min.css HTTP/1.1"
"116.216.17.0"	"31/Aug/2015:00:19:43 +0800"	"GET /theme/yui_combo.php?m/1427679483/core/lockscroll/lockscroll-min.js HTTP/1.1"
"116.216.17.0"	"31/Aug/2015:00:19:43 +0800"	"GET /theme/image.php/bootstrap/core/1427679483/t/block_to_dock HTTP/1.1"
"116.216.17.0"	"31/Aug/2015:00:19:43 +0800"	"GET /theme/image.php/bootstrap/core/1427679483/t/switch_plus HTTP/1.1"
"116.216.17.0"	"31/Aug/2015:00:19:43 +0800"	"GET /theme/image.php/bootstrap/core/1427679483/t/switch_minus HTTP/1.1"
"116.216.17.0"	"31/Aug/2015:00:19:45 +0800"	"GET /course/view.php?id=27&section=4 HTTP/1.1"
"116.216.17.0"	"31/Aug/2015:00:19:46 +0800"	"GET /theme/image.php/bootstrap/page/1427679483/icon HTTP/1.1"
"116.216.17.0"	"31/Aug/2015:00:19:46 +0800"	"GET /theme/image.php/bootstrap/core/1427679483/spacer HTTP/1.1"
"116.216.17.0"	"31/Aug/2015:00:19:46 +0800"	"GET /theme/yui_combo.php?m/1427679483/core/formautosubmit/formautosubmit-min.js HTTP/1.1"
"116.216.17.0"	"31/Aug/2015:00:19:54 +0800"	"GET /mod/page/view.php?id=11187&section=4 HTTP/1.1"
--> Create the moodle table
create table if not exists moodle(
ip string,
date string,
url string) row format delimited fields terminated by '\t';
--> Load the data
load data local inpath '/home/liuwl/opt/datas/dd/moodle.log' into table moodle;
--> Code
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class mymoodle extends UDF {
  public Text evaluate(Text text){
    if(text != null){
      String strs = text.toString().replaceAll("\"", "");
      String str = "";
      boolean isDate = false;
      try{
        // parse timestamps like 31/Aug/2015:00:19:42 +0800 (English month abbreviations)
        SimpleDateFormat sdf = new SimpleDateFormat("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);
        SimpleDateFormat sdf1 = new SimpleDateFormat("yyyyMMddHHmm");
        Date date = sdf.parse(strs);
        str = sdf1.format(date);
        isDate = true;
      }catch(ParseException p){
        isDate = false;
      }
      if(isDate){
        // timestamp field: return it as yyyyMMddHHmm
        return new Text(str);
      }else if(strs.indexOf("HTTP/1.1") > 0){
        // request field: keep only the path after GET
        return new Text(strs.split(" ")[1]);
      }else{
        // ip field: already clean once the quotes are removed
        return new Text(strs.split(" ")[0]);
      }
    }else return null;
  }
}
--> Add the jar
add jar jar_path;
--> Create the function
create temporary function mymoodle as 'com.hive.udf.mymoodle';
--> Test
select mymoodle(ip) ip,mymoodle(date) date,mymoodle(url) url from moodle;
ip	        date	        url
116.216.17.0	201508310019	/theme/styles.php/bootstrap/1427679483/all
116.216.17.0	201508310019	/theme/image.php/bootstrap/theme/1427679483/fp/logo
116.216.17.0	201508310019	/theme/image.php/bootstrap/core/1427679483/t/expanded
116.216.17.0	201508310019	/theme/image.php/bootstrap/theme/1427679483/fp/search_btn
116.216.17.0	201508310019	/theme/image.php/bootstrap/core/1427679483/t/collapsed
116.216.17.0	201508310019	/theme/image.php/bootstrap/theme/1427679483/fp/footerbg
116.216.17.0	201508310019	/theme/yui_combo.php?m/1427679483/theme_bootstrap/bootstrap/bootstrap-min.js
116.216.17.0	201508310019	/theme/yui_combo.php?m/1427679483/block_navigation/navigation/navigation-min.js
116.216.17.0	201508310019	/theme/yui_combo.php?m/1427679483/theme_bootstrap/zoom/zoom-min.js
116.216.17.0	201508310019	/theme/yui_combo.php?3.17.2/cssbutton/cssbutton-min.css
116.216.17.0	201508310019	/theme/yui_combo.php?m/1427679483/core/lockscroll/lockscroll-min.js
116.216.17.0	201508310019	/theme/image.php/bootstrap/core/1427679483/t/block_to_dock
116.216.17.0	201508310019	/theme/image.php/bootstrap/core/1427679483/t/switch_plus
116.216.17.0	201508310019	/theme/image.php/bootstrap/core/1427679483/t/switch_minus
116.216.17.0	201508310019	/course/view.php?id=27&section=4
116.216.17.0	201508310019	/theme/image.php/bootstrap/page/1427679483/icon
116.216.17.0	201508310019	/theme/image.php/bootstrap/core/1427679483/spacer
116.216.17.0	201508310019	/theme/yui_combo.php?m/1427679483/core/formautosubmit/formautosubmit-min.js
116.216.17.0	201508310019	/mod/page/view.php?id=11187&section=4

HiveServer2, Beeline, and the Java client

--> Starting HiveServer2: bin/hiveserver2, or bin/hive --service hiveserver2
--> Starting Beeline: bin/beeline -u jdbc:hive2://hadoop09-linux-01.ibeifeng.com:10000/workdb -n liuwl -p liuwl
--> or: bin/beeline
--> then: !connect jdbc:hive2://hadoop09-linux-01.ibeifeng.com:10000/workdb
--> Java client
--> start HiveServer2 first
--> code:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class HiveJdbcClient {

  private static String driverName = "org.apache.hive.jdbc.HiveDriver";
  private static String url = "jdbc:hive2://hadoop09-linux-01.ibeifeng.com:10000/workdb";
  private static String username = "root";
  private static String password = "root";

  public static Connection getConnection(){
    try {
      Class.forName(driverName);
      Connection con = DriverManager.getConnection(url, username, password);
      return con;
    } catch (ClassNotFoundException e) {
      e.printStackTrace();
      System.exit(1);
    } catch (SQLException e) {
      e.printStackTrace();
    }
    return null;
  }

  public static List<Object> querySql(Statement stmt, String sql) throws SQLException{
    ResultSet res = stmt.executeQuery(sql);
    List<Object> objectList = new ArrayList<Object>();
    while (res.next()) {
      // collect the first column of every row
      objectList.add(res.getString(1));
    }
    return objectList;
  }

  public static void main(String[] args) throws SQLException {
    Connection con = getConnection();
    Statement stmt = con.createStatement();
    List<Object> objectList = new ArrayList<Object>();
    // query
    String qsql = "select * from emp";
    objectList = querySql(stmt, qsql);
    for(int i = 0; i < objectList.size(); i++){
      System.out.println(objectList.get(i));
    }
    // aggregate
    String rsql = "select count(1) from emp";
    objectList = querySql(stmt, rsql);
    for(int i = 0; i < objectList.size(); i++){
      System.out.println(objectList.get(i));
    }
    // create
    String csql = "create table if not exists test (key int, value string) row format delimited fields terminated by '\\t'";
    stmt.execute(csql);
    // load
    //String lsql = "load data local inpath '/home/liuwl/opt/datas/test.txt' into table test";
    // update is supported starting with Hive 0.14
    //String usql = "update test set key = 4 where value = 'uuuu'";
    //stmt.executeUpdate(usql);
    // drop
    String dsql = "drop table if exists test";
    if(!stmt.execute(dsql)){
      System.out.println("success");
    }
  }
}

Hive's two query execution modes (Fetch Task vs. MapReduce)

--> hive.fetch.task.conversion --- minimal # SELECT STAR, FILTER on partition columns, LIMIT only: only select *, partition-column filters, and limit skip MapReduce
hive.fetch.task.conversion --- more # SELECT, FILTER, LIMIT only (TABLESAMPLE, virtual columns): all simple select statements, table sampling, and virtual columns also skip MapReduce
--> Virtual columns (note the double underscores; a query sketch follows):
--> input__file__name # which input file the row came from
--> block__offset__inside__file # the record's byte offset within the file
--> row__offset__inside__block # the row's offset within the block (disabled by default; enable via hive.exec.rowoffset)
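A quick sketch of selecting the virtual columns (with hive.fetch.task.conversion=more this runs as a fetch task, with no MapReduce job):
select input__file__name, block__offset__inside__file, empno, ename from emp limit 3;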

Hive strict mode

--> hive.mapred.mode --- nonstrict (the default)
Note: in strict mode, risky SQL statements are rejected (a sketch follows this list):
Cartesian-product queries (a join without on or where)
queries on a partitioned table that do not filter on a partition
order by without a limit
comparing bigint with string/double
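A small sketch of strict mode in action against the emp sample table:
set hive.mapred.mode=strict;
select * from emp order by sal; # rejected: order by without limit
select * from emp order by sal limit 5; # accepted
set hive.mapred.mode=nonstrict;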