Introduction
Spark SQL provides JDBC connectivity, which is useful for connecting business intelligence (BI) tools to a Spark cluster and for sharing a cluster across multiple users. The JDBC server runs as a standalone Spark driver program that can be shared by multiple clients. Any client can cache tables in memory, query them, and so on, and the cluster resources and cached data are shared among all of them.
Spark SQL's JDBC server corresponds to HiveServer2 in Hive. It is also known as the "Thrift server" because it uses the Thrift communication protocol. Note that the JDBC server requires Spark to be built with Hive support.
Environment
Cluster environment: CDH5.3.0
The specific JAR versions are as follows:
Spark version: 1.2.0-cdh5.3.0
Hive version: 0.13.1-cdh5.3.0
Hadoop version: 2.5.0-cdh5.3.0
Starting the JDBC server
cd /etc/spark/conf
ln -s /etc/hive/conf/hive-site.xml hive-site.xml
cd /opt/cloudera/parcels/CDH/lib/spark/
chmod -R 777 logs/
cd /opt/cloudera/parcels/CDH/lib/spark/sbin
./start-thriftserver.sh --master yarn
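Before connecting a client, it can help to confirm that the Thrift server is actually accepting connections. The following is a minimal sketch, not part of the original setup: the helper name `port_open` and the `THRIFT_HOST`/`THRIFT_PORT` variables are hypothetical, and the host `hadoop04` and port `10000` are simply taken from the Beeline URL used in this post. It assumes `bash` and the coreutils `timeout` command are available.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 if a TCP connection to host:port
# succeeds within 3 seconds, non-zero otherwise.
port_open() {
  local host="$1" port="$2"
  # bash's /dev/tcp pseudo-device opens a TCP socket; timeout bounds the wait.
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Assumed defaults, taken from the JDBC URL in this post; override as needed.
THRIFT_HOST="${THRIFT_HOST:-hadoop04}"
THRIFT_PORT="${THRIFT_PORT:-10000}"

if port_open "$THRIFT_HOST" "$THRIFT_PORT"; then
  echo "Thrift server is accepting connections on ${THRIFT_HOST}:${THRIFT_PORT}"
else
  echo "Thrift server is not reachable on ${THRIFT_HOST}:${THRIFT_PORT}"
fi
```

On YARN the server may take some time to come up after `start-thriftserver.sh` returns, so a failed check right after startup does not necessarily mean the launch failed; the files under `logs/` are the place to look in that case.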
Connecting to the JDBC server with Beeline
cd /opt/cloudera/parcels/CDH/lib/spark/bin
beeline -u jdbc:hive2://hadoop04:10000
scan complete in 2ms
Connecting to jdbc:hive2://hadoop04:10000
Connected to: Spark SQL (version 1.2.0)
Driver: Hive JDBC (version 0.13.1-cdh5.3.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 0.13.1-cdh5.3.0 by Apache Hive
0: jdbc:hive2://hadoop04:10000>
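Beeline can also run a single statement and exit, which is convenient for scripts. The sketch below wraps that in two small functions; the names `jdbc_url` and `run_query` are hypothetical, the default port of 10000 is taken from the URL used above, and `beeline` is assumed to be on the `PATH`. It relies only on Beeline's `-u` (JDBC URL) and `-e` (query string) options.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the HiveServer2-style JDBC URL used by Beeline.
# The port falls back to 10000 (the value used in this post) when omitted.
jdbc_url() {
  local host="$1" port="${2:-10000}"
  printf 'jdbc:hive2://%s:%s' "$host" "$port"
}

# Hypothetical helper: run one SQL statement and exit instead of
# opening the interactive prompt.
run_query() {
  local host="$1" port="$2" sql="$3"
  beeline -u "$(jdbc_url "$host" "$port")" -e "$sql"
}

# Example (requires a running Thrift server):
#   run_query hadoop04 10000 'SHOW TABLES;'
```

Because every client talks to the same driver program, a table cached from one such session is available to queries issued from the others.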
Time: 2024-10-07 21:00:02