Importing CSV Files with Phoenix on CDH 5.4.5

1. Install Phoenix

Configure the Phoenix parcel in the Cloudera Manager UI:

http://52.11.56.155:7180/cmf/settings?groupKey=config.scm.parcel.display_group&groupParent=

Add a Remote Parcel Repository URL: http://archive.cloudera.com/cloudera-labs/phoenix/parcels/1.0/

Cloudera Manager will automatically detect the new parcel. Click Download, then Distribute, then Activate, and restart the cluster.

2. Log in to one of the servers and check where Phoenix is installed

[root@ip-172-31-25-243 ~]# cd /opt/cloudera/parcels/CLABS_PHOENIX
[root@ip-172-31-25-243 phoenix]# ls
bin  dev  examples  lib  phoenix-4.3.0-clabs-phoenix-1.0.0-client.jar  phoenix-4.3.0-clabs-phoenix-1.0.0-server.jar  phoenix-4.3.0-clabs-phoenix-1.0.0-server-without-antlr.jar

The bin directory contains the executable scripts, and the examples directory contains some sample files.

3. Import a CSV file into a table

The CSV file is /root/ceb/cis_cust_imp_info.csv, with the following content:

20131131,100010001001,BR01,2000.01
20131131,100010001002,BR01,2000.02
20131131,100010001003,BR02,2000.03

Define the table structure in a file /root/ceb/cis_cust_imp_info.sql with the following content:

CREATE TABLE IF NOT EXISTS cis_cust_imp_info(
statistics_dt varchar(50),
cust_id varchar(50),
open_org_id varchar(50),
assert9_bal decimal(18,2),
CONSTRAINT pk PRIMARY KEY (statistics_dt, cust_id)
); 

Note that the trailing semicolon is required.

Run the command to import the CSV. psql.py takes the ZooKeeper quorum as its first argument, followed by the DDL file and the CSV file; the CSV is matched to its target table by file name:

[root@ip-172-31-25-243 phoenix]# bin/psql.py 172.31.25.244 /root/ceb/cis_cust_imp_info.sql /root/ceb/cis_cust_imp_info.csv
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
15/09/04 10:26:59 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/09/04 10:27:00 WARN impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-phoenix.properties,hadoop-metrics2.properties
no rows upserted
Time: 0.259 sec(s)

csv columns from database.
CSV Upsert complete. 3 rows upserted
Time: 0.067 sec(s)

Verify the result in the HBase shell:

hbase(main):001:0> list
TABLE
CIS_CUST_IMP_INFO
SYSTEM.CATALOG
SYSTEM.SEQUENCE
SYSTEM.STATS
4 row(s) in 0.2650 seconds

=> ["CIS_CUST_IMP_INFO", "SYSTEM.CATALOG", "SYSTEM.SEQUENCE", "SYSTEM.STATS"]
hbase(main):002:0> scan 'CIS_CUST_IMP_INFO'
ROW                                                COLUMN+CELL
 20131131\x00100010001001                          column=0:ASSERT9_BAL, timestamp=1441362422661, value=\xC2\x15\x01\x02
 20131131\x00100010001001                          column=0:OPEN_ORG_ID, timestamp=1441362422661, value=BR01
 20131131\x00100010001001                          column=0:_0, timestamp=1441362422661, value=
 20131131\x00100010001002                          column=0:ASSERT9_BAL, timestamp=1441362422661, value=\xC2\x15\x01\x03
 20131131\x00100010001002                          column=0:OPEN_ORG_ID, timestamp=1441362422661, value=BR01
 20131131\x00100010001002                          column=0:_0, timestamp=1441362422661, value=
 20131131\x00100010001003                          column=0:ASSERT9_BAL, timestamp=1441362422661, value=\xC2\x15\x01\x04
 20131131\x00100010001003                          column=0:OPEN_ORG_ID, timestamp=1441362422661, value=BR02
 20131131\x00100010001003                          column=0:_0, timestamp=1441362422661, value=
3 row(s) in 0.1840 seconds
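
The ASSERT9_BAL values show up as raw bytes because Phoenix stores DECIMAL columns in its own sortable binary encoding, and the 0:_0 cell is Phoenix's internal empty key value column. To see the decoded rows, query through Phoenix rather than the HBase shell; a minimal sketch, assuming the same ZooKeeper host as above:

bin/sqlline.py 172.31.25.244

Then, at the sqlline prompt:

SELECT * FROM cis_cust_imp_info;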

4. Bulk-import large CSV files via MapReduce

[root@ip-172-31-25-243 phoenix]# hadoop jar phoenix-4.3.0-clabs-phoenix-1.0.0-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool --table cis_cust_imp_info --input /root/ceb/cis_cust_imp_info.csv --zookeeper 172.31.25.244

This fails with the following error:

java.util.concurrent.ExecutionException: java.lang.IllegalAccessError: class com.google.protobuf.HBaseZeroCopyByteString cannot access its superclass com.google.protobuf.LiteralByteString
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:188)
    at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1795)
    at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1751)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:1006)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl.getTable(ConnectionQueryServicesImpl.java:1257)
    at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:348)
    at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:309)
    at org.apache.phoenix.schema.MetaDataClient.getCurrentTime(MetaDataClient.java:293)
    at org.apache.phoenix.compile.StatementContext.getCurrentTime(StatementContext.java:253)
    at org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:184)
    at org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:154)
    at org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:235)
    at org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:226)
    at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
    at org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:225)
    at org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:1039)
    at org.apache.phoenix.jdbc.PhoenixDatabaseMetaData.getColumns(PhoenixDatabaseMetaData.java:492)
    at org.apache.phoenix.util.CSVCommonsLoader.generateColumnInfo(CSVCommonsLoader.java:296)
    at org.apache.phoenix.mapreduce.CsvBulkLoadTool.buildImportColumns(CsvBulkLoadTool.java:291)
    at org.apache.phoenix.mapreduce.CsvBulkLoadTool.loadData(CsvBulkLoadTool.java:200)
    at org.apache.phoenix.mapreduce.CsvBulkLoadTool.run(CsvBulkLoadTool.java:186)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.phoenix.mapreduce.CsvBulkLoadTool.main(CsvBulkLoadTool.java:97)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.IllegalAccessError: class com.google.protobuf.HBaseZeroCopyByteString cannot access its superclass com.google.protobuf.LiteralByteString

A web search shows this is a known HBase issue: the hbase-protocol jar (which contains HBaseZeroCopyByteString) must be visible on the MapReduce job's classpath. The workaround applied here is to symlink the jar into the Hadoop lib directory:

[root@ip-172-31-25-243 phoenix]# cd /opt/cloudera/parcels/CDH/lib/hadoop
ln -s /opt/cloudera/parcels/CDH-5.4.5-1.cdh5.4.5.p0.7/lib/hbase/lib/hbase-protocol-1.0.0-cdh5.4.5.jar hbase-protocol-1.0.0-cdh5.4.5.jar
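
An alternative that avoids symlinking into the Hadoop lib directory is to prepend the hbase-protocol jar to HADOOP_CLASSPATH for the job; a sketch, assuming the same CDH parcel paths as above:

export HADOOP_CLASSPATH=/opt/cloudera/parcels/CDH/lib/hbase/lib/hbase-protocol-1.0.0-cdh5.4.5.jar:$(hbase classpath)
hadoop jar phoenix-4.3.0-clabs-phoenix-1.0.0-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool --table cis_cust_imp_info --input /root/ceb/cis_cust_imp_info.csv --zookeeper 172.31.25.244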

Re-running the import command produces a different error:

15/09/04 11:04:43 WARN security.UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE) cause:org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x
    at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257)
    at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238)
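
The trace shows the root user being denied WRITE access under /user in HDFS. One option (a sketch, not the route taken below) is to create an HDFS home directory for root as the hdfs superuser:

sudo -u hdfs hdfs dfs -mkdir -p /user/root
sudo -u hdfs hdfs dfs -chown root:root /user/root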

Since this is just a permission problem on the /user directory, the job was rerun as the hdfs user instead, which then failed with yet another error:

sudo -u hdfs hadoop jar phoenix-4.3.0-clabs-phoenix-1.0.0-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool --table cis_cust_imp_info --input /root/ceb/cis_cust_imp_info.csv --zookeeper 172.31.25.244

15/09/04 11:06:05 ERROR mapreduce.CsvBulkLoadTool: Import job on table=CIS_CUST_IMP_INFO failed due to exception:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://ip-172-31-25-243.us-west-2.compute.internal:8020/root/ceb/cis_cust_imp_info.csv
15/09/04 11:06:05 INFO client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x14f97b7df1400a4

It turns out that in MapReduce mode the input file must be on HDFS rather than on the local filesystem.
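
A sketch of staging the file in HDFS and re-running the job (assuming HDFS /tmp is world-writable, as it is on a default install):

hdfs dfs -put /root/ceb/cis_cust_imp_info.csv /tmp/
sudo -u hdfs hadoop jar phoenix-4.3.0-clabs-phoenix-1.0.0-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool --table cis_cust_imp_info --input /tmp/cis_cust_imp_info.csv --zookeeper 172.31.25.244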
