Flume practices and Sqoop Hive to Oracle

# Receive the files: start one Flume agent per configuration file

flume-ng agent --conf conf --conf-file conf1.conf --name a1

flume-ng agent --conf conf --conf-file conf2.conf --name hdfs-agent

flume-ng agent --conf conf --conf-file conf3.conf --name file-agent

conf1.conf (spooling-directory source -> file channel -> Avro sink):

a1.sources = tail

a1.channels = c1

a1.sinks = avro-forward-sink

a1.channels.c1.type = file

#a1.channels.c1.capacity = 1000

#a1.channels.c1.transactionCapacity = 100

a1.sources.tail.type = spooldir

a1.sources.tail.spoolDir = /path/to/folder/

a1.sinks.avro-forward-sink.type = avro

# hostname/IP where the conf2.conf Avro source is listening
a1.sinks.avro-forward-sink.hostname = hostname/ip

a1.sinks.avro-forward-sink.port = 12345

# Bind the source and sink to the channel

a1.sources.tail.channels = c1

a1.sinks.avro-forward-sink.channel = c1
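A quick way to exercise this agent, assuming the spool directory above (the sample file name below is hypothetical): drop a file into the directory and watch the agent log. Once the spooling-directory source has consumed the file, Flume renames it with a .COMPLETED suffix, which is what the rename loop further below strips off again.

cp /tmp/sample.log /path/to/folder/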

conf2.conf (Avro source -> file channel -> HDFS sink):

hdfs-agent.sources= avro-collect

hdfs-agent.sinks = hdfs-write

hdfs-agent.channels=ch1

hdfs-agent.channels.ch1.type = file

#hdfs-agent.channels.ch1.capacity = 1000

#hdfs-agent.channels.ch1.transactionCapacity = 100

hdfs-agent.sources.avro-collect.type = avro

hdfs-agent.sources.avro-collect.bind = 10.59.123.69

hdfs-agent.sources.avro-collect.port = 12345

hdfs-agent.sinks.hdfs-write.type = hdfs

hdfs-agent.sinks.hdfs-write.hdfs.path = hdfs://namenode/user/usera/test/

hdfs-agent.sinks.hdfs-write.hdfs.writeFormat=Text

# Bind the source and sink to the channel

hdfs-agent.sources.avro-collect.channels = ch1

hdfs-agent.sinks.hdfs-write.channel = ch1
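By default the HDFS sink writes SequenceFiles and rolls a new file every 30 seconds, which tends to produce many small files. The lines below are a sketch of common additions, not part of the original setup; the roll values are only illustrative.

hdfs-agent.sinks.hdfs-write.hdfs.fileType = DataStream
hdfs-agent.sinks.hdfs-write.hdfs.rollInterval = 0
hdfs-agent.sinks.hdfs-write.hdfs.rollSize = 134217728
hdfs-agent.sinks.hdfs-write.hdfs.rollCount = 0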

Start the conf2.conf agent first, then the conf1.conf agent: the Avro source must already be listening before the Avro sink can connect to it.
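In practice that means something like the following, a sketch assuming the two agents run on their respective hosts with the default logging setup:

# start the collector (Avro source) first
nohup flume-ng agent --conf conf --conf-file conf2.conf --name hdfs-agent > hdfs-agent.log 2>&1 &

# then start the forwarder (Avro sink)
nohup flume-ng agent --conf conf --conf-file conf1.conf --name a1 > a1.log 2>&1 &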

# When a memory channel was used instead, the agent failed with:

org.apache.flume.ChannelException: Unable to put batch on required channel: org.apache.flume.channel.MemoryChannel{name: ch1}

# After changing the channel type to file, the error went away.
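For reference, a minimal file-channel configuration for ch1 could look like the following; checkpointDir and dataDirs are placeholder paths (if they are omitted, Flume falls back to directories under ~/.flume):

hdfs-agent.channels.ch1.type = file
hdfs-agent.channels.ch1.checkpointDir = /var/flume/hdfs-agent/checkpoint
hdfs-agent.channels.ch1.dataDirs = /var/flume/hdfs-agent/data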

# Batch-rename the ingested files, stripping the .COMPLETED suffix that the spooling-directory source appends
# (run inside the spool directory; the source will pick the files up again afterwards)
for f in *.COMPLETED; do
    mv "$f" "${f%.COMPLETED}"
done

Load data from Hive into Oracle with Sqoop:

sqoop export -D oraoop.disabled=true \

--connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=hostname)(port=port))(connect_data=(service_name=sname)))" \

--username user_USER \

--password pwd \

--table EVAN_TEST \

--fields-terminated-by '\001' \

-m 1 \

--export-dir /path/to/folder/

# The Oracle table name must be given in upper case; otherwise Sqoop throws an exception saying it cannot find the table's column information.
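A quick way to double-check that Sqoop can see the column metadata is sqoop eval against Oracle's ALL_TAB_COLUMNS dictionary view; this is a sketch that reuses the placeholder connection details from the export command above:

sqoop eval \
--connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=hostname)(port=port))(connect_data=(service_name=sname)))" \
--username user_USER \
--password pwd \
--query "SELECT column_name, data_type FROM all_tab_columns WHERE table_name = 'EVAN_TEST'"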

