Flume_Usage

Case 1: source: hive.log, channel: memory, sink: logger output

Copy flume-conf.properties.template and rename the copy to hive-mem-log.properties
hive-mem-log.properties
  a1.sources = s1
  a1.channels = c1
  a1.sinks = k1
  # define the source
  a1.sources.s1.type = exec
  a1.sources.s1.command = tail -F /opt/cdh-5.6.3/hive-0.13.1-cdh5.3.6/logs/hive.log
  a1.sources.s1.shell = /bin/sh -c
  # define the channel
  a1.channels.c1.type = memory
  # define the sink
  a1.sinks.k1.type = logger
  # bind the source and the sink to the channel
  a1.sources.s1.channels = c1
  a1.sinks.k1.channel = c1
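Run from the Flume directory
  bin/flume-ng agent -c conf/ -n a1 -f conf/hive-mem-log.properties -Dflume.root.logger=INFO,console
  Run a few commands in Hive and check the output on the console
Note that Flume starts and stops its components in different orders: on startup the channel comes up first, then the sink, then the source; on shutdown the source stops first, then the sink, then the channel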
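
The name passed with -n has to match the prefix used in the properties file (a1 here), and every source and sink has to be bound to a declared channel. A mismatch is easy to miss because the agent typically still starts, just without any components. The following is a minimal sketch of a pre-flight check, not part of Flume itself: parse_properties, check_agent and the embedded CONFIG string are hypothetical helpers written for illustration only.

```python
def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_agent(props, agent):
    """Return a list of problems found for the given agent name."""
    sources = props.get(f"{agent}.sources", "").split()
    channels = props.get(f"{agent}.channels", "").split()
    sinks = props.get(f"{agent}.sinks", "").split()
    if not (sources and channels and sinks):
        return [f"no sources/channels/sinks declared for agent '{agent}'"]
    problems = []
    for source in sources:
        # A source may feed several channels, hence the plural key.
        bound = props.get(f"{agent}.sources.{source}.channels", "").split()
        if not bound or any(name not in channels for name in bound):
            problems.append(f"source '{source}' is not bound to a declared channel")
    for sink in sinks:
        # A sink reads from exactly one channel, hence the singular key.
        bound = props.get(f"{agent}.sinks.{sink}.channel", "")
        if bound not in channels:
            problems.append(f"sink '{sink}' is not bound to a declared channel")
    return problems


# Same content as hive-mem-log.properties, embedded so the sketch is self-contained.
CONFIG = """
a1.sources = s1
a1.channels = c1
a1.sinks = k1
a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /opt/cdh-5.6.3/hive-0.13.1-cdh5.3.6/logs/hive.log
a1.sources.s1.shell = /bin/sh -c
a1.channels.c1.type = memory
a1.sinks.k1.type = logger
a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1
"""

props = parse_properties(CONFIG)
print(check_agent(props, "a1"))  # agent name matches the file
print(check_agent(props, "al"))  # letter l instead of digit 1
```

Running the check with the name you intend to pass to -n catches a mistyped agent name (for example al instead of a1) before the agent is launched.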

Case 2: source: hive.log, channel: file, sink: logger output

Copy flume-conf.properties.template and rename the copy to hive-file-log.properties
hive-file-log.properties
  a1.sources = s1
  a1.channels = c1
  a1.sinks = k1
  # define the source
  a1.sources.s1.type = exec
  a1.sources.s1.command = tail -F /opt/cdh-5.6.3/hive-0.13.1-cdh5.3.6/logs/hive.log
  a1.sources.s1.shell = /bin/sh -c
  # define the channel
  a1.channels.c1.type = file
  a1.channels.c1.checkpointDir = /opt/cdh-5.6.3/apache-flume-1.5.0-cdh5.3.6-bin/datas/checkp
  a1.channels.c1.dataDirs = /opt/cdh-5.6.3/apache-flume-1.5.0-cdh5.3.6-bin/datas/data
  # define the sink
  a1.sinks.k1.type = logger
  # bind the source and the sink to the channel
  a1.sources.s1.channels = c1
  a1.sinks.k1.channel = c1
Run from the Flume directory
  bin/flume-ng agent -c conf/ -n a1 -f conf/hive-file-log.properties -Dflume.root.logger=INFO,console
  Check the data files under the configured checkpointDir and dataDirs directories

Case 3: source: hive.log, channel: memory, sink: hdfs

Copy flume-conf.properties.template and rename the copy to hive-mem-hdfs.properties
hive-mem-hdfs.properties
  a1.sources = s1
  a1.channels = c1
  a1.sinks = k1
  # define the source
  a1.sources.s1.type = exec
  a1.sources.s1.command = tail -F /opt/cdh-5.6.3/hive-0.13.1-cdh5.3.6/logs/hive.log
  a1.sources.s1.shell = /bin/sh -c
  # define the channel
  a1.channels.c1.type = memory
  a1.channels.c1.capacity = 1000
  a1.channels.c1.transactionCapacity = 1000
  # define the sink
  a1.sinks.k1.type = hdfs
  a1.sinks.k1.hdfs.path = /flume/hdfs/
  a1.sinks.k1.hdfs.fileType = DataStream
  # bind the source and the sink to the channel
  a1.sources.s1.channels = c1
  a1.sinks.k1.channel = c1
Run from the Flume directory
  bin/flume-ng agent -c conf/ -n a1 -f conf/hive-mem-hdfs.properties -Dflume.root.logger=INFO,console
  Check the data files on HDFS
  Note that if the configured directory does not exist, it is created automatically
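
To make the capacity and transactionCapacity settings of the memory channel more concrete, here is a toy model of a bounded in-memory channel. It is a simplified sketch and not Flume's actual implementation: the class ToyMemoryChannel and its methods are hypothetical names, and the only behaviour modelled is that capacity limits how many events the channel holds, transactionCapacity limits how many events one put or take transaction may carry, and transactionCapacity may not exceed capacity.

```python
from collections import deque


class ToyMemoryChannel:
    """Toy model of a bounded in-memory channel (illustration only, not Flume code)."""

    def __init__(self, capacity, transaction_capacity):
        if transaction_capacity > capacity:
            raise ValueError("transactionCapacity must not exceed capacity")
        self.capacity = capacity
        self.transaction_capacity = transaction_capacity
        self.queue = deque()

    def put_batch(self, events):
        """Commit a batch from the source in one transaction, all or nothing."""
        if len(events) > self.transaction_capacity:
            raise ValueError("batch larger than transactionCapacity")
        if len(self.queue) + len(events) > self.capacity:
            raise OverflowError("channel full, the source has to retry later")
        self.queue.extend(events)

    def take_batch(self, max_events):
        """Hand up to max_events to the sink in one transaction."""
        count = min(max_events, self.transaction_capacity, len(self.queue))
        return [self.queue.popleft() for _ in range(count)]


# Same numbers as in hive-mem-hdfs.properties.
channel = ToyMemoryChannel(capacity=1000, transaction_capacity=1000)
channel.put_batch([f"line {i}" for i in range(600)])

try:
    # 600 + 500 would exceed the capacity of 1000, so nothing is added.
    channel.put_batch([f"line {i}" for i in range(500)])
except OverflowError as exc:
    print("rejected:", exc)

drained = channel.take_batch(100)
print(len(drained), "events taken,", len(channel.queue), "left in the channel")
```

In this model a slow sink (here the HDFS sink) lets the queue fill up until the source's batches are rejected, which is the situation a larger capacity is meant to absorb.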
Time: 2024-10-13 16:19:47
