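flume-ng is a distributed, highly available log collection system. It is mainly used to aggregate business logs scattered across different servers into one centralized data store.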
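1. Installation and environment setup
Download page: http://flume.apache.org/download.html . Download the Apache Flume binary to the target server and extract it.
Java runtime version: Java 1.6 or later (Java 1.7 recommended)
Set the JAVA_HOME variable
Add the bin directory under the extracted path to the PATH environment variable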
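The two environment steps above can be sketched as a few lines for a shell profile. The JAVA_HOME path is a hypothetical location, and FLUME_HOME is assumed to be the extraction directory that appears later in this article; substitute the real paths on your server.

```shell
# Hypothetical JDK location; point this at the JDK actually installed.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk

# Assumed extraction directory of the Flume binary (taken from the paths used in this article).
export FLUME_HOME=/home/dongxiao.yang/apache-flume-1.4.0-bin

# Make the flume-ng launcher script resolvable from any directory.
export PATH="$PATH:$FLUME_HOME/bin"
```

After reloading the profile, running flume-ng version is a quick way to check that the launcher is found.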
2. Command-line parameters
Usage: /home/dongxiao.yang/apache-flume-1.4.0-bin/bin/flume-ng <command> [options]...

commands:
  help                    display this help text
  agent                   run a Flume agent
  avro-client             run an avro Flume client
  version                 show Flume version info

global options:
  --conf,-c <conf>        use configs in <conf> directory
  --classpath,-C <cp>     append to the classpath
  --dryrun,-d             do not actually start Flume, just print the command
  --plugins-path <dirs>   colon-separated list of plugins.d directories. See the
                          plugins.d section in the user guide for more details.
                          Default: $FLUME_HOME/plugins.d
  -Dproperty=value        sets a Java system property value
  -Xproperty=value        sets a Java -X option

agent options:
  --conf-file,-f <file>   specify a config file (required)
  --name,-n <name>        the name of this agent (required)
  --help,-h               display help text

avro-client options:
  --rpcProps,-P <file>    RPC client properties file with server connection params
  --host,-H <host>        hostname to which events will be sent
  --port,-p <port>        port of the avro source
  --dirname <dir>         directory to stream to avro source
  --filename,-F <file>    text file to stream to avro source (default: std input)
  --headerFile,-R <file>  File containing event headers as key/value pairs on each new line
  --help,-h               display help text

  Either --rpcProps or both --host and --port must be specified.

Note that if <conf> directory is specified, then it is always included first
in the classpath.
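As an illustration of the avro-client options listed above, the following sketch wraps one invocation in a small shell function. The host and port reuse the values from the configuration example in this article; the input file path and the function name send_file are hypothetical.

```shell
# Target avro source; these match the sink1 hostname and port used in this article.
AVRO_HOST=10.4.1.100
AVRO_PORT=10000

# Hypothetical text file whose lines are sent as events.
INPUT_FILE=/tmp/sample.log

# Send the file to the avro source; extra arguments (for example --dryrun) are passed through.
send_file() {
    flume-ng avro-client --host "$AVRO_HOST" --port "$AVRO_PORT" \
        --filename "$INPUT_FILE" "$@"
}
```

Calling send_file --dryrun should only print the Java command that would be run, which is a low-risk way to check the arguments first.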
A simple configuration file example
# Define the components of agent1
agent1.sources = source1
agent1.channels = channel1
agent1.sinks = sink1 sink2

# Describe the source
agent1.sources.source1.type = exec
agent1.sources.source1.command = tail -F /srv/apps/taskworker/log/taskworker.log
agent1.sources.source1.interceptors = e1
agent1.sources.source1.interceptors.e1.type = timestamp

# Describe the sink
# sink2 is declared but not configured in this example; it also needs at least a type (agent1.sinks.sink2.type)
agent1.sinks.sink1.type = avro
agent1.sinks.sink1.hostname = 10.4.1.100
agent1.sinks.sink1.port = 10000

# Describe the channel
agent1.channels.channel1.type = file
agent1.channels.channel1.checkpointDir = /home/dongxiao.yang/checkpoint
agent1.channels.channel1.dataDirs = /home/dongxiao.yang/data

# Bind the source and sinks to the channel
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
agent1.sinks.sink2.channel = channel1
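The avro sink above forwards events to 10.4.1.100:10000, so an agent with an avro source has to be listening there. The original does not show that side; the following is only a minimal sketch of what a receiving agent could look like, written to a file through a shell heredoc. The agent name collector, the file name collector.conf, the component names, the memory channel, and the logger sink are all assumptions chosen for illustration.

```shell
# Write a hypothetical receiving-side config into the current directory.
cat > collector.conf <<'EOF'
# Hypothetical collector agent: receives avro events and logs them
collector.sources = avroIn
collector.channels = mem1
collector.sinks = log1

# Avro source listening on the port that agent1's sink1 sends to
collector.sources.avroIn.type = avro
collector.sources.avroIn.bind = 0.0.0.0
collector.sources.avroIn.port = 10000
collector.sources.avroIn.channels = mem1

# In-memory channel (assumption; events are lost if the process dies)
collector.channels.mem1.type = memory
collector.channels.mem1.capacity = 1000

# Logger sink prints events to the Flume log, useful for a first end-to-end check
collector.sinks.log1.type = logger
collector.sinks.log1.channel = mem1
EOF
```

Such an agent would be started on 10.4.1.100 with --conf-file pointing at collector.conf and --name collector. In a real deployment the logger sink would presumably be replaced by a sink that writes to the central store.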
Startup command format:
flume-ng agent --conf /home/dongxiao.yang/apache-flume-1.4.0-bin/conf/ --conf-file /home/dongxiao.yang/apache-flume-1.4.0-bin/conf/ --name agent1 -Dflume.root.logger=INFO,console -Duser.timezone=UTC
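The startup command can be wrapped in a small function that first checks that the file given to --conf-file exists. This is a sketch under assumptions: the --conf-file value in the command above stops at the conf/ directory, so the file name agent1.conf used here is hypothetical, as is the function name start_agent.

```shell
# Config directory taken from the command in this article; override via environment if needed.
FLUME_CONF_DIR="${FLUME_CONF_DIR:-/home/dongxiao.yang/apache-flume-1.4.0-bin/conf}"
# Hypothetical config file name; the article does not name the file.
AGENT_CONF="${AGENT_CONF:-$FLUME_CONF_DIR/agent1.conf}"
# Must match the agent name used as the property prefix in the config file.
AGENT_NAME="${AGENT_NAME:-agent1}"

start_agent() {
    # Refuse to start when the config file is missing.
    if [ ! -f "$AGENT_CONF" ]; then
        echo "config file not found: $AGENT_CONF" >&2
        return 1
    fi
    # Extra arguments (for example --dryrun) are passed through to flume-ng.
    flume-ng agent --conf "$FLUME_CONF_DIR" --conf-file "$AGENT_CONF" \
        --name "$AGENT_NAME" \
        -Dflume.root.logger=INFO,console -Duser.timezone=UTC "$@"
}
```

Running start_agent --dryrun prints the resulting command without starting Flume, which helps confirm the paths before a real start.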
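References:
http://flume.apache.org/FlumeUserGuide.html (official documentation)
Apache Flume Distributed Log Collection for Hadoop.pdf: based on version 1.3; mainly covers several architectures for collecting common log files and writing them to HDFS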