Flume Architecture and Usage Examples

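Flume architecture and core components

An agent is a single JVM process that hosts three components, and every event flows from source to channel to sink:

(1) Source: collection; decides where the data is collected from
(2) Channel: buffering; holds events until the sink has consumed them
(3) Sink: output; writes events to their destination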
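To make the three roles concrete, here is a toy, single-threaded Python model of one agent. It illustrates the flow under simplifying assumptions and is not Flume's Java API. The names Event, ToyMemoryChannel and run_pipeline are hypothetical. Its capacity and transaction_capacity parameters borrow the meaning of the memory channel's capacity and transactionCapacity settings, and Flume likewise refuses a transactionCapacity larger than capacity.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Event:
    """Stand-in for a Flume event: a byte-array body plus optional headers."""
    body: bytes
    headers: dict = field(default_factory=dict)


class ToyMemoryChannel:
    """Bounded in-memory buffer, loosely mirroring capacity/transactionCapacity."""

    def __init__(self, capacity=1000, transaction_capacity=100):
        if transaction_capacity > capacity:
            raise ValueError("transaction_capacity must not exceed capacity")
        self.capacity = capacity
        self.transaction_capacity = transaction_capacity
        self._queue = deque()

    def put_batch(self, events):
        # Accept the whole batch or nothing, like a committed put transaction.
        if len(events) > self.transaction_capacity:
            raise ValueError("batch exceeds transaction_capacity")
        if len(self._queue) + len(events) > self.capacity:
            raise OverflowError("channel is full")
        self._queue.extend(events)

    def take_batch(self):
        # Remove at most transaction_capacity events in one take.
        n = min(len(self._queue), self.transaction_capacity)
        return [self._queue.popleft() for _ in range(n)]


def run_pipeline(lines, capacity=1000, transaction_capacity=100):
    """Source -> channel -> sink, run synchronously in one thread."""
    channel = ToyMemoryChannel(capacity, transaction_capacity)
    delivered = []
    events = [Event(line.encode("utf-8")) for line in lines]  # the "source"
    for start in range(0, len(events), transaction_capacity):
        channel.put_batch(events[start:start + transaction_capacity])
        while batch := channel.take_batch():  # the "sink" drains the channel
            delivered.extend(batch)
    return delivered
```

In a real agent the source and sink work concurrently and the channel absorbs differences in speed. In this toy version they simply take turns.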

Official documentation

http://flume.apache.org/FlumeUserGuide.html

http://flume.apache.org/FlumeUserGuide.html#starting-an-agent

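How to use Flume

Using Flume mostly comes down to writing the agent configuration file:

(1) Configure the source

(2) Configure the channel

(3) Configure the sink

(4) Wire the three components together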

Examples

Example 1: collect data from a specified network port and print it to the console

Implementation:

# example.conf: A single-node Flume configuration

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
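Step (4), the wiring, is where typos tend to creep in. A source names its channels under the plural key channels, because it can feed several. A sink names exactly one channel under the singular key channel. The sketch below is a hypothetical checker (check_wiring is not a Flume tool) that parses a configuration like the one above and reports wiring problems. It assumes simple key = value lines and ignores the other separators that Java properties files allow.

```python
def parse_properties(text):
    """Minimal parser for key = value lines (skips comments and blank lines)."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_wiring(text, agent):
    """Return a list of wiring problems for one agent; empty means it looks OK."""
    props = parse_properties(text)

    def names(kind):
        # e.g. "a1.channels = c1" -> ["c1"]
        return props.get(f"{agent}.{kind}", "").split()

    declared_channels = set(names("channels"))
    problems = []
    for src in names("sources"):
        # A source lists its channels under the plural key "channels".
        bound = props.get(f"{agent}.sources.{src}.channels", "").split()
        if not bound:
            problems.append(f"source {src} is not bound to a channel")
        problems += [f"source {src} uses undeclared channel {ch}"
                     for ch in bound if ch not in declared_channels]
    for sink in names("sinks"):
        # A sink reads from exactly one channel, under the singular key "channel".
        ch = props.get(f"{agent}.sinks.{sink}.channel")
        if ch is None:
            problems.append(f"sink {sink} has no .channel property")
        elif ch not in declared_channels:
            problems.append(f"sink {sink} uses undeclared channel {ch}")
    return problems
```

Running check_wiring on the Example 1 configuration with agent "a1" should return an empty list.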

Starting the agent

http://flume.apache.org/FlumeUserGuide.html#starting-an-agent

$ bin/flume-ng agent -n $agent_name -c conf -f conf/flume-conf.properties.template -Dflume.root.logger=INFO,console

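-n / --name: the agent name. It must match the property prefix in the configuration file (a1 in Example 1).

-c / --conf: the configuration directory (flume-env.sh and the logging configuration). This is not the agent configuration file.

-f / --conf-file: the agent configuration file to load.

-Dflume.root.logger=INFO,console: log at INFO level to the console, so the logger sink's output appears in the terminal.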

Testing with telnet: connect to the netcat source and type a line such as hello

telnet localhost 44444
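If telnet is not installed, you can script the same test with Python's standard library. send_lines is a hypothetical helper. It assumes the netcat source's default behaviour of replying OK for every event it accepts (the ack-every-event setting). Because it ends each line with a bare \n, the resulting bodies will not carry the trailing 0d byte that appears in the telnet output.

```python
import socket


def send_lines(lines, host="localhost", port=44444, timeout=5.0):
    """Send each line as one event to a netcat source; return the replies."""
    replies = []
    with socket.create_connection((host, port), timeout=timeout) as sock, \
            sock.makefile("rb") as reader:
        for line in lines:
            # The netcat source splits events on "\n"; send no "\r".
            sock.sendall(line.encode("utf-8") + b"\n")
            # With the default ack-every-event, one "OK" line comes back per event.
            replies.append(reader.readline().strip().decode("utf-8"))
    return replies
```

While the Example 1 agent is running, send_lines(["hello", "flume"]) should return ["OK", "OK"], and two events should appear in the agent's console.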

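Output analysis

Event: {headers:{} body: 68 65 6c 6c 6f 0d hello}

The Event is Flume's basic unit of data transfer.

Event = optional headers + a byte-array body. Here the headers are empty. The body holds the bytes of "hello" (68 65 6c 6c 6f) plus 0d. That 0d is the carriage return telnet sends before the newline; the netcat source splits on the newline and keeps the 0d.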
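To see how the logger line maps onto an Event, the sketch below rebuilds one from the hex column. Event, event_from_logger_dump and payload_text are hypothetical names for illustration. In Flume itself an event is a map of string headers plus a byte[] body. The logger sink also prints at most the first 16 body bytes by default (maxBytesToLog), so this approach only recovers short bodies.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Illustrative model of a Flume event: optional headers + byte-array body."""
    body: bytes
    headers: dict = field(default_factory=dict)


def event_from_logger_dump(hex_column, headers=None):
    """Rebuild an Event from the hex column printed by the logger sink."""
    # bytes.fromhex ignores the spaces between byte pairs.
    return Event(body=bytes.fromhex(hex_column), headers=dict(headers or {}))


def payload_text(event):
    # telnet ends lines with "\r\n"; the netcat source splits on "\n",
    # so a trailing "\r" (0x0d) remains in the body. Strip it for display.
    return event.body.decode("utf-8").rstrip("\r")
```

For the output above, event_from_logger_dump("68 65 6c 6c 6f 0d") gives a six-byte body b"hello\r" with empty headers, and payload_text turns it back into "hello".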

Example 2: monitor a file and print newly appended data to the console in real time

Agent components

exec source + memory channel + logger sink

Exec Source documentation

http://flume.apache.org/FlumeUserGuide.html#exec-source

Implementation

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/test.log
a1.sources.r1.shell = /bin/sh -c

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
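To watch Example 2 work, something has to append to /var/log/test.log. The hypothetical helper below does that, flushing after each line so that tail -F sees it immediately. Unlike tail -f, tail -F follows the file by name and keeps retrying, so it survives log rotation. Writing under /var/log usually needs elevated permissions. If you change the path, change it in both the helper call and a1.sources.r1.command.

```python
import time


def append_lines(path, count, prefix="test", interval=0.0):
    """Append `count` lines to `path`, flushing each so `tail -F` sees it at once."""
    written = []
    with open(path, "a", encoding="utf-8") as f:
        for i in range(count):
            line = f"{prefix} {i} {time.strftime('%Y-%m-%d %H:%M:%S')}"
            f.write(line + "\n")
            f.flush()  # hand the data to the OS so the tail process can read it
            written.append(line)
            if interval:
                time.sleep(interval)  # optional pause between lines
    return written
```

For example, append_lines("/var/log/test.log", 10, interval=1.0) writes one line per second. Each line should show up as an Event in the agent's console.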

Example 3: collect logs on server A and ship them to server B in real time

Component selection

Server A: exec source + memory channel + avro sink
Server B: avro source + memory channel + logger sink

Implementation

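Server A (agent name, passed with -n: exec-memory-avro)

# Name the components on this agent
exec-memory-avro.sources = exec-source
exec-memory-avro.sinks = avro-sink
exec-memory-avro.channels = memory-channel

# Exec source: follow data.log and turn each new line into an event
exec-memory-avro.sources.exec-source.type = exec
exec-memory-avro.sources.exec-source.command = tail -F /home/hadoop/data/data.log
exec-memory-avro.sources.exec-source.shell = /bin/sh -c

# Avro sink: send events over Avro RPC to the avro source on server B
exec-memory-avro.sinks.avro-sink.type = avro
exec-memory-avro.sinks.avro-sink.hostname = hadoop000
exec-memory-avro.sinks.avro-sink.port = 44444

# Memory channel with default settings (capacity and transactionCapacity both 100)
exec-memory-avro.channels.memory-channel.type = memory

# Bind the source and sink to the channel
exec-memory-avro.sources.exec-source.channels = memory-channel
exec-memory-avro.sinks.avro-sink.channel = memory-channel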

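Server B (agent name, passed with -n: avro-memory-logger)

# Name the components on this agent
avro-memory-logger.sources = avro-source
avro-memory-logger.sinks = logger-sink
avro-memory-logger.channels = memory-channel

# Avro source: listen on hadoop000:44444 for events sent by server A
avro-memory-logger.sources.avro-source.type = avro
avro-memory-logger.sources.avro-source.bind = hadoop000
avro-memory-logger.sources.avro-source.port = 44444

# Logger sink: log each event at INFO level (visible with -Dflume.root.logger=INFO,console)
avro-memory-logger.sinks.logger-sink.type = logger

# Memory channel with default settings
avro-memory-logger.channels.memory-channel.type = memory

# Bind the source and sink to the channel
avro-memory-logger.sources.avro-source.channels = memory-channel
avro-memory-logger.sinks.logger-sink.channel = memory-channel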
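Server A's avro sink connects to server B's avro source, so it is usually simplest to start the B-side agent (avro-memory-logger) first and the A-side agent (exec-memory-avro) second. The hypothetical wait_for_port helper below runs on server A. It polls until hadoop000:44444 (the host and port from both configurations) accepts TCP connections. It only checks that something is listening, not that it speaks Avro.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connect to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means a listener is up; close it right away.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)  # not listening yet; retry shortly
```

wait_for_port("hadoop000", 44444) should return True once the B-side agent is listening. After that you can start the A-side agent with -n exec-memory-avro.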

Original article: http://blog.51cto.com/wangyichao/2151587

Time: 2024-11-04 00:31:29
