Flume NG 学习笔记（十） Transaction、Sink、Source和Channel开发 / 憋错料

一、Transaction interface

Transaction接口是基于flume的稳定性考虑的。所有主要的组件（sources、sinks、channels)都必须使用Flume Transaction。我们也可以理解Transaction接口就是flume的事务，sources和sinks的发送数据与接受数据都是在一个Transaction里完成的。

从上图中可以看出，一个Transaction在Channel实现内实现。每一个连接到channel的source和sink都要获取一个Transaction对象。这Sources实际上使用了一个ChannelSelector接口来封装Transaction。存放事件到channel和从channel中提取事件的操作是在一个活跃的Transaction内执行的。

下面是官网例子

[java] view plain copy

Channel ch = new MemoryChannel();
Transaction txn = ch.getTransaction();
txn.begin();
try {
// This try clause includes whatever Channel operations you want to do
Event eventToStage = EventBuilder.withBody("Hello Flume!",
Charset.forName("UTF-8"));
ch.put(eventToStage);
// Event takenEvent = ch.take();
// ...
txn.commit();
} catch (Throwable t) {
txn.rollback();
// Log exception, handle individual exceptions as needed
// re-throw all Errors
if (t instanceof Error) {
throw (Error)t;
}
} finally {
txn.close();
}

上面的代码是一个很简单的Transaction示例，在自定义Source与自定义Sink中都要使用。

二、自定义Sink开发

Sink提取event数据从channel中，然后直接将数据发送到下一个flume agent中或者存储到外部库中。

Sink和channel的关联关系可以在配置文件中配置。有一个SinkRunner实例与每一个已配置的Sink关联，当Flume框架调用SinkRunner.start()方法时候，将创建一个新的线程来驱动这Sink。

这个线程将管理这个Sink的生命周期。Sink需要实现LifecycleAware接口的start()和stop()方法。start()方法用于初始化数据；stop()用于释放资源；process()是从channel中提取event数据和转发数据的核心方法。

这Sink需要实现Configurable接口以便操作配置文件。

下面是官网例子：

[java] view plain copy

public class MySink extends AbstractSink implements Configurable {
private String myProp;
@Override
public void configure(Context context) {
String myProp = context.getString("myProp", "defaultValue");
// Process the myProp value (e.g. validation)
// Store myProp for later retrieval by process() method
this.myProp = myProp;
}
@Override
public void start() {
// Initialize the connection to the external repository (e.g. HDFS) that
// this Sink will forward Events to ..
}
@Override
public void stop () {
// Disconnect from the external respository and do any
// additional cleanup (e.g. releasing resources or nulling-out
// field values) ..
}
@Override
public Status process() throws EventDeliveryException {
Status status = null;
// Start transaction
Channel ch = getChannel();
Transaction txn = ch.getTransaction();
txn.begin();
try {
// This try clause includes whatever Channel operations you want to do
Event event = ch.take();
// Send the Event to the external repository.
// storeSomeData(e);
txn.commit();
status = Status.READY;
} catch (Throwable t) {
txn.rollback();
// Log exception, handle individual exceptions as needed
status = Status.BACKOFF;
// re-throw all Errors
if (t instanceof Error) {
throw (Error)t;
}
} finally {
txn.close();
}
return status;
}
}

下面是测试例子：

[java] view plain copy

import org.apache.flume.Channel;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.Transaction;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;
public class Custom_Sink extends AbstractSink implements Configurable {
private String myProp;
@Override
public void configure(Context context) {
String myProp = context.getString("myProp", "defaultValue");
// Process the myProp value (e.g. validation)
// Store myProp for later retrieval by process() method
this.myProp = myProp;
}
@Override
public void start() {
// Initialize the connection to the external repository (e.g. HDFS) that
// this Sink will forward Events to ..
}
@Override
public void stop () {
// Disconnect from the external respository and do any
// additional cleanup (e.g. releasing resources or nulling-out
// field values) ..
}
@Override
public Status process() throws EventDeliveryException {
Status status = null;
// Start transaction
Channel ch = getChannel();
Transaction txn = ch.getTransaction();
txn.begin();
try {
// This try clause includes whatever Channel operations you want to do
Event event = ch.take();
String out = new String(event.getBody());
// Send the Event to the external repository.
// storeSomeData(e);
System.out.println(out);
txn.commit();
status = Status.READY;
} catch (Throwable t) {
txn.rollback();
// Log exception, handle individual exceptions as needed
status = Status.BACKOFF;
// re-throw all Errors
if (t instanceof Error) {
throw (Error)t;
}
} finally {
txn.close();
}
return status;
}
}

上面的测试例子只输出事件的BODY信息，这里说明下直接用代码event.getBody().tostring() 输出是乱码。因为所有sink都是在Transaction里完成的，因此自定义开发sink是需要加上Transaction相关设置。

然后是测试配置，这里是自定义的jar 包是flumedev.Custom_Sink。注意，打包之后请放在目录$FLUME_HOME/lib下

[html] view plain copy

#配置文件：custom_sink_case23.conf
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 50000
a1.sources.r1.bind = 192.168.233.128
a1.sources.r1.channels = c1
# Describe the sink
a1.sinks.k1.channel = c1
a1.sinks.k1.type = flumedev.Custom_Sink
#a1.sinks.k1.type =logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

#敲命令

flume-ng agent -cconf -f conf/custom_sink_case23.conf -n a1 -Dflume.root.logger=INFO,console

启动成功后

打开另一个终端输入，往侦听端口送数据

echo "testcustom_sink" | nc 192.168.233.128 50000

#在启动的终端查看console输出

可以看到数据正常输出。

三、自定义Source开发

Source从外面接收数据并把数据存入Channel中。很少有人用。

下面是官网的例子

[java] view plain copy

public class MySource extends AbstractSource implements Configurable, PollableSource {
private String myProp;
@Override
public void configure(Context context) {
String myProp = context.getString("myProp", "defaultValue");
// Process the myProp value (e.g. validation, convert to another type, ...)
// Store myProp for later retrieval by process() method
this.myProp = myProp;
}
@Override
public void start() {
// Initialize the connection to the external client
}
@Override
public void stop () {
// Disconnect from external client and do any additional cleanup
// (e.g. releasing resources or nulling-out field values) ..
}
@Override
public Status process() throws EventDeliveryException {
Status status = null;
// Start transaction
Channel ch = getChannel();
Transaction txn = ch.getTransaction();
txn.begin();
try {
// This try clause includes whatever Channel operations you want to do
// Receive new data
Event e = getSomeData();
// Store the Event into this Source‘s associated Channel(s)
getChannelProcessor().processEvent(e)
txn.commit();
status = Status.READY;
} catch (Throwable t) {
txn.rollback();
// Log exception, handle individual exceptions as needed
status = Status.BACKOFF;
// re-throw all Errors
if (t instanceof Error) {
throw (Error)t;
}
} finally {
txn.close();
}
return status;
}
}

测试的话，主要针对Event e 这里进行传输数据，这里就不测试了。

四、自定义Channel开发

官网说待定。

下面是美团网的自定义Channel 开发，下面是链接

http://tech.meituan.com/mt-log-system-optimization.html

……

Flume本身提供了MemoryChannel和FileChannel。MemoryChannel处理速度快，但缓存大小有限，且没有持久化；FileChannel则刚好相反。我们希望利用两者的优势，在Sink处理速度够快，Channel没有缓存过多日志的时候，就使用MemoryChannel，当Sink处理速度跟不上，又需要Channel能够缓存下应用端发送过来的日志时，就使用FileChannel，由此我们开发了DualChannel，能够智能的在两个Channel之间切换。

其具体的逻辑如下：

[java] view plain copy

/***
* putToMemChannel indicate put event to memChannel or fileChannel
* takeFromMemChannel indicate take event from memChannel or fileChannel
* */
private AtomicBoolean putToMemChannel = new AtomicBoolean(true);
private AtomicBoolean takeFromMemChannel = new AtomicBoolean(true);
void doPut(Event event) {
if (switchon && putToMemChannel.get()) {
//往memChannel中写数据
memTransaction.put(event);
if ( memChannel.isFull() || fileChannel.getQueueSize() > 100) {
putToMemChannel.set(false);
}
} else {
//往fileChannel中写数据
fileTransaction.put(event);
}
}
Event doTake() {
Event event = null;
if ( takeFromMemChannel.get() ) {
//从memChannel中取数据
event = memTransaction.take();
if (event == null) {
takeFromMemChannel.set(false);
}
} else {
//从fileChannel中取数据
event = fileTransaction.take();
if (event == null) {
takeFromMemChannel.set(true);
putToMemChannel.set(true);
}
}
return event;
}

这里要说明下，官网是建议使用file channel，虽然它的效率比较低，但是它能保证数据完整性，而memory channel效率高，但是只能对数据丢失和重复不太敏感的业务使用

时间： 2024-12-30 15:16:46

Flume NG 学习笔记（十） Transaction、Sink、Source和Channel开发

一、Transaction interface

二、自定义Sink开发

三、自定义Source开发

四、自定义Channel开发

Flume NG 学习笔记（十） Transaction、Sink、Source和Channel开发的相关文章

Flume NG 学习笔记（四）Source配置

Flume NG 学习笔记（五）Sinks和Channel配置

Flume NG 学习笔记（一）简介

Flume NG 学习笔记（二）单机与集群Flume 配置

Flume NG 学习笔记（八）Interceptors（拦截器）测试

Flume NG 学习笔记（三）流配置

Swift学习笔记十二：下标脚本（subscript）

第十七篇:实例分析(3)--初探WDDM驱动学习笔记(十)

初探swift语言的学习笔记十(block)