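Channels are the repositories where events are staged on an agent. Sources add events and sinks remove them.
In other words, a channel is where events are held temporarily: the source is responsible for putting events into the channel, and the sink is responsible for taking them out.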
The built-in channels in Flume 1.5.2 include memory, file, and JDBC.
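To make the source/channel/sink relationship concrete, here is a minimal sketch of an agent definition. The agent name a1, the component names r1, k1 and c1, and the choice of a netcat source and logger sink are illustrative assumptions, not something the text above prescribes.

```properties
# Declare the components of agent a1 (all names are illustrative).
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# The source produces events from lines received on a TCP port.
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# The sink simply logs each event it takes from the channel.
a1.sinks.k1.type = logger

# The channel stages events between the two.
a1.channels.c1.type = memory

# Wiring: the source writes into c1, the sink reads from c1.
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

Note the asymmetry in the last two lines: a source can feed several channels (channels, plural), while a sink drains exactly one (channel, singular).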
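1. Memory channel (memory-channel)
Events are stored in an in-memory queue. This is a good choice when high throughput is required and losing data on agent failure is acceptable.
capacity: the maximum number of events the channel can hold; the default is 100.
transactionCapacity: the maximum number of events taken from a source or delivered to a sink in a single transaction; the default is also 100.
keep-alive: the timeout, in seconds, allowed for adding an event to the channel or removing one from it.
byteCapacity, byteCapacityBufferPercentage: limits on the total bytes of events held in the channel; only the event body is counted.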
a1.channels = c1
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 10000
a1.channels.c1.byteCapacityBufferPercentage = 20
a1.channels.c1.byteCapacity = 800000
The biggest drawback of this channel is that any events still in memory are lost when the agent fails.
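The memory channel settings interact in ways that are easy to miss, so here is an annotated sketch. The values are examples only, and the comment on the effective byte limit reflects our reading of how the buffer percentage is applied; treat it as an assumption to verify against your Flume version.

```properties
a1.channels = c1
a1.channels.c1.type = memory

# Up to 10000 events may sit in the queue at once (default: 100).
a1.channels.c1.capacity = 10000

# Events per transaction; should not exceed capacity (default: 100).
a1.channels.c1.transactionCapacity = 1000

# Seconds to wait when adding or removing an event (default: 3).
a1.channels.c1.keep-alive = 3

# Byte limit that counts event bodies only, with 20% set aside for headers.
# Assumed effect: about 800000 * (1 - 0.20) = 640000 bytes usable for bodies.
a1.channels.c1.byteCapacityBufferPercentage = 20
a1.channels.c1.byteCapacity = 800000
```

Keeping transactionCapacity well below capacity leaves room for several in-flight transactions before the channel fills up.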
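2. JDBC channel (JDBC-channel)
Events are persisted in a database. The channel currently supports the embedded Derby database that ships with Flume 1.5.2.
This is a durable channel, ideal for flows where recoverability matters.
Derby, of course, is unlikely to suit you: internet companies today mostly run MySQL, so combining jdbc-channel with MySQL requires custom development tailored to your own environment.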
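For reference, the Derby-backed JDBC channel needs very little configuration. The sketch below assumes an agent named a1 with a channel named c1 (both illustrative); the db.type line only restates the default and could be left out.

```properties
a1.channels = c1

# Select the JDBC channel; with no further settings it uses embedded Derby.
a1.channels.c1.type = jdbc

# Database vendor. DERBY is the default and the only value supported out of the box.
a1.channels.c1.db.type = DERBY
```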
3. File channel (File-channel)
By default, the File Channel places its checkpoint and data directories inside the user's home directory. As a result, if more than one File Channel instance is active within the agent, only one will be able to lock the directories, and the other channels will fail to initialize. It is therefore necessary to provide explicit paths for every configured channel, preferably on different disks. Furthermore, because the file channel syncs to disk after every commit, coupling it with a sink or source that batches events together may be necessary to get good performance when multiple disks are not available for the checkpoint and data directories.
Naturally, syncing channel data to disk costs performance, but the added checkpoint mechanism protects against data loss.
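The advice about explicit paths can be sketched as follows for an agent with two file channels. The agent and channel names and every directory path are hypothetical; the point is only that each channel gets its own checkpoint and data directories, ideally on separate disks.

```properties
a1.channels = c1 c2

# First file channel: its own checkpoint and data directories on one disk.
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /disk1/flume/c1/checkpoint
a1.channels.c1.dataDirs = /disk1/flume/c1/data

# Second file channel: separate directories (here on another disk), so the
# two channels never compete for the same directory lock.
a1.channels.c2.type = file
a1.channels.c2.checkpointDir = /disk2/flume/c2/checkpoint
a1.channels.c2.dataDirs = /disk2/flume/c2/data
```

dataDirs accepts a comma-separated list, so a single channel can also spread its data files across several disks.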
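We do not cover the memory channel variant that combines the memory channel with the file channel (the Spillable Memory Channel), because the official documentation itself warns that it is not recommended for production use.
The reason is that it still does not solve the data loss problem, and when something does go wrong in production, troubleshooting becomes even more complicated.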