在agent启动时,会启动Channel,SourceRunner,SinkRunner,比如在org.apache.flume.agent.embedded.EmbeddedAgent类的doStart方法中:
private void doStart() { boolean error = true; try { channel.start(); //调用Channel.start启动Channel sinkRunner.start(); //调用SinkRunner.start启动SinkRunner sourceRunner.start(); //调用SourceRunner.start启动SourceRunner supervisor.supervise(channel, new SupervisorPolicy.AlwaysRestartPolicy(), LifecycleState.START); supervisor.supervise(sinkRunner, new SupervisorPolicy.AlwaysRestartPolicy(), LifecycleState.START); supervisor.supervise(sourceRunner, new SupervisorPolicy.AlwaysRestartPolicy(), LifecycleState.START); error = false; .....
SourceRunner目前有两大类PollableSourceRunner和EventDrivenSourceRunner,分别对应PollableSource和EventDrivenSource,PollableSource相关类需要外部驱动来确定source中是否有消息可以使用,而EventDrivenSource相关类不需要外部驱动,自己实现了事件驱动机制,目前常见的Source类都属于EventDrivenSource类型。在org.apache.flume.conf.source.SourceType中定义了常见的Source类:
OTHER(null), SEQ("org.apache.flume.source.SequenceGeneratorSource"), NETCAT("org.apache.flume.source.NetcatSource"), EXEC("org.apache.flume.source.ExecSource"), AVRO("org.apache.flume.source.AvroSource"), SYSLOGTCP("org.apache.flume.source.SyslogTcpSource"), MULTIPORT_SYSLOGTCP("org.apache.flume.source.MultiportSyslogTCPSource"), SYSLOGUDP("org.apache.flume.source.SyslogUDPSource"), SPOOLDIR("org.apache.flume.source.SpoolDirectorySource"), HTTP("org.apache.flume.source.http.HTTPSource");
可以由org.apache.flume.source.DefaultSourceFactory获取其对应的实例.
SourceRunner则负责启动Source,比如在org.apache.flume.source.EventDrivenSourceRunner的start方法:
public void start() { Source source = getSource(); //通过getSource获取Source对象 ChannelProcessor cp = source.getChannelProcessor(); //获取ChannelProcessor 对象 cp.initialize(); //调用ChannelProcessor.initialize方法 source.start(); //调用Source.start方法 lifecycleState = LifecycleState. START; }
这里以execsource为例,配置source的类型为EXEC时,使用org.apache.flume.source.ExecSource.
主要原理是通过启动操作系统中的进程把每一行文本信息转换为Event,exec source不提供事务的特性(不能保证数据被成功发送到channel中),execsource关心的是标准输出,标准错误输出会被忽略(虽然可以打印到log中,但不会作为Event发送),可以设置restart的特性让source在退出时自动重启
主要方法分析:
1.configure方法用来设置相关参数
参数项:
command 执行的命令 restart 在命令退出后是否自动重启,默认是false restartThrottle 重启命令的等待时间,默认是10s logStderr 是否记录错误日志(只会放到日志中,不会做为Event),默认是false batchSize 一次向channel发送的event的数量,批量发送的功能,默认是20 charset 字符编码设置,默认是UTF-8
2.start方法,SourceRunner的start方法会调用这个start方法:
public void start() { logger.info( "Exec source starting with command:{}" , command ); executor = Executors. newSingleThreadExecutor(); // 初始化一个单线程的线程池,private ExecutorService executor; counterGroup = new CounterGroup(); //初始化性能计数器 runner = new ExecRunnable( command, getChannelProcessor(), counterGroup, restart, restartThrottle, logStderr , bufferCount , charset ); //实例化一个ExecRunnable线程 // FIXME : Use a callback-like executor / future to signal us upon failure. runnerFuture = executor.submit( runner); //private Future<?> runnerFuture; super.start(); //设置状态为start,lifecycleState = LifecycleState.START; logger.debug( "Exec source started"); }
3.核心是一个实现Runnable接口的线程类ExecRunnable,用于启动读取数据的进程(在一个do...while循环中)
public void run() { do { String exitCode = "unknown"; BufferedReader reader = null; try { String[] commandArgs = command.split("\\s+"); //由命令生成字符串数组 process = new ProcessBuilder(commandArgs).start(); //生成一个Process对象,启动命令进程 reader = new BufferedReader( new InputStreamReader(process.getInputStream(), charset)); //读取标准输出至缓冲区对象 // StderrLogger dies as soon as the input stream is invalid StderrReader stderrReader = new StderrReader(new BufferedReader( new InputStreamReader(process.getErrorStream(), charset)), logStderr); //StderrReader线程用于记录标准错误日志 stderrReader.setName("StderrReader-[" + command + "]"); stderrReader.setDaemon(true); stderrReader.start(); /* while((line = input.readLine()) != null) { if(logStderr) { logger.info("StderrLogger[{}] = ‘{}‘", ++i, line); //日志只会写到log中,不会转换为Event } } */ String line = null; List<Event> eventList = new ArrayList<Event>(); while ((line = reader.readLine()) != null) { //循环读取缓冲区中的内容 counterGroup.incrementAndGet("exec.lines.read"); eventList.add(EventBuilder.withBody(line.getBytes(charset))); //使用EventBuilder.withBody转换成Event并放入eventList中 if(eventList.size() >= bufferCount) { //达到batchSize的设置时调用ChannelProcessor.processEventBatch处理Event channelProcessor.processEventBatch(eventList); eventList.clear(); } } if(!eventList.isEmpty()) { channelProcessor.processEventBatch(eventList); } } .... } while(restart); //如果设置了restart,会在process退出后重新运行do代码块,否则退出 }