【Java】【Flume】Flume-NG启动过程源码分析(三)

本篇分析加载配置文件后各个组件是如何运行的?

  加载完配置文件订阅者Application类会收到订阅信息执行:

  @Subscribe
  public synchronized void handleConfigurationEvent(MaterializedConfiguration conf) {
    stopAllComponents();
    startAllComponents(conf);
  }

  MaterializedConfiguration conf就是getConfiguration()方法获取的配置信息,是SimpleMaterializedConfiguration的一个实例。

  handleConfigurationEvent方法在前面章节(一)中有过大致分析,包括:stopAllComponents()和startAllComponents(conf)。Application中的materializedConfiguration就是MaterializedConfiguration
conf,stopAllComponents()方法中的materializedConfiguration是旧的配置信息,需要先停掉旧的组件,然后startAllComponents(conf)将新的配置信息赋给materializedConfiguration并依次启动各个组件。

  1、先看startAllComponents(conf)方法。代码如下:

private void startAllComponents(MaterializedConfiguration materializedConfiguration) {//启动所有组件最基本的三大组件
    logger.info("Starting new configuration:{}", materializedConfiguration);

    this.materializedConfiguration = materializedConfiguration;

    for (Entry<String, Channel> entry :
      materializedConfiguration.getChannels().entrySet()) {
      try{
        logger.info("Starting Channel " + entry.getKey());
        supervisor.supervise(entry.getValue(),
            new SupervisorPolicy.AlwaysRestartPolicy(), LifecycleState.START);
      } catch (Exception e){
        logger.error("Error while starting {}", entry.getValue(), e);
      }
    }

    /*
     * Wait for all channels to start.等待所有channel启动完毕
     */
    for(Channel ch: materializedConfiguration.getChannels().values()){
      while(ch.getLifecycleState() != LifecycleState.START
          && !supervisor.isComponentInErrorState(ch)){
        try {
          logger.info("Waiting for channel: " + ch.getName() +
              " to start. Sleeping for 500 ms");
          Thread.sleep(500);
        } catch (InterruptedException e) {
          logger.error("Interrupted while waiting for channel to start.", e);
          Throwables.propagate(e);
        }
      }
    }

    for (Entry<String, SinkRunner> entry : materializedConfiguration.getSinkRunners()
        .entrySet()) {        //启动所有sink
      try{
        logger.info("Starting Sink " + entry.getKey());
        supervisor.supervise(entry.getValue(),
          new SupervisorPolicy.AlwaysRestartPolicy(), LifecycleState.START);
      } catch (Exception e) {
        logger.error("Error while starting {}", entry.getValue(), e);
      }
    }

    for (Entry<String, SourceRunner> entry : materializedConfiguration
        .getSourceRunners().entrySet()) {//启动所有source
      try{
        logger.info("Starting Source " + entry.getKey());
        supervisor.supervise(entry.getValue(),
          new SupervisorPolicy.AlwaysRestartPolicy(), LifecycleState.START);
      } catch (Exception e) {
        logger.error("Error while starting {}", entry.getValue(), e);
      }
    }

    this.loadMonitoring();
  }

  三大组件都是通过supervisor.supervise(entry.getValue(),new SupervisorPolicy.AlwaysRestartPolicy(), LifecycleState.START)启动的,其中,channel启动之后还要待所有的channel完全启动完毕之后才可再去启动sink和source。如果channel没有启动完毕就去启动另外俩组件,会出现错误,以为一旦sink或者source建立完毕就会立即与channel通信获取数据。稍后会分别分析sink和source的启动。

  supervisor是LifecycleSupervisor的一个对象,该类的构造方法会构造一个有10个线程,上限是20的线程池供各大组件使用。构造方法如下:

public LifecycleSupervisor() {
    lifecycleState = LifecycleState.IDLE;
    supervisedProcesses = new HashMap<LifecycleAware, Supervisoree>();//存储所有历史上的组件及其监控信息
    monitorFutures = new HashMap<LifecycleAware, ScheduledFuture<?>>();
    monitorService = new ScheduledThreadPoolExecutor(10,
        new ThreadFactoryBuilder().setNameFormat(
            "lifecycleSupervisor-" + Thread.currentThread().getId() + "-%d")
            .build());
    monitorService.setMaximumPoolSize(20);
    monitorService.setKeepAliveTime(30, TimeUnit.SECONDS);
    purger = new Purger();
    needToPurge = false;
  }

  supervise(LifecycleAware lifecycleAware,SupervisorPolicy policy, LifecycleState desiredState)方法则是具体执行启动各个组件的方法。flume的所有组件均实现自

LifecycleAware 接口,如图:,这个接口就三个方法getLifecycleState(返回组件运行状态)、start(组件启动)、stop(停止组件)。supervise方法代码如下:

public synchronized void supervise(LifecycleAware lifecycleAware,
      SupervisorPolicy policy, LifecycleState desiredState) {  //检查线程池状态
    if(this.monitorService.isShutdown()
        || this.monitorService.isTerminated()
        || this.monitorService.isTerminating()){
      throw new FlumeException("Supervise called on " + lifecycleAware + " " +
          "after shutdown has been initiated. " + lifecycleAware + " will not" +
          " be started");
    }
  //如果该组件已经在监控,则拒绝二次监控
    Preconditions.checkState(!supervisedProcesses.containsKey(lifecycleAware),
        "Refusing to supervise " + lifecycleAware + " more than once");

    if (logger.isDebugEnabled()) {
      logger.debug("Supervising service:{} policy:{} desiredState:{}",
          new Object[] { lifecycleAware, policy, desiredState });
    }
  //新的组件
    Supervisoree process = new Supervisoree();
    process.status = new Status();

    process.policy = policy;
    process.status.desiredState = desiredState;
    process.status.error = false;

    MonitorRunnable monitorRunnable = new MonitorRunnable();
    monitorRunnable.lifecycleAware = lifecycleAware;//组件
    monitorRunnable.supervisoree = process;
    monitorRunnable.monitorService = monitorService;

    supervisedProcesses.put(lifecycleAware, process);
    //创建并执行一个在给定初始延迟后首次启用的定期操作,随后,在每一次执行终止和下一次执行开始之间都存在给定的延迟。如果任务的任一执行遇到异常,就会取消后续执行。
    ScheduledFuture<?> future = monitorService.scheduleWithFixedDelay(
        monitorRunnable, 0, 3, TimeUnit.SECONDS);  //启动MonitorRunnable,结束之后3秒再重新启动,可以用于重试
    monitorFutures.put(lifecycleAware, future);
  }

  该方法首先monitorService是否是正常运行状态;然后构造Supervisoree process = new Supervisoree(),进行赋值并构造一个监控进程MonitorRunnable,放入线程池去执行。

  MonitorRunnable.run()方法:

public void run() {
      logger.debug("checking process:{} supervisoree:{}", lifecycleAware,
          supervisoree);

      long now = System.currentTimeMillis();//获取现在的时间戳

      try {
        if (supervisoree.status.firstSeen == null) {
          logger.debug("first time seeing {}", lifecycleAware);
      //如果这个组件是是初次受监控
          supervisoree.status.firstSeen = now;
        }
     //如果这个组件已经监控过
        supervisoree.status.lastSeen = now;
        synchronized (lifecycleAware) {//锁住组件
          if (supervisoree.status.discard) {//该组件已经停止监控
            // Unsupervise has already been called on this.
            logger.info("Component has already been stopped {}", lifecycleAware);
            return;//直接返回
          } else if (supervisoree.status.error) {//该组件是错误状态
            logger.info("Component {} is in error state, and Flume will not"
                + "attempt to change its state", lifecycleAware);
            return;//直接返回
          }

          supervisoree.status.lastSeenState = lifecycleAware.getLifecycleState();//获取组件最新状态,没运行start()方法之前是LifecycleState.IDLE状态

          if (!lifecycleAware.getLifecycleState().equals(
              supervisoree.status.desiredState)) {//该组件最新状态和期望的状态不一致

            logger.debug("Want to transition {} from {} to {} (failures:{})",
                new Object[] { lifecycleAware, supervisoree.status.lastSeenState,
                    supervisoree.status.desiredState,
                    supervisoree.status.failures });

            switch (supervisoree.status.desiredState) {//根据状态执行相应的操作
              case START:
                try {
                  lifecycleAware.start();   //启动组件,同时其状态也会变为LifecycleState.START
                } catch (Throwable e) {
                  logger.error("Unable to start " + lifecycleAware
                      + " - Exception follows.", e);
                  if (e instanceof Error) {
                    // This component can never recover, shut it down.
                    supervisoree.status.desiredState = LifecycleState.STOP;
                    try {
                      lifecycleAware.stop();
                      logger.warn("Component {} stopped, since it could not be"
                          + "successfully started due to missing dependencies",
                          lifecycleAware);
                    } catch (Throwable e1) {
                      logger.error("Unsuccessful attempt to "
                          + "shutdown component: {} due to missing dependencies."
                          + " Please shutdown the agent"
                          + "or disable this component, or the agent will be"
                          + "in an undefined state.", e1);
                      supervisoree.status.error = true;
                      if (e1 instanceof Error) {
                        throw (Error) e1;
                      }
                      // Set the state to stop, so that the conf poller can
                      // proceed.
                    }
                  }
                  supervisoree.status.failures++;//启动错误失败次数+1
                }
                break;
              case STOP:
                try {
                  lifecycleAware.stop();    //停止组件
                } catch (Throwable e) {
                  logger.error("Unable to stop " + lifecycleAware
                      + " - Exception follows.", e);
                  if (e instanceof Error) {
                    throw (Error) e;
                  }
                  supervisoree.status.failures++;  //组件停止错误,错误次数+1
                }
                break;
              default:
                logger.warn("I refuse to acknowledge {} as a desired state",
                    supervisoree.status.desiredState);
            }
       //两种SupervisorPolicy(AlwaysRestartPolicy和OnceOnlyPolicy)后者还未使用过,前者表示可以重新启动的组件,后者表示只能运行一次的组件
            if (!supervisoree.policy.isValid(lifecycleAware, supervisoree.status)) {
              logger.error(
                  "Policy {} of {} has been violated - supervisor should exit!",
                  supervisoree.policy, lifecycleAware);
            }
          }
        }
      } catch(Throwable t) {
        logger.error("Unexpected error", t);
      }
      logger.debug("Status check complete");
    }

  上面的 lifecycleAware.stop()和lifecycleAware.start()就是执行的sink、source、channel等的对应方法。

  这里的start需要注意如果是channel则是直接执行start方法;如果是sink或者PollableSource的实现类,则会在start()方法中启动一个线程来循环的调用process()方法来从channel拿数据(sink)或者向channel送数据(source);如果是EventDrivenSource的实现类,则没有process()方法,通过执行start()来执行想channel中送数据的操作(可以在此添加线程来实现相应的逻辑)。

  2、stopAllComponents()方法。顾名思义,就是停止所有组件的方法。该方法代码如下:

private void stopAllComponents() {
    if (this.materializedConfiguration != null) {
      logger.info("Shutting down configuration: {}", this.materializedConfiguration);
      for (Entry<String, SourceRunner> entry : this.materializedConfiguration
          .getSourceRunners().entrySet()) {
        try{
          logger.info("Stopping Source " + entry.getKey());
          supervisor.unsupervise(entry.getValue());
        } catch (Exception e){
          logger.error("Error while stopping {}", entry.getValue(), e);
        }
      }

      for (Entry<String, SinkRunner> entry :
        this.materializedConfiguration.getSinkRunners().entrySet()) {
        try{
          logger.info("Stopping Sink " + entry.getKey());
          supervisor.unsupervise(entry.getValue());
        } catch (Exception e){
          logger.error("Error while stopping {}", entry.getValue(), e);
        }
      }

      for (Entry<String, Channel> entry :
        this.materializedConfiguration.getChannels().entrySet()) {
        try{
          logger.info("Stopping Channel " + entry.getKey());
          supervisor.unsupervise(entry.getValue());
        } catch (Exception e){
          logger.error("Error while stopping {}", entry.getValue(), e);
        }
      }
    }
    if(monitorServer != null) {
      monitorServer.stop();
    }
  }

  首先,需要注意的是,stopAllComponents()放在startAllComponents(MaterializedConfiguration materializedConfiguration)方法之前的原因,由于配置文件的动态加载这一特性的存在,使得每次加载之前都要先把旧的组件停掉,然后才能去加载最新配置文件中的配置;

  其次,首次执行stopAllComponents()时,由于配置文件尚未赋值,所以并不会执行停止所有组件的操作以及停止monitorServer。再次加载时会依照顺序依次停止对source、sink以及channel的监控,通过supervisor.unsupervise(entry.getValue())停止对其的监控,然后停止monitorServer。supervisor.unsupervise方法如下:

public synchronized void unsupervise(LifecycleAware lifecycleAware) {

    Preconditions.checkState(supervisedProcesses.containsKey(lifecycleAware),
        "Unaware of " + lifecycleAware + " - can not unsupervise");

    logger.debug("Unsupervising service:{}", lifecycleAware);

    synchronized (lifecycleAware) {
    Supervisoree supervisoree = supervisedProcesses.get(lifecycleAware);
    supervisoree.status.discard = true;
      this.setDesiredState(lifecycleAware, LifecycleState.STOP);
      logger.info("Stopping component: {}", lifecycleAware);
      lifecycleAware.stop();
    }
    supervisedProcesses.remove(lifecycleAware);
    //We need to do this because a reconfiguration simply unsupervises old
    //components and supervises new ones.
    monitorFutures.get(lifecycleAware).cancel(false);
    //purges are expensive, so it is done only once every 2 hours.
    needToPurge = true;
    monitorFutures.remove(lifecycleAware);
  }

  该方法首先会检查正在运行的组件当中是否有此组件supervisedProcesses.containsKey(lifecycleAware);如果存在,则对此组件标记为已取消监控supervisoree.status.discard = true;将状态设置为STOP,并停止组件lifecycleAware.stop();然后从删除此组件的监控记录,包括从记录正在处于监控的组件的结构supervisedProcesses以及记录组件及其对应的运行线程的结构monitorFutures中删除相应的组件信息,并且needToPurge
= true会使得两小时执行一次的线程池清理操作。

  有一个问题就是,sink和source是如何找到对应的channel的呢??其实前面章节就已经讲解过,分别在AbstractConfigurationProvider.loadSources方法中通过ChannelSelector配置source对应的channel,而在source中通过getChannelProcessor()获取channels,通过channelProcessor.processEventBatch(eventList)将events发送到channel中;而在AbstractConfigurationProvider.loadSinks方法中sink.setChannel(channelComponent.channel)来设置此sink对应的channel,然后在sink的实现类中通过getChannel()获取设置的channel,并使用channel.take()从channel中获取event进行处理。

  

  以上三节是Flume-NG的启动、配置文件的加载、配置文件的动态加载、组件的执行的整个流程。文中的疏漏之处,请各位指教,我依然会后续继续完善这些内容的。

【Java】【Flume】Flume-NG启动过程源码分析(三),布布扣,bubuko.com

时间: 2024-10-29 19:06:11

【Java】【Flume】Flume-NG启动过程源码分析(三)的相关文章

【Java】【Flume】Flume-NG启动过程源码分析(一)

从bin/flume 这个shell脚本可以看到Flume的起始于org.apache.flume.node.Application类,这是flume的main函数所在. main方法首先会先解析shell命令,如果指定的配置文件不存在就甩出异常. 根据命令中含有"no-reload-conf"参数,决定采用那种加载配置文件方式:一.没有此参数,会动态加载配置文件,默认每30秒加载一次配置文件,因此可以动态修改配置文件:二.有此参数,则只在启动时加载一次配置文件.实现动态加载功能采用了

【Java】【Flume】Flume-NG启动过程源码分析(二)

本节分析配置文件的解析,即PollingPropertiesFileConfigurationProvider.FileWatcherRunnable.run中的eventBus.post(getConfiguration()).分析getConfiguration()方法.此方法在AbstractConfigurationProvider类中实现了,并且这个类也初始化了三大组件的工厂类:this.sourceFactory = new DefaultSourceFactory();this.s

Flume-NG启动过程源码分析(三)(原创)

上一篇文章分析了Flume如何加载配置文件的,动态加载也只是重复运行getConfiguration(). 本篇分析加载配置文件后各个组件是如何运行的? 加载完配置文件订阅者Application类会收到订阅信息执行: @Subscribe public synchronized void handleConfigurationEvent(MaterializedConfiguration conf) { stopAllComponents(); startAllComponents(conf)

Activity启动流程源码分析之Launcher启动(二)

1.前述 在前一篇文章中我们简要的介绍Activity的启动流程Activity启动流程源码分析之入门(一),当时只是简单的分析了一下流程,而且在上一篇博客中我们也说了Activity的两种启动方式,现在我们就来分析其中的第一种方式--Launcher启动,这种启动方式的特点是会创建一个新的进程来加载相应的Activity(基于Android5.1源码). 2.Activity启动流程时序图 好啦,接下来我们先看一下Launcher启动Activity的时序图: 好啦,接下来我们将上述时序图用代

Android系统默认Home应用程序(Launcher)的启动过程源码分析

在前面一篇文章中,我们分析了Android系统在启动时安装应用程序的过程,这些应用程序安装好之后,还须要有一个Home应用程序来负责把它们在桌面上展示出来,在Android系统中,这个默认的Home应用程序就是Launcher了,本文将详细分析Launcher应用程序的启动过程. Android系统的Home应用程序Launcher是由ActivityManagerService启动的,而ActivityManagerService和PackageManagerService一样,都是在开机时由

嵌入式linux开发uboot移植(三)——uboot启动过程源码分析

嵌入式linux开发uboot移植(三)--uboot启动过程源码分析 一.uboot启动流程简介 与大多数BootLoader一样,uboot的启动过程分为BL1和BL2两个阶段.BL1阶段通常是开发板的配置等设备初始化代码,需要依赖依赖于SoC体系结构,通常用汇编语言来实现:BL2阶段主要是对外部设备如网卡.Flash等的初始化以及uboot命令集等的自身实现,通常用C语言来实现. 1.BL1阶段 uboot的BL1阶段代码通常放在start.s文件中,用汇编语言实现,其主要代码功能如下:

【Android】应用程序启动过程源码分析

在Android系统中,应用程序是由Activity组成的,因此,应用程序的启动过程实际上就是应用程序中的默认Activity的启动过程,本文将详细分析应用程序框架层的源代码,了解Android应用程序的启动过程. 启动Android应用程序中的Activity的两种情景:其中,在手机屏幕中点击应用程序图标的情景就会引发Android应用程序中的默认Activity的启动,从而把应用程序启动起来.这种启动方式的特点是会启动一个新的进程来加载相应的Activity. 这里,我们以这个例子为例来说明

Openstack liberty 中Cinder-api启动过程源码分析2

在前一篇博文中,我分析了cinder-api启动过程中WSGI应用的加载及路由的建立过程,本文将分析WSGI服务器的启动及其接收客户端请求并完成方法映射的过程,请看下文的分析内容: 创建WSGI服务器 在初始化WSGIService对象的过程中,会创建WSGI服务器,如下: #`cinder.service:WSGIService.__init__` def __init__(self, name, loader=None): """ name = `osapi_volume

spark内核揭秘-13-Worker中Executor启动过程源码分析

进入Worker类源码: 可以看出Worker本身是Akka中的一个Actor. 进入Worker类的LaunchExecutor: 从源代码可以看出Worker节点上要分配CPU和Memory给新的Executor,首先需要创建一个ExecutorRunner: ExecutorRunner是用于维护executor进程的: 1.进入ExecutorRunner 的start方法: 1.1.进入fetchAndRunExecutor()方法(核心方法): 2.进入 master ! Execu