Source-level analysis of MapReduce job initialization in the JobTracker

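"MapReduce job submission flow: source-level analysis (part 3)" showed that the user ultimately calls JobTracker.submitJob to submit a job to the JobTracker. The core of that method is JobTracker.addJob(JobID jobId, JobInProgress job). addJob passes the job to the listeners of the scheduler (JobQueueTaskScheduler by default; this article only covers the default scheduler), namely JobQueueJobInProgressListener and EagerTaskInitializationListener, by calling jobAdded(JobInProgress job) on each of them. JobQueueJobInProgressListener tracks how each JobInProgress changes over its lifecycle, while EagerTaskInitializationListener initializes each new job it is notified about.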
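The fan-out described above can be pictured with the following minimal sketch. It is not the Hadoop source: SimpleJob, JobListener, QueueListener, InitListener and MiniTracker are hypothetical stand-ins for JobInProgress, JobInProgressListener, the two listeners and JobTracker, kept just rich enough to show that addJob delivers the same job object to every registered listener through jobAdded().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for JobInProgress: only carries an ID.
final class SimpleJob {
    final String id;

    SimpleJob(String id) {
        this.id = id;
    }
}

// Hypothetical stand-in for JobInProgressListener.
interface JobListener {
    void jobAdded(SimpleJob job);
}

// Plays the role of JobQueueJobInProgressListener: keeps jobs for scheduling.
final class QueueListener implements JobListener {
    final List<SimpleJob> jobQueue = new ArrayList<>();

    @Override
    public void jobAdded(SimpleJob job) {
        jobQueue.add(job);
    }
}

// Plays the role of EagerTaskInitializationListener: keeps jobs awaiting init.
final class InitListener implements JobListener {
    final List<SimpleJob> jobInitQueue = new ArrayList<>();

    @Override
    public void jobAdded(SimpleJob job) {
        jobInitQueue.add(job);
    }
}

// Hypothetical, heavily simplified tracker: addJob fans the job out
// to every registered listener through jobAdded().
final class MiniTracker {
    private final List<JobListener> listeners = new ArrayList<>();

    void addListener(JobListener listener) {
        listeners.add(listener);
    }

    synchronized void addJob(SimpleJob job) {
        for (JobListener listener : listeners) {
            listener.jobAdded(job);
        }
    }
}
```

After registering both listeners and calling addJob once, the same job object sits in both listeners' queues; each listener then does its own work with it independently.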

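1. JobQueueJobInProgressListener.jobAdded(JobInProgress job). The method is a single line of code, jobQueue.put(new JobSchedulingInfo(job.getStatus()), job): it builds a JobSchedulingInfo object and puts it into jobQueue as the key that maps to the JobInProgress. JobSchedulingInfo holds the information needed to schedule the job, such as its priority (NORMAL by default), its JobID and its start time, startTime.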
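Because jobQueue is keyed by JobSchedulingInfo, the order in which jobs come out of it depends on how those keys compare. The sketch below assumes a TreeMap ordered by priority, then start time, then job ID as a final tie-breaker. SchedInfo, Prio and FifoJobQueue are hypothetical simplifications of the Hadoop classes, and the comparator is an assumption for illustration, not a copy of the source.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

// Assumed to follow Hadoop's JobPriority order: highest priority first.
enum Prio { VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW }

// Hypothetical, simplified JobSchedulingInfo holding the three fields
// the article mentions.
final class SchedInfo {
    final Prio priority;
    final long startTime;
    final String jobId;

    SchedInfo(Prio priority, long startTime, String jobId) {
        this.priority = priority;
        this.startTime = startTime;
        this.jobId = jobId;
    }
}

final class FifoJobQueue {
    // Assumed ordering: priority, then earlier start time, then job ID.
    static final Comparator<SchedInfo> ORDER = (a, b) -> {
        int res = a.priority.compareTo(b.priority);
        if (res == 0) {
            res = Long.compare(a.startTime, b.startTime);
        }
        if (res == 0) {
            res = a.jobId.compareTo(b.jobId);
        }
        return res;
    };

    // Plays the role of jobQueue: scheduling info -> job (here just its ID).
    final Map<SchedInfo, String> jobQueue = new TreeMap<>(ORDER);

    // Mirrors jobQueue.put(new JobSchedulingInfo(job.getStatus()), job).
    void jobAdded(Prio priority, long startTime, String jobId) {
        jobQueue.put(new SchedInfo(priority, startTime, jobId), jobId);
    }
}
```

With this ordering, a HIGH job submitted later still comes out before two NORMAL jobs, and the two NORMAL jobs come out in start-time order.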

2. EagerTaskInitializationListener.jobAdded(JobInProgress job):

  /**
   * We add the JIP to the jobInitQueue, which is processed
   * asynchronously to handle split-computation and build up
   * the right TaskTracker/Block mapping.
   */
  @Override
  public void jobAdded(JobInProgress job) {
    synchronized (jobInitQueue) {
      jobInitQueue.add(job);      // append to List<JobInProgress> jobInitQueue
      resortInitQueue();
      jobInitQueue.notifyAll();   // wake up threads blocked in jobInitQueue.wait()
    }
  }
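jobAdded() appends to a plain List and immediately calls resortInitQueue(), so the list stays in priority order and the consumer can always take the most urgent job from index 0. Below is a minimal sketch of that ordering, assuming hypothetical PendingJob and InitPrio types in place of JobInProgress and JobPriority; the body of resortInitQueue() here is an illustration of "priority first, then start time", not the Hadoop code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Assumed to follow Hadoop's JobPriority order: highest priority first.
enum InitPrio { VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW }

// Hypothetical stand-in for JobInProgress with only the fields the sort uses.
final class PendingJob {
    final String id;
    final InitPrio priority;
    final long startTime;

    PendingJob(String id, InitPrio priority, long startTime) {
        this.id = id;
        this.priority = priority;
        this.startTime = startTime;
    }
}

final class SortedInitQueue {
    final List<PendingJob> jobInitQueue = new ArrayList<>();

    // Mirrors jobAdded(): append, re-sort, then wake any waiting consumer.
    void jobAdded(PendingJob job) {
        synchronized (jobInitQueue) {
            jobInitQueue.add(job);
            resortInitQueue();
            jobInitQueue.notifyAll();
        }
    }

    // Priority first; among equal priorities the earlier start time wins.
    // Called with the jobInitQueue lock already held.
    private void resortInitQueue() {
        Comparator<PendingJob> comp = (a, b) -> {
            int res = a.priority.compareTo(b.priority);
            return res != 0 ? res : Long.compare(a.startTime, b.startTime);
        };
        Collections.sort(jobInitQueue, comp);
    }
}
```

Re-sorting the whole list on every insert is simple and cheap for a short queue of pending jobs; a priority queue would avoid the full sort but would no longer be a List indexed from 0.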

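In EagerTaskInitializationListener.jobAdded(), resortInitQueue() sorts the JobInProgress objects in jobInitQueue by priority and, for jobs of equal priority, by start time. EagerTaskInitializationListener.start() is already called from JobQueueTaskScheduler.start() when the scheduler is initialized, so it always runs before jobAdded. The code of EagerTaskInitializationListener.start() is: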

  public void start() throws IOException {
    this.jobInitManagerThread = new Thread(jobInitManager, "jobInitManager");
    jobInitManagerThread.setDaemon(true);
    this.jobInitManagerThread.start();
  }

start() launches a daemon thread, named "jobInitManager", that runs JobInitManager:

  /////////////////////////////////////////////////////////////////
  //  Used to init new jobs that have just been created
  /////////////////////////////////////////////////////////////////
  class JobInitManager implements Runnable {

    public void run() {
      JobInProgress job = null;
      while (true) {
        try {
          synchronized (jobInitQueue) {
            while (jobInitQueue.isEmpty()) {
              jobInitQueue.wait();
            }
            job = jobInitQueue.remove(0);
          }
          threadPool.execute(new InitJob(job));
        } catch (InterruptedException t) {
          LOG.info("JobInitManagerThread interrupted.");
          break;
        }
      }
      LOG.info("Shutting down thread pool");
      threadPool.shutdownNow();
    }
  }

  class InitJob implements Runnable {

    private JobInProgress job;

    public InitJob(JobInProgress job) {
      this.job = job;
    }

    public void run() {
      ttm.initJob(job);  // ttm is the TaskTrackerManager, i.e. the JobTracker: calls JobTracker.initJob
    }
  }
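Together, jobAdded(), start(), JobInitManager and InitJob form a producer/consumer pair built on wait()/notifyAll(), with a thread pool doing the actual initialization. The following self-contained sketch reproduces that shape under stated simplifications: MiniInitManager and terminate() are hypothetical names, a Consumer<String> stands in for ttm.initJob(job), and the re-sorting and all Hadoop types are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Hypothetical miniature of the JobInitManager/InitJob pair: one daemon
// thread takes jobs off a shared list and hands each one to a thread pool.
final class MiniInitManager implements Runnable {
    private final List<String> jobInitQueue = new ArrayList<>();
    private final ExecutorService threadPool;
    private final Consumer<String> initJob;   // stands in for ttm.initJob(job)
    private Thread managerThread;

    MiniInitManager(int threads, Consumer<String> initJob) {
        this.threadPool = Executors.newFixedThreadPool(threads);
        this.initJob = initJob;
    }

    // Like start(): run the manager loop on a daemon thread.
    void start() {
        managerThread = new Thread(this, "jobInitManager");
        managerThread.setDaemon(true);
        managerThread.start();
    }

    // Like jobAdded() (without the re-sort): enqueue and wake the manager.
    void jobAdded(String job) {
        synchronized (jobInitQueue) {
            jobInitQueue.add(job);
            jobInitQueue.notifyAll();
        }
    }

    // Interrupting the manager ends its loop and shuts the pool down.
    void terminate() {
        if (managerThread != null) {
            managerThread.interrupt();
        }
    }

    @Override
    public void run() {
        while (true) {
            String job;
            try {
                synchronized (jobInitQueue) {
                    while (jobInitQueue.isEmpty()) {
                        jobInitQueue.wait();      // woken by notifyAll() in jobAdded()
                    }
                    job = jobInitQueue.remove(0); // take the head of the queue
                }
            } catch (InterruptedException e) {
                break;
            }
            final String next = job;
            threadPool.execute(() -> initJob.accept(next)); // init off this thread
        }
        threadPool.shutdownNow();
    }
}
```

A caller creates the manager with a pool size, calls start(), then jobAdded() for each job. Calling terminate() interrupts the manager thread, which leaves the loop and shuts the pool down, mirroring the InterruptedException branch of JobInitManager.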

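The run method of JobInitManager is an endless loop that keeps watching jobInitQueue; it only exits when the thread is interrupted, after which the thread pool is shut down. It does not busy-poll: while the queue is empty it blocks in jobInitQueue.wait() until jobAdded calls notifyAll(). Once the queue is non-empty it removes the JobInProgress at index 0 and submits an InitJob task for it to the thread pool; InitJob calls TaskTrackerManager.initJob(job), which is JobTracker's initJob method. Why initialize jobs on separate threads? Simply because jobInitQueue may hold many JobInProgress objects at once, and initializing them one by one would be slow, so several are initialized in parallel. Here is the code of initJob: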

  public void initJob(JobInProgress job) {
    if (null == job) {
      LOG.info("Init on null job is not valid");
      return;
    }

    try {
      JobStatus prevStatus = (JobStatus)job.getStatus().clone();
      LOG.info("Initializing " + job.getJobID());
      job.initTasks();    // call initTasks() on this JobInProgress to initialize the job
      // Inform the listeners if the job state has changed
      // Note : that the job will be in PREP state.
      JobStatus newStatus = (JobStatus)job.getStatus().clone();
      if (prevStatus.getRunState() != newStatus.getRunState()) {
        JobStatusChangeEvent event =
          new JobStatusChangeEvent(job, EventType.RUN_STATE_CHANGED, prevStatus,
              newStatus);
        synchronized (JobTracker.this) {
          updateJobInProgressListeners(event);
        }
      }
    } catch (KillInterruptedException kie) {
      //   If job was killed during initialization, job state will be KILLED
      LOG.error("Job initialization interrupted:\n" +
          StringUtils.stringifyException(kie));
      killJob(job);
    } catch (Throwable t) {
      String failureInfo =
        "Job initialization failed:\n" + StringUtils.stringifyException(t);
      // If the job initialization is failed, job state will be FAILED
      LOG.error(failureInfo);
      job.getStatus().setFailureInfo(failureInfo);
      failJob(job);
    }
  }

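initJob first takes a snapshot of the job status before initialization (prevStatus), then runs job.initTasks(), and then takes a snapshot after initialization (newStatus). Only if the run state differs between the two snapshots does it build a JobStatusChangeEvent of type RUN_STATE_CHANGED and pass it to updateJobInProgressListeners; as the comment in the code notes, the job is normally still in PREP state afterwards, so usually no event is fired. If initialization is interrupted by a kill (KillInterruptedException), the job is killed with killJob(job); any other Throwable is logged, stored as the job's failure info, and the job is failed with failJob(job).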
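The core idea of initJob() is to compare a run-state snapshot taken before initTasks() with one taken after it, and to notify listeners only when the two differ. A simplified sketch, assuming hypothetical ToyJob and ToyTracker classes and a RunState enum in place of JobInProgress, JobTracker and JobStatus. The KillInterruptedException branch is left out, and a failure simply marks the job FAILED where the real code calls failJob(job).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical run states, loosely modelled on JobStatus run states.
enum RunState { PREP, RUNNING, FAILED, KILLED }

// Hypothetical job: initTasks() either succeeds (staying in PREP) or throws.
final class ToyJob {
    RunState runState = RunState.PREP;
    String failureInfo;
    private final boolean brokenSplits;   // simulates an initialization failure

    ToyJob(boolean brokenSplits) {
        this.brokenSplits = brokenSplits;
    }

    void initTasks() {
        if (brokenSplits) {
            throw new IllegalStateException("cannot read split meta info");
        }
        // A real initTasks() would build the tasks here; the job stays in PREP.
    }
}

final class ToyTracker {
    // Stands in for updateJobInProgressListeners(event).
    final List<String> events = new ArrayList<>();

    void initJob(ToyJob job) {
        if (job == null) {
            return;
        }
        try {
            RunState prev = job.runState;          // like prevStatus
            job.initTasks();
            RunState now = job.runState;           // like newStatus
            if (prev != now) {                     // notify only on a real change
                events.add(prev + " -> " + now);
            }
        } catch (RuntimeException e) {
            job.failureInfo = "Job initialization failed: " + e.getMessage();
            job.runState = RunState.FAILED;        // stands in for failJob(job)
        }
    }
}
```

Snapshotting before and after keeps listener traffic low: a successful initialization that leaves the job in PREP produces no event at all.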

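job.initTasks() contains quite a lot of code. After some checks, its main work is to obtain the split information of the input data with TaskSplitMetaInfo[] splits = createSplits(jobId), which reads the files job.splitmetainfo and job.split that were uploaded to HDFS. It must make sure that numMapTasks == splits.length, and then builds numMapTasks TaskInProgress objects to serve as the map tasks.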
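The map-task part of initTasks() described above boils down to a consistency check followed by one TaskInProgress per split. A minimal sketch with hypothetical SplitMeta and MapTip classes standing in for TaskSplitMetaInfo and TaskInProgress; the exception type and message are assumptions, and reading job.splitmetainfo and job.split from HDFS is not modelled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for TaskSplitMetaInfo: split length and hosts.
final class SplitMeta {
    final long length;
    final String[] locations;

    SplitMeta(long length, String... locations) {
        this.length = length;
        this.locations = locations;
    }
}

// Hypothetical stand-in for a map-side TaskInProgress.
final class MapTip {
    final int partition;
    final SplitMeta split;

    MapTip(int partition, SplitMeta split) {
        this.partition = partition;
        this.split = split;
    }
}

final class ToyInitTasks {
    // Sketch of the map-task part of initTasks(): verify the count,
    // then create one map task per split, numbered 0..numMapTasks-1.
    static List<MapTip> buildMaps(int numMapTasks, SplitMeta[] splits) {
        if (numMapTasks != splits.length) {
            throw new IllegalArgumentException("numMapTasks=" + numMapTasks
                + " does not match #splits=" + splits.length);
        }
        List<MapTip> maps = new ArrayList<>(numMapTasks);
        for (int i = 0; i < numMapTasks; i++) {
            maps.add(new MapTip(i, splits[i]));   // task i processes split i
        }
        return maps;
    }
}
```

Tying task i to split i is what lets the scheduler later use each split's host list to place map tasks close to their data.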

Source-level analysis of MapReduce job initialization in the JobTracker, 布布扣, bubuko.com

Time: 2024-12-13 10:53:12
