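Impala Source Code Analysis, Part 1

2. Impala Source Code Analysis

Reference: http://www.sizeofvoid.net/wp-content/uploads/ImpalaIntroduction2.pdf

This chapter begins the source code analysis. The reference linked above is an excellent introduction to Impala's implementation and execution flow; thanks to its author.

2.1 Impala Internal Architecture

The internal architecture of Impala is shown below:

Figure 2-1 Impala internal architecture

The figure shows how Impala's three parts, the client, Impalad, and StateStore, relate to one another.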

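Component	Description
Client	Three kinds appear in the figure. Each is a Thrift client that submits queries by connecting to port 21000 on an Impalad.
Impalad	Consists of a frontend and a backend, and runs three Thrift servers (beeswax-server, hs2-server, be-server).
StateStore	Every impalad registers with it; it then pushes the state of the other nodes in the cluster to each impalad.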
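
The register-then-notify relationship between the impalads and the StateStore can be illustrated with a small in-process model. This is only a sketch under simplifying assumptions: `ToyStateStore`, `Register`, `Unregister`, and the callback type are hypothetical names invented here, not Impala's classes, and the real components talk to each other over Thrift rather than through direct function calls.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy model of the StateStore role; not Impala's real API.
class ToyStateStore {
 public:
  // Callback through which a subscriber receives the current member list.
  using UpdateCallback = std::function<void(const std::vector<std::string>&)>;

  // An impalad registers itself under a unique id (for example "host:port").
  // Every subscriber, including the new one, is then told about the change.
  void Register(const std::string& id, UpdateCallback callback) {
    assert(!id.empty());
    subscribers_[id] = std::move(callback);
    Broadcast();
  }

  // Removes a subscriber and informs the remaining ones.
  void Unregister(const std::string& id) {
    subscribers_.erase(id);
    Broadcast();
  }

  // Returns the ids of all registered subscribers in sorted order.
  std::vector<std::string> Members() const {
    std::vector<std::string> ids;
    for (const auto& entry : subscribers_) ids.push_back(entry.first);
    return ids;
  }

 private:
  // Pushes the full member list to every registered subscriber.
  void Broadcast() {
    const std::vector<std::string> members = Members();
    for (const auto& entry : subscribers_) entry.second(members);
  }

  std::map<std::string, UpdateCallback> subscribers_;
};
```

Registering two subscribers with callbacks that store the list they receive leaves both holding the same two-entry membership; after one of them is unregistered, the other is notified again and sees only itself.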

The ports used by the Impalad component are listed in the following table:

Property	Value	Description
	*Impalad ports*
Impala Daemon backend port (be_port)	22000 (default)	Port exported by ImpalaBackendService.
Impala Daemon Beeswax port (beeswax_port)	21000 (default)	Port on which the Impala Daemon serves Beeswax client requests.
Impala Daemon HiveServer2 port (hs2_port)	21050 (default)	Port on which the Impala Daemon serves HiveServer2 client requests.
StateStoreSubscriber service port (state_store_subscriber_port)	23000 (default)	Port on which StateStoreSubscriberService runs.

	*StateStore ports*
StateStore service port (state_store_port)	24000 (default)	Port exported by StateStoreService.
StateStore HTTP server port (webserver_port)	25010 (default)	Port on which the StateStore debug web server runs.

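Of these, beeswax_port=21000 serves Beeswax clients; the three clients drawn in the figure (Hue, JDBC, and impala-shell) all use this port. hs2_port=21050 serves HiveServer2 clients. be_port=22000 is used to communicate with the other Impalad processes inside the cluster. state_store_subscriber_port=23000 is the port an Impalad uses to register itself with the statestored process and to exchange state updates, and port 24000 on the StateStore side is the one that interacts with port 23000 on the Impalad. The remaining ports are less important and are not covered here.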
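
For quick reference, the defaults listed above can be collected into a small lookup table. This is a minimal sketch: the flag names and port numbers are taken from the table, while `PortInfo`, `DefaultPorts`, `LookupDefaultPort`, and `PortsAreDistinct` are hypothetical helpers written for illustration and do not exist in Impala.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical record describing one row of the port table.
struct PortInfo {
  std::string flag;     // flag name as listed in the table
  int default_port;     // default value as listed in the table
  std::string process;  // process that owns the port
  std::string purpose;  // who talks to this port
};

// Returns the default ports described in the text.
inline const std::vector<PortInfo>& DefaultPorts() {
  static const std::vector<PortInfo> ports = {
      {"beeswax_port", 21000, "impalad", "Beeswax clients"},
      {"hs2_port", 21050, "impalad", "HiveServer2 clients"},
      {"be_port", 22000, "impalad", "other impalad backends"},
      {"state_store_subscriber_port", 23000, "impalad", "statestore updates"},
      {"state_store_port", 24000, "statestored", "impalad registration"},
      {"webserver_port", 25010, "statestored", "debug web pages"},
  };
  return ports;
}

// Returns the default port for `flag`, or -1 if the flag is not in the table.
inline int LookupDefaultPort(const std::string& flag) {
  for (const PortInfo& info : DefaultPorts()) {
    if (info.flag == flag) return info.default_port;
  }
  return -1;
}

// Checks that no two entries share a port number, which matters when
// impalad and statestored run on the same host.
inline bool PortsAreDistinct() {
  std::set<int> seen;
  for (const PortInfo& info : DefaultPorts()) {
    if (!seen.insert(info.default_port).second) return false;
  }
  return true;
}
```

Calling `LookupDefaultPort("be_port")` yields 22000, and an unknown flag name yields -1.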

The overall layout of the source files is as follows:

2.2 Impalad Code Analysis

2.2.1 impalad-main.cc

    // This file contains the main() function for the impala daemon process,
    // which exports the Thrift services ImpalaService and ImpalaInternalService.

    #include <unistd.h>
    #include <jni.h>

    #include "common/logging.h"
    #include "common/init.h"
    #include "exec/hbase-table-scanner.h"
    #include "exec/hbase-table-writer.h"
    #include "runtime/hbase-table-factory.h"
    #include "codegen/llvm-codegen.h"
    #include "common/status.h"
    #include "runtime/coordinator.h"
    #include "runtime/exec-env.h"
    #include "util/jni-util.h"
    #include "util/network-util.h"
    #include "rpc/thrift-util.h"
    #include "rpc/thrift-server.h"
    #include "rpc/rpc-trace.h"
    #include "service/impala-server.h"
    #include "service/fe-support.h"
    #include "gen-cpp/ImpalaService.h"
    #include "gen-cpp/ImpalaInternalService.h"
    #include "util/impalad-metrics.h"
    #include "util/thread.h"

    using namespace impala;
    using namespace std;

    DECLARE_string(classpath);
    DECLARE_bool(use_statestore);
    DECLARE_int32(beeswax_port);
    DECLARE_int32(hs2_port);
    DECLARE_int32(be_port);
    DECLARE_string(principal);

    int main(int argc, char** argv) {
      // Parse arguments and start logging, based on Google gflags and glog.
      InitCommonRuntime(argc, argv, true);

      LlvmCodeGen::InitializeLlvm();
      // Initialize JNI, because the frontend (FE) is written in Java.
      JniUtil::InitLibhdfs();
      EXIT_IF_ERROR(HBaseTableScanner::Init());
      EXIT_IF_ERROR(HBaseTableFactory::Init());
      EXIT_IF_ERROR(HBaseTableWriter::InitJNI());
      InitFeSupport();

      // start backend service for the coordinator on be_port
      // ExecEnv is the execution environment for queries and plan fragments.
      ExecEnv exec_env;
      StartThreadInstrumentation(exec_env.metrics(), exec_env.webserver());
      InitRpcEventTracing(exec_env.webserver());

      // These three ThriftServers serve clients and other impalad backends.
      ThriftServer* beeswax_server = NULL;
      ThriftServer* hs2_server = NULL;
      ThriftServer* be_server = NULL;
      // This ImpalaServer wraps the three ThriftServers above and provides
      // the service to the outside.
      ImpalaServer* server = NULL;
      // Create the ImpalaServer.
      EXIT_IF_ERROR(CreateImpalaServer(&exec_env, FLAGS_beeswax_port, FLAGS_hs2_port,
          FLAGS_be_port, &beeswax_server, &hs2_server, &be_server, &server));

      // Start be_server.
      EXIT_IF_ERROR(be_server->Start());

      // Start the services, including the statestore subscriber, which
      // registers this impalad with the statestored process.
      Status status = exec_env.StartServices();
      if (!status.ok()) {
        LOG(ERROR) << "Impalad services did not start correctly, exiting.  Error: "
                   << status.GetErrorMsg();
        ShutdownLogging();
        exit(1);
      }

      // this blocks until the beeswax and hs2 servers terminate
      EXIT_IF_ERROR(beeswax_server->Start());
      EXIT_IF_ERROR(hs2_server->Start());
      ImpaladMetrics::IMPALA_SERVER_READY->Update(true);
      LOG(INFO) << "Impala has started.";
      // Block until beeswax-server exits before running the statements below.
      beeswax_server->Join();
      // Block until hs2-server exits before continuing.
      hs2_server->Join();

      delete be_server;
      delete beeswax_server;
      delete hs2_server;
    }
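
The start order in main() and its stop-at-first-error behaviour can be modelled in a few lines. The following is a sketch under stated assumptions, not Impala code: `ToyStatus`, `StartupStep`, `RunStartup`, and `ImpaladStartupOrder` are hypothetical names, the "statestore unreachable" failure is invented for the example, and where EXIT_IF_ERROR terminates the process, this version returns the failing status so that the behaviour can be checked.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for impala::Status: an empty message means success.
struct ToyStatus {
  std::string error;
  bool ok() const { return error.empty(); }
};

// One named startup step, for example starting be_server.
struct StartupStep {
  std::string name;
  std::function<ToyStatus()> run;
};

// Runs the steps in order and stops at the first failure. The name of every
// step that was attempted is appended to *attempted. Unlike EXIT_IF_ERROR,
// the failing status is returned instead of exiting the process.
ToyStatus RunStartup(const std::vector<StartupStep>& steps,
                     std::vector<std::string>* attempted) {
  assert(attempted != nullptr);
  for (const StartupStep& step : steps) {
    attempted->push_back(step.name);
    ToyStatus status = step.run();
    if (!status.ok()) {
      return ToyStatus{step.name + " failed: " + status.error};
    }
  }
  return ToyStatus{};
}

// The order used in main(): the backend server first, then the ExecEnv
// services (including the statestore subscriber), then the two
// client-facing servers. `services_fail` simulates a failing second step.
std::vector<StartupStep> ImpaladStartupOrder(bool services_fail) {
  return {
      {"be_server", [] { return ToyStatus{}; }},
      {"exec_env services",
       [services_fail] {
         return services_fail ? ToyStatus{"statestore unreachable"}
                              : ToyStatus{};
       }},
      {"beeswax_server", [] { return ToyStatus{}; }},
      {"hs2_server", [] { return ToyStatus{}; }},
  };
}
```

When the simulated services step fails, only the first two steps are attempted, so the client-facing Beeswax and HiveServer2 servers are never started. This matches the listing, where those two servers start only after the backend server and the ExecEnv services have come up successfully.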

To be continued...

Time: 2024-10-02 22:51:20
