Mesos Principles and Code Analysis (4): Mesos Master Startup, Part 3

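3. ModuleManager::load(flags.modules.get()): if --modules or --modules_dir=dirpath is specified, the .so files at the given path are loaded.

The master's startup code makes this call, and the corresponding command-line flags are --modules and --modules_dir.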

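What kinds of modules can be written?

First, the Allocator.

The default is the built-in Hierarchical Dominant Resource Fairness allocator.

To use an allocator of your own:

  • Load the .so through --modules
  • Select it with the --allocator flag

Of course, few people rewrite the Allocator from scratch, because it is the essence of Mesos, but the default Allocator can be extended.

Reference documentation: http://mesos.apache.org/documentation/latest/allocation-module/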

Mesos Allocation Modules

The logic that the Mesos master uses to determine which frameworks to make resource offers to is encapsulated in the master's allocator module. The allocator is a pluggable component that organizations can use to implement their own sharing policy, e.g. fair-sharing, priority, etc., or tune the default hierarchical Dominant Resource Fairness algorithm (see the DRF paper).

To use a custom allocator in Mesos, one must:

  • Implement the Allocator interface as defined in mesos/allocator/allocator.hpp,
  • Wrap the allocator implementation in a module and load it in the Mesos master.

Writing a custom allocator

Allocator modules are implemented in C++, the same language in which Mesos is written. They must subclass the Allocator interface defined in mesos/allocator/allocator.hpp. However, your implementation can be a C++ proxy, which delegates calls to an actual allocator written in a language of your choice.

The default allocator is HierarchicalDRFAllocatorProcess, which lives in $MESOS_HOME/src/master/allocator/mesos/hierarchical.hpp. Like most Mesos components, it is actor-based, which means all interface methods are non-blocking and return immediately after putting the corresponding action into the actor's queue. If you would like to design your custom allocator in a similar manner, subclass MesosAllocatorProcess from $MESOS_HOME/src/master/allocator/mesos/allocator.hpp and wrap your actor-based allocator in MesosAllocator. This dispatches calls to the underlying actor and controls its lifetime. You can refer to HierarchicalDRFAllocatorProcess as a starting place if you choose to write your own actor-based allocation module.

Additionally, the built-in hierarchical allocator can be extended without the need to reimplement the entirety of the allocation logic. This is possible through the use of the Sorter abstraction. Sorters define the order in which hierarchy layers (e.g. roles or frameworks) should be offered resources by taking "client" objects and some information about those clients and returning an ordered list of clients.

Sorters are implemented in C++ and inherit the Sorter class defined in $MESOS_HOME/src/master/allocator/sorter/sorter.hpp. The default sorter is DRFSorter, which implements fair sharing and can be found in $MESOS_HOME/src/master/allocator/sorter/drf/sorter.hpp. This sorter is capable of expressing priorities by specifying weights in Sorter::add(). Each client's share is divided by its weight. For example, a role that has a weight of 2 will be offered twice as many resources as a role with weight 1.
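
To make the weighting rule concrete, here is a minimal standalone sketch of a weighted dominant-share ordering. It is not the DRFSorter code: the names Client, dominantShare and sortClients are hypothetical, and resources are reduced to CPU and memory purely for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified client (a role or a framework). Only two
// resource kinds are modelled here.
struct Client {
  std::string name;
  double cpus;    // CPUs currently allocated to the client.
  double memMb;   // Memory (MB) currently allocated to the client.
  double weight;  // Must be positive; larger means higher priority.
};

// Weighted dominant share: the client's largest fraction of any single
// cluster resource, divided by its weight.
double dominantShare(const Client& c, double totalCpus, double totalMemMb) {
  assert(c.weight > 0 && totalCpus > 0 && totalMemMb > 0);
  const double share = std::max(c.cpus / totalCpus, c.memMb / totalMemMb);
  return share / c.weight;
}

// Returns client names ordered by ascending weighted dominant share, so
// the first entry is the client that should be offered resources next.
std::vector<std::string> sortClients(
    std::vector<Client> clients, double totalCpus, double totalMemMb) {
  std::stable_sort(
      clients.begin(), clients.end(),
      [&](const Client& a, const Client& b) {
        return dominantShare(a, totalCpus, totalMemMb) <
               dominantShare(b, totalCpus, totalMemMb);
      });

  std::vector<std::string> names;
  for (const Client& c : clients) {
    names.push_back(c.name);
  }
  return names;
}
```

In this sketch a client holding 6 of 10 CPUs with weight 2 sorts ahead of a client holding 4 of 10 CPUs with weight 1, because dividing by the weight lowers its effective share.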

Wiring up a custom allocator

Once a custom allocator has been written, the next step is to override the built-in implementation with your own. This process consists of several steps:

  • Wrap your allocator in a Mesos allocator module,
  • Load this module in Mesos master.

An allocator module is a factory function and a module description, as defined in mesos/module/allocator.hpp. Assuming the allocation logic is implemented by the ExternalAllocator class declared in external_allocator.hpp, the following snippet describes the implementation of an allocator module named ExternalAllocatorModule:

#include <mesos/allocator/allocator.hpp>
#include <mesos/module/allocator.hpp>
#include <stout/try.hpp>

#include "external_allocator.hpp"

using namespace mesos;
using mesos::allocator::Allocator;
using mesos::internal::master::allocator::HierarchicalDRFAllocator;

static Allocator* createExternalAllocator(const Parameters& parameters)
{
  Try<Allocator*> allocator = ExternalAllocator::create();
  if (allocator.isError()) {
    return nullptr;
  }

  return allocator.get();
}

// Declares an ExternalAllocator module named 'ExternalAllocatorModule'.
mesos::modules::Module<Allocator> ExternalAllocatorModule(
    MESOS_MODULE_API_VERSION,
    MESOS_VERSION,
    "Mesos Contributor",
    "engineer@example.com",
    "External Allocator module.",
    nullptr,
    createExternalAllocator);

Refer to the Mesos Modules documentation for instructions on how to compile and load a module in Mesos master.

Hook

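You can write hook modules that insert code at many key steps, and so alter the entire launch process of an Executor, a Docker container, or a Task.

The points at which a hook can intervene are defined by the Hook class in mesos/hook.hpp.

Among these, slavePreLaunchDockerHook is commonly used; it lets you do something before Docker starts, such as preparation work.

Another is slaveRemoveExecutorHook, which lets you do something when an executor terminates, such as cleanup.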
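
As a rough standalone sketch of the idea, assume a much-simplified hook interface. The ExampleHook, CleanupHook and ExampleHookManager types below are hypothetical stand-ins and do not reproduce the signatures in mesos/hook.hpp; they only show how registered hooks get called at fixed points in a lifecycle.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified hook interface. The default implementations do
// nothing, so a hook only overrides the steps it cares about.
class ExampleHook {
public:
  virtual ~ExampleHook() = default;
  virtual void preLaunch(const std::string& /*containerName*/) {}
  virtual void removeExecutor(const std::string& /*executorId*/) {}
};

// A hook that records which executors it has cleaned up after.
class CleanupHook : public ExampleHook {
public:
  void removeExecutor(const std::string& executorId) override {
    cleaned.push_back(executorId);
  }

  std::vector<std::string> cleaned;
};

// Hypothetical manager: invokes every registered hook in registration order.
class ExampleHookManager {
public:
  void add(std::shared_ptr<ExampleHook> hook) {
    assert(hook != nullptr);
    hooks.push_back(std::move(hook));
  }

  void preLaunch(const std::string& containerName) {
    for (auto& hook : hooks) {
      hook->preLaunch(containerName);
    }
  }

  void removeExecutor(const std::string& executorId) {
    for (auto& hook : hooks) {
      hook->removeExecutor(executorId);
    }
  }

private:
  std::vector<std::shared_ptr<ExampleHook>> hooks;
};
```

The design point is that the caller never knows which hooks are installed: it calls the manager at each step, and hooks that do not override that step simply do nothing.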

To load a hook you need to:

  • Load the .so through --modules
  • Enable the hook through --hooks

./bin/mesos-agent.sh --master=<IP>:<PORT> --modules="file://<path-to-modules-config>.json" --hooks=TestTaskHook

Isolator

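Isolators are selected with --isolation=VALUE and apply only to the Mesos Containerizer.

The relevant code is in src/slave/containerizer/mesos/containerizer.cpp.

By default only CPU and memory are isolated. The deprecated value cgroups is likewise expanded to just the CPU and memory isolators: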


if (flags.isolation == "cgroups") {
  LOG(WARNING) << "The 'cgroups' isolation flag is deprecated, "
               << "please update your flags to"
               << " '--isolation=cgroups/cpu,cgroups/mem'.";

  flags_.isolation = "cgroups/cpu,cgroups/mem";
}
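
The rewrite of the deprecated value, followed by splitting the flag into individual isolator names, can be sketched on its own as follows. This is a hypothetical standalone helper (parseIsolation is not a Mesos function) that uses only the standard library.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: expands the deprecated "cgroups" value and splits
// the comma-separated flag into isolator names, skipping empty entries.
std::vector<std::string> parseIsolation(std::string isolation) {
  if (isolation == "cgroups") {
    isolation = "cgroups/cpu,cgroups/mem";
  }

  std::vector<std::string> names;
  std::istringstream stream(isolation);
  std::string name;
  while (std::getline(stream, name, ',')) {
    if (!name.empty()) {
      names.push_back(name);
    }
  }
  return names;
}
```

The order of the returned names matters, since the isolators are later created in the order in which they appear in the flag.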

Next, the Launcher for the Mesos Containerizer is created:


// Create the launcher for the MesosContainerizer.
Try<Launcher*> launcher = [&flags_]() -> Try<Launcher*> {
#ifdef __linux__
  if (flags_.launcher.isSome()) {
    // If the user has specified the launcher, use it.
    if (flags_.launcher.get() == "linux") {
      return LinuxLauncher::create(flags_);
    } else if (flags_.launcher.get() == "posix") {
      return PosixLauncher::create(flags_);
    } else {
      return Error(
          "Unknown or unsupported launcher: " + flags_.launcher.get());
    }
  }

  // Use Linux launcher if it is available, POSIX otherwise.
  return LinuxLauncher::available()
    ? LinuxLauncher::create(flags_)
    : PosixLauncher::create(flags_);

……

  return PosixLauncher::create(flags_); // The fallback default is POSIX.

Then the isolators are created:


// Create the isolators.
//
// Currently, the order of the entries in the --isolation flag
// specifies the ordering of the isolators. Specifically, the
// `create` and `prepare` calls for each isolator are run serially
// in the order in which they appear in the --isolation flag, while
// the `cleanup` call is serialized in reverse order.
//
// It is the responsibility of each isolator to check its
// dependency requirements (if any) during its `create`
// execution. This means that if the operator specifies the
// flags in the wrong order, it will produce an error during
// isolator creation.
//
// NOTE: We ignore the placement of the filesystem isolator in
// the --isolation flag and place it at the front of the isolator
// list. This is a temporary hack until isolators are able to
// express and validate their ordering requirements.

const hashmap<string, lambda::function<Try<Isolator*>(const Flags&)>>
  creators = {
  // Filesystem isolators.
#ifndef __WINDOWS__
  {"filesystem/posix", &PosixFilesystemIsolatorProcess::create},
#else
  {"filesystem/windows", &WindowsFilesystemIsolatorProcess::create},
#endif // __WINDOWS__
#ifdef __linux__
  {"filesystem/linux", &LinuxFilesystemIsolatorProcess::create},

  // TODO(jieyu): Deprecate this in favor of using filesystem/linux.
  {"filesystem/shared", &SharedFilesystemIsolatorProcess::create},
#endif // __linux__

  // Runtime isolators.
#ifndef __WINDOWS__
  {"posix/cpu", &PosixCpuIsolatorProcess::create},
  {"posix/mem", &PosixMemIsolatorProcess::create},

  // "posix/disk" is deprecated in favor of the name "disk/du".
  {"posix/disk", &PosixDiskIsolatorProcess::create},
  {"disk/du", &PosixDiskIsolatorProcess::create},

#if ENABLE_XFS_DISK_ISOLATOR
  {"disk/xfs", &XfsDiskIsolatorProcess::create},
#endif
#else
  {"windows/cpu", &WindowsCpuIsolatorProcess::create},
#endif // __WINDOWS__
#ifdef __linux__
  {"cgroups/cpu", &CgroupsCpushareIsolatorProcess::create},
  {"cgroups/devices", &CgroupsDevicesIsolatorProcess::create},
  {"cgroups/mem", &CgroupsMemIsolatorProcess::create},
  {"cgroups/net_cls", &CgroupsNetClsIsolatorProcess::create},
  {"cgroups/perf_event", &CgroupsPerfEventIsolatorProcess::create},
  {"docker/runtime", &DockerRuntimeIsolatorProcess::create},
  {"docker/volume", &DockerVolumeIsolatorProcess::create},

  {"gpu/nvidia",
    [&nvidia] (const Flags& flags) -> Try<Isolator*> {
      if (!nvml::isAvailable()) {
        return Error("Cannot create the Nvidia GPU isolator:"
                     " NVML is not available");
      }

      CHECK_SOME(nvidia)
        << "Nvidia components should be set when NVML is available";

      return NvidiaGpuIsolatorProcess::create(flags, nvidia.get());
    }},

  {"namespaces/pid", &NamespacesPidIsolatorProcess::create},
  {"network/cni", &NetworkCniIsolatorProcess::create},
#endif // __linux__
  // NOTE: Network isolation is currently not supported on Windows builds.
#if !defined(__WINDOWS__) && defined(WITH_NETWORK_ISOLATOR)
  {"network/port_mapping", &PortMappingIsolatorProcess::create},
#endif
};

All the isolators are placed in a vector:


vector<Owned<Isolator>> isolators;

foreach (const string& isolation, isolations) {
  Try<Isolator*> isolator = [&]() -> Try<Isolator*> {
    if (creators.contains(isolation)) {
      return creators.at(isolation)(flags_);
    } else if (ModuleManager::contains<Isolator>(isolation)) {
      return ModuleManager::create<Isolator>(isolation);
    }
    return Error("Unknown or unsupported isolator");
  }();

  if (isolator.isError()) {
    return Error("Failed to create isolator '" + isolation + "': " +
                 isolator.error());
  }

  // NOTE: The filesystem isolator must be the first isolator used
  // so that the runtime isolators can have a consistent view on the
  // prepared filesystem (e.g., any volume mounts are performed).
  if (strings::contains(isolation, "filesystem/")) {
    isolators.insert(isolators.begin(), Owned<Isolator>(isolator.get()));
  } else {
    isolators.push_back(Owned<Isolator>(isolator.get()));
  }
}
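
The lookup-and-order logic above can be tried out without Mesos. The sketch below is a simplified stand-in: isolators are reduced to plain strings, the creators map holds hypothetical factory functions, the module lookup is omitted, and an exception replaces the Error return. It keeps the same rule of inserting any filesystem/ isolator at the front of the list.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in: an "isolator" is just a descriptive string.
using Creator = std::function<std::string()>;

// Builds the ordered isolator list for the requested names. Throws
// std::runtime_error for a name that has no registered creator.
std::vector<std::string> createIsolators(
    const std::map<std::string, Creator>& creators,
    const std::vector<std::string>& isolations) {
  std::vector<std::string> isolators;

  for (const std::string& isolation : isolations) {
    auto it = creators.find(isolation);
    if (it == creators.end()) {
      throw std::runtime_error(
          "Failed to create isolator '" + isolation +
          "': Unknown or unsupported isolator");
    }

    // Filesystem isolators go first so that later isolators see the
    // prepared filesystem.
    if (isolation.find("filesystem/") != std::string::npos) {
      isolators.insert(isolators.begin(), it->second());
    } else {
      isolators.push_back(it->second());
    }
  }

  return isolators;
}
```

One consequence of inserting at the front is that, if several filesystem isolators are listed, they end up in the reverse of their flag order.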

Finally, the MesosContainerizer is created:


return new MesosContainerizer(
    flags_,
    local,
    fetcher,
    Owned<ContainerLogger>(logger.get()),
    Owned<Launcher>(launcher.get()),
    provisioner.get(),
    isolators);

So how do the isolators take effect?

They come into play when the MesosContainerizer launches a process, in src/slave/containerizer/mesos/containerizer.cpp.

The most important step there is forking a process:


Try<pid_t> forked = launcher->fork(
    containerId,
    path::join(flags.launcher_dir, MESOS_CONTAINERIZER),
    argv,
    Subprocess::FD(STDIN_FILENO),
    (local ? Subprocess::FD(STDOUT_FILENO)
           : Subprocess::IO(subprocessInfo.out)),
    (local ? Subprocess::FD(STDERR_FILENO)
           : Subprocess::IO(subprocessInfo.err)),
    launchFlags,
    environment,
    namespaces); // 'namespaces' will be ignored by PosixLauncher.

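Finally, isolate is called so that the isolators can apply their resource limits to the forked process; the launch then continues with fetch and exec: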


return isolate(containerId, pid)
  .then(defer(self(),
              &Self::fetch,
              containerId,
              executorInfo.command(),
              directory,
              user,
              slaveId))
  .then(defer(self(), &Self::exec, containerId, pipes[1]))
  .onAny([pipes]() { os::close(pipes[0]); })
  .onAny([pipes]() { os::close(pipes[1]); });
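
The shape of this chain (run the next step only if the previous one succeeded, but always run the cleanup) can be sketched with a synchronous simplification. The real code is asynchronous and built on futures; runChain below is a hypothetical helper written only to show the control flow.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Runs each step in order and stops at the first failure, similar to a
// chain of .then() continuations. `always` runs at the end whether or not
// the chain succeeded, similar to .onAny().
bool runChain(
    const std::vector<std::function<bool()>>& steps,
    const std::function<void()>& always) {
  bool ok = true;
  for (const auto& step : steps) {
    if (!step()) {
      ok = false;
      break;
    }
  }

  always();
  return ok;
}
```

With steps named isolate, fetch and exec, a failure in fetch means exec never runs, while the pipe-closing cleanup still does.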

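The Isolator interface defines the functions that the containerizer calls at each of these stages, including prepare, isolate and cleanup.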

Master Contender and Detector

The Detector detects which Mesos master is the current leader, while the Contender is how a master competes to become the leader. By default both use ZooKeeper.
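
The division of labour between the two roles can be illustrated with a toy in-memory model. It assumes the common ZooKeeper election recipe, in which every candidate registers a sequentially numbered entry and the lowest surviving number is the leader; whether Mesos follows exactly this rule is an assumption here, and ToyElection with its methods is hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy in-memory stand-in for the coordination service (hypothetical).
class ToyElection {
public:
  // Contender side: a master joins the election and receives a
  // monotonically increasing sequence number.
  int contend(const std::string& master) {
    const int id = next++;
    members[id] = master;
    return id;
  }

  // Models a master exiting or losing its session.
  void withdraw(int id) { members.erase(id); }

  // Detector side: the leader is the member with the lowest sequence
  // number. Returns an empty string if no master is registered.
  std::string detect() const {
    if (members.empty()) {
      return "";
    }
    return members.begin()->second;
  }

private:
  int next = 0;
  std::map<int, std::string> members;  // Ordered by sequence number.
};
```

Contending and detecting are separate operations: a process that only needs to find the leader calls detect() and never has to join the election itself.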

4. The fourth step of Mesos master startup is loading the hooks: HookManager::initialize(flags.hooks.get())

Time: 2024-08-28 10:31:34
