The device assignment algorithm currently used in TensorFlow v0.9 is the simple placer algorithm, which is much simpler to implement than the cost model algorithm described in the whitepaper. The simple placer prefers the /gpu:0 device and does not support multi-GPU assignment.
The cost model mentioned in the whitepaper can balance placement across devices according to device resource cost and data transfer cost. It is partially implemented in v0.9 but not yet enabled; see core/graph/costmodel.cc.
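To make the idea concrete, here is a toy sketch of how such a cost model could score a candidate placement. Every name below (Edge, ComputeCost, PlacementCost, cost_per_byte) is a hypothetical illustration; none of them come from core/graph/costmodel.cc:

```cpp
// Toy sketch of a cost-model style placement score; all names here are
// hypothetical and do not come from core/graph/costmodel.cc.
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

struct Edge { int src; int dst; int64_t bytes; };

// Estimated compute cost of node n on device d, e.g. from profiled runtimes.
using ComputeCost =
    std::unordered_map<int, std::unordered_map<std::string, double>>;

double PlacementCost(const std::vector<std::string>& device_of_node,
                     const std::vector<Edge>& edges,
                     const ComputeCost& compute_cost,
                     double cost_per_byte) {
  double total = 0.0;
  // Device resource cost: how expensive each node is on its chosen device.
  for (int n = 0; n < static_cast<int>(device_of_node.size()); ++n) {
    total += compute_cost.at(n).at(device_of_node[n]);
  }
  // Data transfer cost: edges crossing a device boundary pay per byte.
  for (const Edge& e : edges) {
    if (device_of_node[e.src] != device_of_node[e.dst]) {
      total += e.bytes * cost_per_byte;
    }
  }
  return total;
}
```

A cost-model placer would search over candidate placements to minimize such a score, which is what makes it harder to implement than the greedy simple placer.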
The SimplePlacer implementation is in core/common_runtime/simple_placer.cc, which contains the core device assignment logic.
The following test fragment is from core/common_runtime/simple_placer_test.cc:
```cpp
////////////////////////////////////////////////////////////////////////////////
//
// A SimplePlacerTest method has three phases:
//
// 1. Build a TensorFlow graph, with no (or partial) device assignments.
// 2. Attempt to compute a placement using the SimplePlacer.
// 3. EITHER: test that the constraints implied by the graph are respected;
//    or that an appropriate error was reported.
//
////////////////////////////////////////////////////////////////////////////////
class SimplePlacerTest : public ::testing::Test {
 protected:
  SimplePlacerTest() {
    // Build a set of 10 GPU and 10 CPU devices.
    // NOTE: this->local_devices_ owns the device objects;
    // this->devices_ contains borrowed pointers to the device
    // objects.
    for (int i = 0; i < 10; ++i) {  // Registers 10 fake CPU and 10 fake GPU devices.
      local_devices_.emplace_back(FakeDevice::MakeCPU(
          strings::StrCat("/job:a/replica:0/task:0/cpu:", i)));
      devices_.AddDevice(local_devices_.back().get());
      // Insert the GPUs in reverse order.
      local_devices_.emplace_back(FakeDevice::MakeGPU(
          strings::StrCat("/job:a/replica:0/task:0/gpu:", 9 - i)));
      devices_.AddDevice(local_devices_.back().get());
    }
  }
  ...
}
...
// Test that a graph with no constraints will successfully assign nodes to the
// "best available" device (i.e. prefer GPU over CPU).
TEST_F(SimplePlacerTest, TestNoConstraints) {
  Graph g(OpRegistry::Global());
  {  // Scope for temporary variables used to construct g.
     // The graph structure is defined with a GraphDefBuilder.
    GraphDefBuilder b(GraphDefBuilder::kFailImmediately);
    Node* input = ops::SourceOp("TestInput", b.opts().WithName("in"));
    ops::UnaryOp("TestRelu", ops::NodeOut(input, 0), b.opts().WithName("n1"));
    ops::UnaryOp("TestRelu", ops::NodeOut(input, 1), b.opts().WithName("n2"));
    TF_EXPECT_OK(BuildGraph(b, &g));  // BuildGraph writes the GraphDefBuilder's graph into g.
  }

  TF_EXPECT_OK(Place(&g));  // Place assigns the graph's nodes to devices.
  // Expectation: "in" lands on the CPU while "n1" and "n2" land on the GPU,
  // i.e. GPU takes priority over CPU.
  EXPECT_DEVICE_TYPE(g, "in", DEVICE_CPU);
  EXPECT_DEVICE_TYPE(g, "n1", DEVICE_GPU);
  EXPECT_DEVICE_TYPE(g, "n2", DEVICE_GPU);
}
```
Here BuildGraph writes the graph structure defined by the GraphDefBuilder object into a Graph, and Place assigns the graph's nodes to the device list. The core of the device assignment algorithm is SimplePlacer::Run:
```cpp
// Builds the given graph, and (if successful) indexes the node
// names for use in placement, and later lookup.
Status BuildGraph(const GraphDefBuilder& builder, Graph* out_graph) {
  TF_RETURN_IF_ERROR(builder.ToGraph(out_graph));
  nodes_by_name_.clear();
  for (Node* node : out_graph->nodes()) {
    nodes_by_name_[node->name()] = node->id();
  }
  return Status::OK();
}

// Invokes the SimplePlacer on "graph". If no DeviceSet is specified, the
// placement will use the default DeviceSet (of 10 CPU and 10 GPU devices).
//
// REQUIRES: "*graph" was produced by the most recent call to BuildGraph.
Status Place(Graph* graph, DeviceSet* devices, SessionOptions* options) {
  SimplePlacer placer(graph, devices, options);
  return placer.Run();
}
```
SimplePlacer::Run() is implemented in core/common_runtime/simple_placer.cc in four steps:
Steps 1 and 2: iterate over the graph's nodes and add each one (excluding the source and sink nodes) to a ColocationGraph object.
```cpp
// 1. First add all of the nodes. Note that steps (1) and (2)
// requires two passes over the nodes because the graph (and hence
// the constraints) may not be acyclic.
// (So the graph here may contain cycles?)
for (Node* node : graph_->nodes()) {
  // Skip the source and sink nodes.
  if (!node->IsOp()) {
    continue;
  }
  status = colocation_graph.AddNode(*node);
  if (!status.ok()) return AttachDef(status, node->def());
}

// 2. Enumerate the constraint edges, and use them to update the
//    disjoint node set.
// (A disjoint set, i.e. union-find over non-overlapping node sets,
//  is a tree-based data structure.)
...
```
```cpp
// ColocationGraph maintains the connected components of a colocation
// constraint graph, and uses this information to assign a satisfying
// device placement to the nodes of the graph.
//
// The implementation uses the union-find algorithm to maintain the
// connected components efficiently and incrementally as edges (implied
// by ColocationGraph::ColocateNodes() invocations) are added.
```
Reference: the disjoint-set (union-find) article on Wikipedia.
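For readers who have not seen union-find before, here is a minimal standalone sketch with path compression and union by rank. It illustrates the algorithm only; the class name and members are hypothetical and are not ColocationGraph's actual fields:

```cpp
#include <numeric>
#include <utility>
#include <vector>

// Minimal disjoint-set (union-find) sketch, for illustration only.
class UnionFind {
 public:
  explicit UnionFind(int n) : parent_(n), rank_(n, 0) {
    std::iota(parent_.begin(), parent_.end(), 0);  // Each node starts in its own set.
  }

  // Returns the representative of x's component, compressing paths as it goes.
  int Find(int x) {
    if (parent_[x] != x) parent_[x] = Find(parent_[x]);
    return parent_[x];
  }

  // Merges the components of a and b, analogous to what a
  // ColocationGraph::ColocateNodes() call implies for two nodes.
  void Union(int a, int b) {
    int ra = Find(a), rb = Find(b);
    if (ra == rb) return;
    if (rank_[ra] < rank_[rb]) std::swap(ra, rb);  // Attach the shorter tree.
    parent_[rb] = ra;
    if (rank_[ra] == rank_[rb]) ++rank_[ra];
  }

 private:
  std::vector<int> parent_;
  std::vector<int> rank_;
};
```

Placing each graph node in such a structure lets the placer merge colocation constraints in near-constant amortized time, then pick one device per connected component rather than per node.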
Step 3: as the code below shows, the source and sink nodes are placed on the CPU, and nodes that already have an assigned device are not re-assigned. Two placement heuristics are applied, Heuristic A and Heuristic B.
```cpp
// 3. For each node, assign a device based on the constraints in the
//    disjoint node set.
std::vector<Device*> devices;
std::vector<Node*> second_pass;
for (Node* node : graph_->nodes()) {
  // Skip the source and sink nodes.
  if (!node->IsOp()) {
    continue;
  }
  // Skip nodes that already have an assigned name.
  if (!node->assigned_device_name().empty()) {
    continue;
  }

  // Heuristic A: prefer to place "generators" with their only
  // consumers.
  //
  // If this is a node with no inputs and a single (non-ref)
  // consumer, we save this for a second pass, so that the
  // consumer's placement is chosen.
  if (IsGeneratorNode(node)) {
    // A generator node has no input, a single output, and is not a
    // reference-type node.
    second_pass.push_back(node);
    continue;
  }

  status = colocation_graph.GetDevicesForNode(node, &devices);
  ...

  // Returns the first device in sorted devices list so we will always
  // choose the same device.
  //
  // TODO(vrv): Factor this assignment out into a pluggable
  // algorithm, so that SimplePlacer is responsible for enforcing
  // preconditions and we can experiment with other algorithms when
  // given a choice of devices. Once we have a better idea of the
  // types of heuristics we want to use and the information needed
  // to perform good placement we can add an interface for this.
  string assigned_device = devices[0]->name();

  // Heuristic B: If the node only operates on metadata, not data,
  // then it is desirable to place that metadata node with its
  // input.
  if (IsMetadataNode(node)) {
    // Make sure that the input device type is in the list of supported
    // device types for this node.
    const Node* input = (*node->in_edges().begin())->src();
    // TODO(vrv): if the input is empty, consider postponing this
    // node's assignment to the second pass, so that we handle the
    // case where a metadata node's input comes from a backedge
    // of a loop.
    const string& input_device_name = input->assigned_device_name();
    if (CanAssignToDevice(input_device_name, devices)) {
      assigned_device = input_device_name;
    }
  }

  // Assign assigned_device to the node. Generator nodes that match
  // Heuristic A are not assigned in step 3; that happens in step 4.
  AssignAndLog(assigned_device, node);
}
```
```cpp
bool IsGeneratorNode(const Node* node) {
  return node->num_inputs() == 0 && node->num_outputs() == 1 &&
         node->out_edges().size() == 1 && !IsRefType(node->output_type(0));
}
```
```cpp
bool IsMetadataNode(const Node* node) {
  const string& node_type = node->type_string();
  return (node_type == "Size" || node_type == "Shape" || node_type == "Rank");
}
```
Step 4: assign devices to the generator nodes deferred in step 3.
```cpp
// 4. Perform a second pass assignment for those nodes explicitly
//    skipped during the first pass.
...
```
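The elided body applies Heuristic A. The sketch below is paraphrased rather than quoted verbatim from v0.9, but it uses only helpers already shown above (GetDevicesForNode, CanAssignToDevice, AssignAndLog): each deferred generator node is placed on its single consumer's device when that device is acceptable, and otherwise falls back to the same deterministic first-device choice as the first pass.

```cpp
// Paraphrased sketch of the second pass (not a verbatim quote of v0.9).
for (Node* node : second_pass) {
  status = colocation_graph.GetDevicesForNode(node, &devices);
  if (!status.ok()) return AttachDef(status, node->def());

  string assigned_device;
  // Heuristic A: a generator node has exactly one out edge, so its only
  // consumer was already placed during the first pass.
  const Node* consumer = (*node->out_edges().begin())->dst();
  const string& consumer_device = consumer->assigned_device_name();
  if (CanAssignToDevice(consumer_device, devices)) {
    assigned_device = consumer_device;
  }
  if (assigned_device.empty()) {
    // Fall back to the deterministic first choice used in step 3.
    assigned_device = devices[0]->name();
  }
  AssignAndLog(assigned_device, node);
}
```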
References (partial):
http://bettercstomorrow.com/2016/07/14/distributed-tensorflow-internal-architecture-summary/
http://bettercstomorrow.com/2016/07/06/distributed-tensorflow-internal-architecture-6/ (in Korean -_-)
"TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems"