Translation: A feature-detection example using TBB

A feature-detection example using the Intel® Threading Building Blocks flow graph

By Michael V. (Intel), Added September 9, 2011

The Intel® Threading Building Blocks (Intel® TBB) flow graph is fully supported in Intel® TBB 4.0. If you are unfamiliar with the flow graph, you can read an introduction here.

Figure 1 below shows a flow graph that implements a simple feature detection application. A number of images will enter the graph and two alternative feature detection algorithms will be applied to each one. If either algorithm detects a feature of interest, the image will be stored for later inspection. In this article, I’ll describe each node used in this graph, and then provide and describe a complete working implementation.

Figure 1: The Intel® TBB flow graph for the feature-detection example.

In the figure, there are four different types of nodes used to construct the application: a source_node, a queue_node, two join_nodes, and several function_nodes. Before I provide a sample implementation, I’ll provide a brief overview of each node.

The first type of node is a source_node, which is shown pictorially using the symbol below. This type of node has no predecessors, and is used to generate messages that are injected into the graph. It executes a user functor (or lambda expression) to generate its output. The unfilled circle on its right side indicates that it buffers its output and that this buffer can be reserved. The source_node buffers a single item. When a buffer is reserved, a value is held for the caller until the caller either consumes or releases the value. A source_node will only invoke the user functor when there is nothing currently buffered in its single item output buffer.

The second type of node is a queue_node, which is shown using the figure below. A queue_node is an unbounded first-in first-out buffer. Like the source_node, its output is reservable.

The third type of node, of which there are two variants used in the example, is the join_node. A join_node has multiple input ports and generates a single output tuple that contains a value received at each port. A join_node can use one of three policies at its input ports: queueing, reserving, or tag_matching. A queueing join_node greedily consumes all messages as they arrive and generates an output whenever it has at least one item in each input queue. A reserving join_node only attempts to generate a tuple when it can successfully reserve an item at each input port. If it cannot successfully reserve all inputs, it releases all of its reservations and will only try again when it receives a message from the port or ports it was previously unable to reserve. Lastly, a tag_matching join_node uses hash tables to buffer messages in its input ports. When it has received messages at each port that have matching keys, it creates an output tuple with these messages. Shown below are the symbols for the reserving and tag_matching join_nodes used in Figure 1.

The final node type used in this example is a function_node; it uses the symbol shown below. A function_node executes a user-provided functor or lambda expression on incoming messages, passing the return value to its successors. A function_node can be constructed with a limited or unlimited allowable concurrency level. A function_node with unlimited concurrency creates a task to apply its functor to each message as it arrives. If a function_node has limited concurrency, it will create tasks only up to its allowed concurrency level, buffering messages at its input as necessary so that they are not dropped.

To save on space, I’m going to fake the image processing parts of this example. In particular, each image will simply be an array of characters. An image that contains the character ‘A’ has a feature recognizable by algorithm A, and an image that contains the character ‘B’ has a feature recognizable by algorithm B. So in the post, I will provide the complete code to construct and execute a flow graph that has the structure shown in Figure 1, but I’ll replace the actual computations with trivial ones.

Below is the declaration of struct image, as well as the trivial implementations that can be used as the bodies of the function nodes. The function get_next_image will be used by the source_node to generate images for processing. You might note that in get_next_image, every 11th image will have a feature detectable by algorithm A and every 13th image will contain a feature detectable by algorithm B. The function preprocess_image adds a simple offset to each character, and detect_with_A and detect_with_B do the trivial search for the characters 'A' and 'B', respectively.

#include <cstring>
#include <cstdio>

const int num_image_buffers = 100;
int image_size = 10000000;

struct image {
   const int N;
   char *data;
   image();
   image( int image_number, bool a, bool b );
   ~image() { delete[] data; }  // free the pixel buffer when the image is deleted
};

image::image() : N(image_size) {
   data = new char[N];
}

image::image( int image_number, bool a, bool b ) : N(image_size) {
    data = new char[N];
    memset( data, '\0', N );
    data[0] = (char)image_number - 32;
    if ( a ) data[N-2] = 'A';
    if ( b ) data[N-1] = 'B';
}

int img_number = 0;
int num_images = 64;
const int a_frequency = 11;
const int b_frequency = 13;

image *get_next_image() {
    bool a = false, b = false;
    if ( img_number < num_images ) {
        if ( img_number%a_frequency == 0 ) a = true;
        if ( img_number%b_frequency == 0 ) b = true;
        return new image( img_number++, a, b );
    } else {
        return NULL;  // no more images: the source_node body stops on a null pointer
    }
}

void preprocess_image( image *input_image, image *output_image ) {
    for ( int i = 0; i < input_image->N; ++i ) {
        output_image->data[i] = input_image->data[i] + 32;
    }
}

// The detectors run after preprocess_image, which adds 32 to every
// character, so 'A' and 'B' appear as 'a' and 'b' at this point.
bool detect_with_A( image *input_image ) {
    for ( int i = 0; i < input_image->N; ++i ) {
        if ( input_image->data[i] == 'a' )
            return true;
    }
    return false;
}

bool detect_with_B( image *input_image ) {
    for ( int i = 0; i < input_image->N; ++i ) {
        if ( input_image->data[i] == 'b' )
            return true;
    }
    return false;
}

void output_image( image *input_image, bool found_a, bool found_b ) {
    bool a = false, b = false;
    int a_i = -1, b_i = -1;
    for ( int i = 0; i < input_image->N; ++i ) {
        if ( input_image->data[i] == 'a' ) { a = true; a_i = i; }
        if ( input_image->data[i] == 'b' ) { b = true; b_i = i; }
    }
    printf("Detected feature (a,b)=(%d,%d)=(%d,%d) at (%d,%d) for image %p:%d\n",
           a, b, found_a, found_b, a_i, b_i, (void *)input_image, input_image->data[0]);
}

The code to implement the flow graph itself is shown in function main below. I will interject text in the middle of the listing of main to describe the use of the flow graph components. If you want to build this example, you can just cut and paste the code snippets above and below linearly into a single file.

int num_graph_buffers = 8;

#include <tuple>    // std::tuple, std::get
#include <utility>  // std::pair, std::make_pair
#include "tbb/flow_graph.h"

using namespace tbb;
using namespace tbb::flow;

int main() {

First, a graph g is created. All of the nodes will belong to this single graph. A few typedefs are provided to make it easier to refer to the outputs of the join nodes:

    graph g;

    typedef std::tuple< image *, image * > resource_tuple;
    typedef std::pair< image *, bool > detection_pair;
    typedef std::tuple< detection_pair, detection_pair > detection_tuple;

Next, the queue_node that holds the image buffers is created, along with the two join nodes. Again, note that the resource_join is using the reserving policy, while detection_join uses the tag_matching policy. To use tag_matching, the user must provide functors that can extract the tag from the item; these appear as the additional arguments to the constructor.

    queue_node< image * > buffers( g );
    join_node< resource_tuple, reserving > resource_join( g );
    join_node< detection_tuple, tag_matching > detection_join( g,
            [](const detection_pair &p) -> size_t { return (size_t)p.first; },
            [](const detection_pair &p) -> size_t { return (size_t)p.first; } );

Next, the nodes that execute the user’s code are created, including the source_node and the four function_nodes. The user’s code is passed to each node using a C++ lambda expression (a function object could also be used). For the most part, each lambda expression is a bit of wrapper code that calls the functions that were described earlier, obtaining inputs and creating outputs as necessary. The make_edge calls wire together the nodes as shown in Figure 1.

    source_node< image * > src( g,
                                []( image* &next_image ) -> bool {
                                    next_image = get_next_image();
                                    if ( next_image ) return true;
                                    else return false;
                                }
                              );
    make_edge(src, input_port<0>(resource_join) );
    make_edge(buffers, input_port<1>(resource_join) );

    function_node< resource_tuple, image * >
        preprocess_function( g, unlimited,
                             []( const resource_tuple &in ) -> image * {
                                 image *input_image = std::get<0>(in);
                                 image *output_image = std::get<1>(in);
                                 preprocess_image( input_image, output_image );
                                 delete input_image;
                                 return output_image;
                             }
                           );

    make_edge(resource_join, preprocess_function );

    function_node< image *, detection_pair >
        detect_A( g, unlimited,
                 []( image *input_image ) -> detection_pair {
                    bool r = detect_with_A( input_image );
                    return std::make_pair( input_image, r );
                 }
               );

    function_node< image *, detection_pair >
        detect_B( g, unlimited,
                 []( image *input_image ) -> detection_pair {
                    bool r = detect_with_B( input_image );
                    return std::make_pair( input_image, r );
                 }
               );

    make_edge(preprocess_function, detect_A );
    make_edge(detect_A, input_port<0>(detection_join) );
    make_edge(preprocess_function, detect_B );
    make_edge(detect_B, input_port<1>(detection_join) );

    function_node< detection_tuple, image * >
        decide( g, serial,
                 []( const detection_tuple &t ) -> image * {
                     const detection_pair &a = std::get<0>(t);
                     const detection_pair &b = std::get<1>(t);
                     image *img = a.first;
                     if ( a.second || b.second ) {
                         output_image( img, a.second, b.second );
                     }
                     return img;
                 }
               );

    make_edge(detection_join, decide);
    make_edge(decide, buffers);

Because of the reserving join node at the front of the graph, the graph will remain idle until there are image buffers available in the buffers queue. The for-loop below allocates and puts buffers into the queue. After the loop, the call to g.wait_for_all() will block until the graph again becomes idle when all images are processed.

    // Put image buffers into the buffer queue
    for ( int i = 0; i < num_graph_buffers; ++i ) {
        image *img = new image;
        buffers.try_put( img );
    }
    g.wait_for_all();

When the graph is idle, all of the buffers will again be in the buffers queue. The queue_node therefore needs to be drained and the buffers deallocated:

    for ( int i = 0; i < num_graph_buffers; ++i ) {
        image *img = NULL;
        if ( !buffers.try_get(img) )
            printf("ERROR: lost a buffer\n");
        else
            delete img;
    }
    return 0;
}

I hope that this feature-detection example demonstrates how a reasonably complex flow graph that passes messages between nodes can be implemented. To learn more about the new features in Intel® Threading Building Blocks 4.0, visit http://www.threadingbuildingblocks.org. To learn more about the Intel® TBB flow graph, check out the other blog articles at /en-us/blogs/tag/flow_graph/.

Time: 2024-10-18 22:20:59