Notes on Principles of Parallel Programming: Peril-L Notation

Contents

1 syntax and semantics

2 example set

1 syntax and semantics

1.1 extending C

Peril-L notation stands on the shoulders of C: it extends C with a few constructs for parallelism.

1.2 parallel threads

forall (<intVar> in (<index range specification>)) {
    <body>
}

1.3 synchronization and coordination

(1) exclusive block

exclusive { <body> }
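
As a rough analogy only, the sketch below shows how a forall whose body contains an exclusive block might be mimicked in plain C with POSIX threads. The mapping (forall to pthread_create/pthread_join, exclusive to a mutex) is an assumption made for illustration, not something Peril-L prescribes, and NUM_THREADS, INCREMENTS, worker, and run_exclusive_demo are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4        /* hypothetical thread count */
#define INCREMENTS  10000    /* hypothetical amount of work per thread */

static long shared_count = 0;    /* plays the role of a global variable */
static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;

/* Executed by every thread, like the <body> of a forall. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&count_lock);     /* enter "exclusive { ... }" */
        shared_count++;                      /* only one thread at a time */
        pthread_mutex_unlock(&count_lock);   /* leave "exclusive { ... }" */
    }
    return NULL;
}

/* Roughly: forall (t in (0..NUM_THREADS-1)) { ... exclusive { count++; } ... } */
long run_exclusive_demo(void)
{
    pthread_t threads[NUM_THREADS];
    shared_count = 0;
    for (int t = 0; t < NUM_THREADS; t++) {
        int rc = pthread_create(&threads[t], NULL, worker, NULL);
        assert(rc == 0);   /* sketch only: abort if a thread cannot be created */
        (void)rc;
    }
    for (int t = 0; t < NUM_THREADS; t++)
        pthread_join(threads[t], NULL);      /* the forall ends when all threads finish */
    return shared_count;
}
```

Without the lock, the increments from different threads could interleave and some updates would be lost; with it, run_exclusive_demo() returns NUM_THREADS * INCREMENTS.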

(2) barrier

Used only inside forall.

barrier
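
The effect of a barrier can be sketched in C with a POSIX barrier, assuming a system that provides pthread_barrier_t. This is an illustrative analogy rather than Peril-L itself; NUM_THREADS, phase1, phase2, worker, and run_barrier_demo are hypothetical names, and error handling is reduced to assertions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4    /* hypothetical thread count */

static pthread_barrier_t phase_barrier;
static int phase1[NUM_THREADS];   /* each thread writes its own slot in phase 1 */
static int phase2[NUM_THREADS];   /* filled in phase 2 from a neighbor's phase-1 value */

static void *worker(void *arg)
{
    int id = *(int *)arg;
    phase1[id] = id * 10;                          /* phase 1 */
    pthread_barrier_wait(&phase_barrier);          /* "barrier": wait for all threads */
    phase2[id] = phase1[(id + 1) % NUM_THREADS];   /* phase 2: neighbor's value is ready */
    return NULL;
}

int run_barrier_demo(void)
{
    pthread_t threads[NUM_THREADS];
    int ids[NUM_THREADS];

    if (pthread_barrier_init(&phase_barrier, NULL, NUM_THREADS) != 0)
        return -1;
    for (int t = 0; t < NUM_THREADS; t++) {
        ids[t] = t;
        int rc = pthread_create(&threads[t], NULL, worker, &ids[t]);
        assert(rc == 0);   /* sketch only: abort if a thread cannot be created */
        (void)rc;
    }
    for (int t = 0; t < NUM_THREADS; t++)
        pthread_join(threads[t], NULL);
    pthread_barrier_destroy(&phase_barrier);
    return 0;
}
```

No thread starts phase 2 until every thread has finished phase 1, so each read of a neighbor's slot sees the value written before the barrier.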

1.4 memory model

There are two address spaces: global and local.

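Global memory is subject to concurrent reads and writes, and accessing it incurs a latency (λ).

Global-to-local mapping functions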

(1) localize()

Given the process id, produces and returns the local copy of the data.

(2) mysize(global, i)

Returns the number of data items held by process Pi.

(3) localToGlobal(localData, i, j)

Returns the global index that corresponds to the element at index i of Pi's local data.
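
To make the global-to-local mapping concrete, here is a minimal C sketch under one assumed allocation scheme: a one-dimensional global array of n elements split into p contiguous blocks, with the first n % p processes receiving one extra element. The functions block_size, block_start, and local_to_global are hypothetical stand-ins for the roles played by mysize() and localToGlobal(); Peril-L does not fix this particular layout.

```c
#include <assert.h>

/* Number of elements owned by process pid (analogue of mysize). */
int block_size(int n, int p, int pid)
{
    assert(n >= 0 && p > 0 && pid >= 0 && pid < p);
    return n / p + (pid < n % p ? 1 : 0);
}

/* Global index of the first element owned by process pid. */
int block_start(int n, int p, int pid)
{
    assert(n >= 0 && p > 0 && pid >= 0 && pid < p);
    int extra = n % p;   /* processes 0..extra-1 hold one extra element */
    return pid * (n / p) + (pid < extra ? pid : extra);
}

/* Global index of local element i of process pid (analogue of localToGlobal). */
int local_to_global(int n, int p, int pid, int i)
{
    assert(i >= 0 && i < block_size(n, p, pid));
    return block_start(n, p, pid) + i;
}
```

For example, with n = 10 and p = 4 the block sizes are 3, 3, 2, 2, and local index 1 on process 3 corresponds to global index 9.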

1.5 synchronized memory

FE (full/empty) variables must be global data. Notation example:

int t' = 0;
FE state transitions:

state \ operation    read                write
empty                stall (blocking)    => full
full                 => empty            stall (blocking)
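
The full/empty transitions can be modelled in a few lines of C. This is a single-threaded sketch of the state machine only: where a real FE access would stall, the hypothetical functions fe_try_read and fe_try_write simply return false. The type name fe_int and the choice to start in the empty state are also assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    int  value;
    bool full;    /* false = empty, true = full */
} fe_int;

/* Read: allowed only when full; leaves the variable empty.
 * Returns false in the case where a real FE read would stall. */
bool fe_try_read(fe_int *v, int *out)
{
    assert(v != NULL && out != NULL);
    if (!v->full)
        return false;      /* empty + read  => stall */
    *out = v->value;
    v->full = false;       /* full  + read  => empty */
    return true;
}

/* Write: allowed only when empty; leaves the variable full.
 * Returns false in the case where a real FE write would stall. */
bool fe_try_write(fe_int *v, int x)
{
    assert(v != NULL);
    if (v->full)
        return false;      /* full  + write => stall */
    v->value = x;
    v->full = true;        /* empty + write => full  */
    return true;
}
```

A blocking version would replace the false returns with waiting, for instance on a condition variable, until another thread changes the state.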

1.6 reduce and scan

Reduce(/)

Scan(\)

Example:

+/count   // count is an array; returns the sum of its elements
min\items // items is an array; returns the running (prefix) minimum of its elements
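
The difference between the two operators may be easier to see in sequential C. The sketch below gives the meaning of +/count and min\items as plain loops; the function names sum_reduce and min_scan are hypothetical, and treating the scan as inclusive (position i includes element i) is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* +/a : reduce with +, producing one value for the whole array. */
int sum_reduce(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* min\a : scan with min, producing one value per position;
 * out[i] is the minimum of a[0..i] (inclusive scan, assumed). */
void min_scan(const int *a, size_t n, int *out)
{
    assert((a != NULL && out != NULL) || n == 0);
    for (size_t i = 0; i < n; i++)
        out[i] = (i == 0 || a[i] < out[i - 1]) ? a[i] : out[i - 1];
}
```

For the array {5, 3, 8, 1, 4}, the sum-reduce gives 21 and the min-scan gives {5, 3, 3, 1, 1}.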

2 example set

Time: 2024-08-05 23:15:28
