CUDA Intro to Parallel Programming Notes--Lesson 1 The GPU Programming Model

1.  3 traditional ways to make computers run faster

  • Faster clocks
  • More work/clock cycle  
  • More processors

2. Parallelism

  • A high-end GPU contains over 3,000 arithmetic units (ALUs) that can simultaneously run 3,000 arithmetic operations. A GPU can have tens of thousands of parallel pieces of work all active at the same time.
  • A modern GPU may be running up to 65,000 concurrent threads.

3. GPGPU--General-Purpose Programmability on the Graphics Processing Unit.

4.  How Are CPUs Getting Faster?

    More transistors available for computation.

5. Why don't we keep increasing clock speed?

  Running a billion transistors generates an awful lot of heat, and we can't keep all these processors cool.

6. What kind of processors are we building?

  Q: Why are traditional CPU-like processors not the most energy-efficient processors?

  A: Traditional CPU-like processors have gained flexibility and performance through complex control hardware, but that hardware is expensive in terms of power.

  We might choose to build simpler control structures and instead devote those transistors to supporting more computation in the data path. The way that we're going to build that data path in the GPU is by building a large number of parallel compute units. Individually, these compute units are small, simple, and power efficient.

7.  Building a power-efficient processor: two possible goals

    • Minimizing latency (execution time of a single task)
    • Maximizing throughput (tasks completed per unit time: stuff/time, jobs/hour)

       Note: these two goals are not necessarily aligned.
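
To make the difference between the two goals concrete, here is a small worked sketch. The vehicles and numbers (a car carrying 2 people at 200 km/h, a bus carrying 40 people at 50 km/h, over a 4,500 km trip) are illustrative assumptions and do not come from these notes; the function names are hypothetical as well.

```python
def latency_hours(distance_km, speed_kmh):
    """Time for one trip to finish (lower is better)."""
    return distance_km / speed_kmh


def throughput_people_per_hour(passengers, distance_km, speed_kmh):
    """People delivered per hour of travel (higher is better)."""
    return passengers / latency_hours(distance_km, speed_kmh)


# Assumed, illustrative numbers.
DISTANCE_KM = 4500

car_latency = latency_hours(DISTANCE_KM, 200)                      # 22.5 hours
bus_latency = latency_hours(DISTANCE_KM, 50)                       # 90.0 hours
car_throughput = throughput_people_per_hour(2, DISTANCE_KM, 200)   # about 0.089
bus_throughput = throughput_people_per_hour(40, DISTANCE_KM, 50)   # about 0.44

print(f"car: latency {car_latency} h, throughput {car_throughput:.3f} people/h")
print(f"bus: latency {bus_latency} h, throughput {bus_throughput:.3f} people/h")
```

Under these assumptions the car wins on latency while the bus wins on throughput, which is the sense in which the two goals can pull in different directions.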

8. Latency vs Bandwidth

  Improved latency often leads to improved throughput, and vice versa. But GPU designers are really prioritizing throughput.

9. Core GPU design tenets

  • Lots of simple compute units; trade simple control for more compute
  • Explicitly parallel programming model
  • Optimize for throughput, not latency
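
As a rough CPU-side analogy for what "explicitly parallel" means, the sketch below writes the work as a function of a single index and then launches one instance per element. It is not CUDA and does not run on a GPU; `square_kernel`, `launch`, and the pool size are hypothetical names and values chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def square_kernel(thread_idx, d_in, d_out):
    # Each "thread" handles exactly one element, selected by its own index.
    d_out[thread_idx] = d_in[thread_idx] * d_in[thread_idx]


def launch(kernel, num_threads, *args):
    # Hypothetical launcher: runs one kernel instance per index on a small
    # CPU thread pool. A real GPU would schedule these in hardware.
    with ThreadPoolExecutor(max_workers=8) as pool:
        # list(...) forces every instance to finish and surfaces exceptions.
        list(pool.map(lambda i: kernel(i, *args), range(num_threads)))


d_in = [float(i) for i in range(64)]
d_out = [0.0] * len(d_in)
launch(square_kernel, len(d_in), d_in, d_out)
print(d_out[:5])
```

The point of the style is that the programmer, not the compiler, states which pieces of work are independent: the function body describes one element, and the launch says how many instances to run.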

10. GPU from the point of view of the developer

  An 8-core Intel Ivy Bridge processor has 8 cores; each core has 8-wide AVX vector operations and supports two simultaneously running threads. Multiplying those together gives 128-way parallelism.
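
The multiplication above can be written out as a short calculation. The variable names are just labels for the three figures quoted in the notes.

```python
cores = 8              # physical cores on the processor
simd_lanes = 8         # 8-wide AVX vector operations per core
threads_per_core = 2   # simultaneously running threads per core

# Total parallelism a developer would need to exploit to keep the chip busy.
total_parallelism = cores * simd_lanes * threads_per_core
print(total_parallelism)  # 128
```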

Date: 2024-12-24 22:18:05
