Intel Parallelism Analysis

Lab 2: Analyzing Parallelism


___________________________________________________________________

Developer Product Division

Disclaimer

The information contained in this document is provided for informational
purposes only and represents the current view of Intel Corporation ("Intel") and
its contributors ("Contributors") as of the date of publication. Intel and the
Contributors make no commitment to update the information contained in this
document, and Intel reserves the right to make changes at any time, without
notice.

DISCLAIMER. THIS DOCUMENT, IS PROVIDED "AS IS." NEITHER INTEL, NOR THE
CONTRIBUTORS MAKE ANY REPRESENTATIONS OF ANY KIND WITH RESPECT TO PRODUCTS
REFERENCED HEREIN, WHETHER SUCH PRODUCTS ARE THOSE OF INTEL, THE CONTRIBUTORS,
OR THIRD PARTIES. INTEL, AND ITS CONTRIBUTORS EXPRESSLY DISCLAIM ANY AND ALL
WARRANTIES, IMPLIED OR EXPRESS, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES OF
MERCHANTABILITY, FITNESS FOR ANY PARTICULAR PURPOSE, NON-INFRINGEMENT, AND ANY
WARRANTY ARISING OUT OF THE INFORMATION CONTAINED HEREIN, INCLUDING WITHOUT
LIMITATION, ANY PRODUCTS, SPECIFICATIONS, OR OTHER MATERIALS REFERENCED HEREIN.
INTEL, AND ITS CONTRIBUTORS DO NOT WARRANT THAT THIS DOCUMENT IS FREE FROM
ERRORS, OR THAT ANY PRODUCTS OR OTHER TECHNOLOGY DEVELOPED IN CONFORMANCE WITH
THIS DOCUMENT WILL PERFORM IN THE INTENDED MANNER, OR WILL BE FREE FROM
INFRINGEMENT OF THIRD PARTY PROPRIETARY RIGHTS, AND INTEL, AND ITS CONTRIBUTORS
DISCLAIM ALL LIABILITY THEREFOR. INTEL, AND ITS CONTRIBUTORS DO NOT WARRANT THAT
ANY PRODUCT REFERENCED HEREIN OR ANY PRODUCT OR TECHNOLOGY DEVELOPED IN RELIANCE
UPON THIS DOCUMENT, IN WHOLE OR IN PART, WILL BE SUFFICIENT, ACCURATE, RELIABLE,
COMPLETE, FREE FROM DEFECTS OR SAFE FOR ITS INTENDED PURPOSE, AND HEREBY
DISCLAIM ALL LIABILITIES THEREFOR. ANY PERSON MAKING, USING OR SELLING SUCH
PRODUCT OR TECHNOLOGY DOES SO AT HIS OR HER OWN RISK.

Licenses may be required. Intel, its contributors and others may have patents or
pending patent applications, trademarks, copyrights or other intellectual
proprietary rights covering subject matter contained or described in this
document. No license, express, implied, by estoppels or otherwise, to any
intellectual property rights of Intel or any other party is granted herein. It
is your responsibility to seek licenses for such intellectual property rights
from Intel and others where appropriate.

Limited License Grant. Intel hereby grants you a limited copyright license to
copy this document for your use and internal distribution only. You may not
distribute this document externally, in whole or in part, to any other person or
entity.

LIMITED LIABILITY. IN NO EVENT SHALL INTEL, OR ITS CONTRIBUTORS HAVE ANY
LIABILITY TO YOU OR TO ANY OTHER THIRD PARTY, FOR ANY LOST PROFITS, LOST DATA,
LOSS OF USE OR COSTS OF PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES, OR FOR ANY
DIRECT, INDIRECT, SPECIAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF YOUR USE OF
THIS DOCUMENT OR RELIANCE UPON THE INFORMATION CONTAINED HEREIN, UNDER ANY CAUSE
OF ACTION OR THEORY OF LIABILITY, AND IRRESPECTIVE OF WHETHER INTEL, OR ANY
CONTRIBUTOR HAS ADVANCE NOTICE OF THE POSSIBILITY OF SUCH DAMAGES. THESE
LIMITATIONS SHALL APPLY NOTWITHSTANDING THE FAILURE OF THE ESSENTIAL PURPOSE OF
ANY LIMITED REMEDY.

Intel and Intel logo are trademarks or registered trademarks of Intel
Corporation or its subsidiaries in the United States and other countries.

*Other names and brands may be claimed as the property of others.

Copyright © 2010, Intel Corporation. All Rights Reserved.

Table of Contents

Lab 2: Analyzing Parallelism

Developer Product Division

Disclaimer

Lab 2: Finding Parallelism Issues

Activity 1 – Build the Application

Activity 2 – Collect Parallelism Data

Activity 3 – Analyze the App's Parallelism

Lab 2: Finding Parallelism Issues

Time Required

Thirty minutes

Objective

In this lab session, you will use Intel® VTune™ Amplifier XE to determine the
amount of parallelism in an application.

After successfully completing this lab's activities, you will be able to:

  • Collect parallelism performance data for an application

  • Determine the amount of parallelism in an application

Activity 1 – Build the Application

Time Required

Ten minutes

Objective

  • Build the application in preparation for finding its hotspot

  1. Using Microsoft Visual Studio, select File->Open and open the solution
     file: tachyon_vtune_amp_xe.sln

  2. Select (highlight) the analyze_locks project

  3. From the top Visual Studio menu, select Build->Build analyze_locks

  4. Verify at the bottom of the Visual Studio screen that it built with no
     errors

Review Questions

  • Did the tachyon.common project build also?

  • What was the build configuration for the projects?

Activity 2 – Collect Parallelism Data

Time Required

Ten minutes

Objective

  • Run the application while collecting parallelism data

Code Description

  • Tachyon is a 2-D raytracer/rendering program that displays an image

  1. Right-click identify_concurrency in the Solution Explorer window and
     select "Set As Startup Project"

  2. Click the "New Analysis" button

  3. Select "Algorithm Tuning->Parallelism" in the analysis type pane

  4. Click "Analyze". The tachyon application will run. As it runs, it draws
     an image of several different silver balls on the screen. Note the
     execution time displayed in the application's title bar immediately
     after the image is completely displayed. You will need this execution
     time in Lab 3

  5. After the application completes, Intel® VTune™ Amplifier XE will spend
     some time analyzing the data. When it has finished, the summary pane
     appears, with the analysis explanation pane on top of it. Read the
     explanation and then clear the pane.

     At this point the application has run to completion and Intel® VTune™
     Amplifier XE is ready to display the analyzed results.

Review Questions

  • What is the result screen that appears after clearing the analysis
    explanation pane?

  • What useful data is in this first screen?

Activity 3 – Analyze the App's Parallelism

Time Required

Twenty minutes

Objective

  • Analyze the amount of parallelism in the application, both as an overall
    average and as it changes while the application runs.

Code Description

  • Tachyon is a 2-D raytracer/rendering program that displays an image

  1. Click the "Bottom-up" tab. Notice the timeline view at the bottom of the
     screen. It shows the threads that executed while the application ran.
     There are a number of interesting things to notice.

     There is a very large amount of thread transition time, as indicated by
     the large amount of yellow in the top thread graph. It looks like the
     threads are spending a lot of time passing locks between them.

  2. Zoom in on a yellow portion of the graph by left-clicking and dragging
     over a one-second portion of it. When the dialog box appears, select
     "Zoom in on selection". Now you can see that there is no time in which
     both worker threads executed at the same time. Something is causing
     these 2 threads to "take turns" executing instead of running
     simultaneously!

  3. Click the "Undo Previous Zoom Selection" icon to get back to the original
     timeline view. Notice also that if you move the mouse pointer slowly
     over the Thread Concurrency graph at the bottom of the screen, the
     concurrency numbers are always around one. We are not getting any useful
     parallelism in this app.

     Notice also that there is very little CPU usage or thread execution near
     the end of the program. This is the phase in which the program has
     finished its work but keeps the application window visible so the user
     has time to see the overall execution time.
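
The "take turns" pattern seen in step 2 can be reproduced in a few lines. The
sketch below is not Tachyon's code, and the lab does not say which lock is
responsible. It is a minimal illustration in Python, assuming a single
hypothetical coarse-grained lock (big_lock) that each worker holds for the whole
of every work item. A counter records how many workers are ever inside a work
item at the same moment.

```python
import threading
import time


def run_workers(num_threads=2, items_per_thread=20):
    """Run workers that hold one shared lock for each whole work item.

    Returns the highest number of workers ever observed doing work at
    the same moment.
    """
    big_lock = threading.Lock()  # hypothetical coarse-grained lock
    active = 0                   # workers currently inside a work item
    peak = 0                     # highest value of `active` seen so far

    def worker():
        nonlocal active, peak
        for _ in range(items_per_thread):
            with big_lock:       # the entire work item runs under the lock
                active += 1
                peak = max(peak, active)
                time.sleep(0.0005)  # stand-in for rendering part of the image
                active -= 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak


peak_concurrency = run_workers()
print("peak concurrent workers:", peak_concurrency)
```

Because every work item runs entirely under the same lock, the peak stays at
one no matter how many threads are started, which matches a Thread Concurrency
graph that hovers around one.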

Review Questions

  • Is this really a parallel application? Did it have more than 1 thread
    executing simultaneously at any time?
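
One way to reason about these questions is to compute concurrency directly from
the intervals in which each thread was running. The sketch below is an
assumption-laden illustration, not a description of how Intel® VTune™ Amplifier
XE derives its numbers: the function name concurrency_profile and the sample
intervals are hypothetical, and average concurrency is taken here as
thread-running time divided by the time in which at least one thread was
running.

```python
def concurrency_profile(intervals):
    """Return (average, peak) concurrency for (start, end) running intervals.

    The average is time-weighted over the periods in which at least one
    thread is running; idle gaps are ignored.
    """
    events = []
    for start, end in intervals:
        events.append((start, 1))   # a thread starts running
        events.append((end, -1))    # a thread stops running
    # At equal timestamps, -1 sorts before +1, so intervals that merely
    # touch are not counted as overlapping.
    events.sort()

    running = 0        # threads running right now
    peak = 0           # most threads running at once
    busy_time = 0.0    # time with at least one thread running
    thread_time = 0.0  # sum of running time over all threads
    prev = None
    for t, delta in events:
        if prev is not None and running > 0:
            span = t - prev
            busy_time += span
            thread_time += running * span
        running += delta
        peak = max(peak, running)
        prev = t
    average = thread_time / busy_time if busy_time else 0.0
    return average, peak


# Two workers taking turns (hypothetical data, in seconds).
taking_turns = [(0, 1), (2, 3), (1, 2), (3, 4)]
# Two workers running side by side for the same four seconds.
side_by_side = [(0, 4), (0, 4)]

print(concurrency_profile(taking_turns))
print(concurrency_profile(side_by_side))
```

For the taking-turns data the average and the peak are both one, while the
side-by-side data gives two for both. A peak above one is the signal that more
than one thread was executing simultaneously at some point.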


Time: 2024-10-13 18:19:51
