D3D9 GPU Hacks (repost)

I’ve been trying to catch up on which hacks GPU vendors have exposed in Direct3D9, and it turns out there are a lot of them!

If you know more hacks or more details, please let me know in the comments!

Most hacks are exposed as custom (“FOURCC”) formats, so to check for one you call CheckDeviceFormat. Here’s the list (Usage column codes: DS=DepthStencil, RT=RenderTarget; Resource column codes: tex=texture, surf=surface). An empty vendor cell means no known support.

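| Format | Usage | Resource | Description | NVIDIA GeForce | ATI Radeon | Intel |
|---|---|---|---|---|---|---|
| Shadow mapping | | | | | | |
| D3DFMT_D16 | DS | tex | Sample depth buffer directly as shadow map. | 3+ | HD 2xxx+ | 965+ |
| D3DFMT_D24X8 | DS | tex | Sample depth buffer directly as shadow map. | 3+ | HD 2xxx+ | 965+ |
| Depth Buffer As Texture | | | | | | |
| DF16 | DS | tex | Read depth buffer as texture. | | 9500+ | G45+ |
| DF24 | DS | tex | Read depth buffer as texture. | | X1300+ | SB+ |
| INTZ | DS | tex | Read depth buffer as texture. | 8+ | HD 4xxx+ | G45+ |
| RAWZ | DS | tex | Read depth buffer as texture. | 6 & 7 | | |
| Anti-Aliasing related | | | | | | |
| RESZ | RT | surf | Resolve MSAA’d depth stencil surface into non-MSAA’d depth texture. | | HD 4xxx+ | G45+ |
| ATOC | 0 | surf | Transparency anti-aliasing. | 7+ | | SB+ |
| SSAA | 0 | surf | Transparency anti-aliasing. | 7+ | | |
| All ATI SM2.0+ hardware | | | Transparency anti-aliasing. | | 9500+ | |
| n/a | | | Coverage Sampled Anti-Aliasing [6] | 8+ | | |
| Texturing | | | | | | |
| ATI1 | 0 | tex | ATI1n & ATI2n texture compression formats. | 8+ | X1300+ | G45+ |
| ATI2 | 0 | tex | ATI1n & ATI2n texture compression formats. | 6+ | 9500+ | G45+ |
| DF24 | DS | tex | Fetch 4: when sampling 1 channel texture, return four touched texel values [1]. Check for DF24 support. | | X1300+ | SB+ |
| Misc | | | | | | |
| NULL | RT | surf | Dummy render target surface that does not consume video memory. | 6+ | HD 4xxx+ | HD+ |
| NVDB | 0 | surf | Depth Bounds Test. | 6+ | | |
| R2VB | 0 | surf | Render into vertex buffer. | 6 & 7 | 9500+ | |
| INST | 0 | surf | Geometry Instancing on pre-SM3.0 hardware. | | 9500+ | |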
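
The four-letter codes in the Format column are passed to CheckDeviceFormat as format values. Here is a minimal sketch of how such a code is packed into 32 bits, mirroring what the MAKEFOURCC macro from the Windows headers does. The helper `makeFourCC` and the `kFourcc*` constant names are assumptions made for illustration, and the sketch deliberately avoids including the D3D9 headers so it compiles anywhere.

```cpp
#include <cstdint>

// Packs four ASCII characters into one 32-bit value, first character in the
// lowest byte. Hypothetical helper; it mirrors the MAKEFOURCC macro.
constexpr std::uint32_t makeFourCC(char a, char b, char c, char d) {
    return  static_cast<std::uint32_t>(static_cast<unsigned char>(a))
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 8)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << 16)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(d)) << 24);
}

// A few of the codes from the table (constant names are illustrative).
constexpr std::uint32_t kFourccINTZ = makeFourCC('I', 'N', 'T', 'Z');
constexpr std::uint32_t kFourccRESZ = makeFourCC('R', 'E', 'S', 'Z');
constexpr std::uint32_t kFourccNULL = makeFourCC('N', 'U', 'L', 'L');

// 'I' = 0x49, 'N' = 0x4E, 'T' = 0x54, 'Z' = 0x5A
static_assert(kFourccINTZ == 0x5A544E49u, "INTZ packs low byte first");
```

In a real D3D9 application the value would be cast to D3DFORMAT and passed to IDirect3D9::CheckDeviceFormat together with the usage and resource type from the table, for example D3DUSAGE_DEPTHSTENCIL and D3DRTYPE_TEXTURE for INTZ.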

Native Shadow Mapping

Native support for shadow map sampling & filtering was introduced ages ago (GeForce 3) by NVIDIA. It turns out ATI also implemented the same feature for its DX10 level cards. Intel supports it as well, on Intel 965 (aka GMA X3100, the shader model 3 card) and later (G45/X4500/HD) cards.

The usage is quite simple: just create a texture with a regular depth/stencil format and render into it. When reading from the texture, one extra component in the texture coordinates is the depth to compare with. The compared & filtered result is returned.

Also useful:

  • Creating NULL color surface to keep D3D runtime happy and save on video memory.

Depth Buffer as Texture

Some rendering schemes (anything with “deferred”) and some effects (SSAO, depth of field, volumetric fog, …) need access to the depth buffer. If the native depth buffer can be read as a texture, this saves both memory and a rendering pass or an extra output for MRTs.

Depending on hardware, this can be achieved via INTZ, RAWZ, DF16 or DF24 formats:

  • INTZ is for recent (DX10+) hardware. With recent drivers, all three major IHVs expose this. According to ATI [1], it also allows using stencil buffer while rendering. Also allows reading from depth texture while it’s still being used for depth testing (but not depth writing). Looks like this applies to NV & Intel parts as well.
  • RAWZ is for GeForce 6 & 7 series only. Depth is specially encoded into four channels of returned value.
  • DF16 and DF24 are for ATI and Intel cards, including older cards that don’t support INTZ. Unlike INTZ, these do not allow using the stencil buffer, or using the surface for both sampling & depth testing at the same time.

Also useful when using depth textures:

  • Creating NULL color surface to keep D3D runtime happy and save on video memory.
  • RESZ allows resolving multisampled depth surfaces into non-multisampled depth textures (result will be sample zero for each pixel).

Caveats:

  • Using INTZ for both depth/stencil testing and sampling at the same time seems to have performance problems on ATI cards (checked Radeon HD 3xxx to 5xxx, Catalyst 9.10 to 10.5). A workaround is to render to INTZ depth/stencil first, then use RESZ to “blit” it into another surface. Then do sampling from one surface, and depth testing on another.

Depth Bounds Test

A direct equivalent of the GL_EXT_depth_bounds_test OpenGL extension. See [3] for more information.

Transparency Anti-Aliasing

NVIDIA exposes two controls: transparency multisampling (ATOC) and transparency supersampling (SSAA) [5]. ATI says that all Radeons since 9500 support “alpha to coverage” [1]. Intel supports ATOC with SandyBridge (GMA HD 2000/3000) GPUs.

Render Into Vertex Buffer

Similar to “stream out” or “memexport” in other APIs/platforms. See [2] for more information. Apparently some NVIDIA GPUs (or drivers?) support this as well.

Geometry Instancing

Instancing is supported on all Shader Model 3.0 hardware by Direct3D 9.0c, so no extra hacks are necessary there. ATI has exposed a capability to enable instancing on their Shader Model 2.0 hardware as well. Check for “INST” support, and do dev->SetRenderState (D3DRS_POINTSIZE, kFourccINST); at startup to enable instancing.

I can’t find any document on instancing from AMD now. Other references: [7] and [8].

ATI1n & ATI2n Compressed Texture Formats

Compressed texture formats. ATI1n is known as BC4 format in DirectX 10 land; ATI2n as BC5 or 3Dc. Since they are just DX10 formats, support for this is quite widespread, with NVIDIA exposing it a while ago and Intel exposing it recently (drivers 15.17 or higher).

One thing to keep in mind: when DX9 allocates the mip chain, it checks whether the format is a known compressed format and allocates the appropriate space for the smallest mip levels. For example, a 1x1 DXT1 compressed level actually takes up 8 bytes, as the block size is fixed at 4x4 texels. This is true for all block compressed formats. With the hacked formats, however, DX9 doesn’t know it is dealing with a block compression format and will only allocate the number of bytes the mip would have taken if it weren’t compressed. For example, a 1x1 ATI1n level will only have 1 byte allocated. What you need to do is stop the mip chain before either dimension shrinks below the block dimensions; otherwise you risk memory corruption.
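
The rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `safeMipLevels` and `blockCompressedBytes` are made up for this example, and the per-block sizes are the usual ones for these formats (8 bytes for DXT1 and ATI1n, 16 bytes for ATI2n).

```cpp
#include <cstdint>

// Block compressed formats store texels in 4x4 blocks.
const std::uint32_t kBlockDim = 4;

// Hypothetical helper: how many mip levels can be requested for an
// ATI1n/ATI2n texture so that no level has a dimension below 4 texels.
// Returns 0 if even the top level is smaller than one block.
inline std::uint32_t safeMipLevels(std::uint32_t width, std::uint32_t height) {
    std::uint32_t levels = 0;
    while (width >= kBlockDim && height >= kBlockDim) {
        ++levels;
        width /= 2;
        height /= 2;
    }
    return levels;
}

// Hypothetical helper: bytes a block compressed level really needs.
// Dimensions are rounded up to whole 4x4 blocks; width and height must be >= 1.
inline std::uint32_t blockCompressedBytes(std::uint32_t width,
                                          std::uint32_t height,
                                          std::uint32_t bytesPerBlock) {
    const std::uint32_t blocksX = (width + kBlockDim - 1) / kBlockDim;
    const std::uint32_t blocksY = (height + kBlockDim - 1) / kBlockDim;
    return blocksX * blocksY * bytesPerBlock;
}
```

For a 256x256 texture this yields 7 levels (256 down to 4), and a 1x1 level with 8-byte blocks needs 8 bytes, matching the DXT1 example above. Note that a result of 0 must not be passed on as the level count when creating the texture, since D3D9 interprets 0 levels as a request for the full mip chain.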

Another thing to keep in mind: under the Vista+ (WDDM) driver model, textures in these formats still consume application address space. Most regular textures, such as DXT5, don’t take up additional address space under WDDM. For some reason ATI1n and ATI2n textures on D3D9 are deemed lockable.

References

All this information gathered mostly from:

  1. Advanced DX9 Capabilities for ATI Radeon Cards (pdf)
  2. ATI R2VB Programming (pdf)
  3. NVIDIA GPU Programming Guide (pdf)
  4. ATI Tesselation
  5. NVIDIA Transparency AA
  6. NVIDIA Coverage Sampled AA
  7. Humus’ Instancing Demo
  8. Arseny’s article on particles

Changelog

  • 2013 06 11: One more note on ATI1n/ATI2n format virtual address space issue (thanks JSeb!).
  • 2013 04 09: Turns out since sometime 2011 Intel has DF24 and Fetch4 for SandyBridge and later.
  • 2011 01 09: Intel implemented ATOC for SandyBridge, and NULL for GMA HD and later.
  • 2010 08 25: Intel implemented DF16, INTZ, RESZ for G45+ GPUs!
  • 2010 08 25: Added note on INTZ performance issue with ATI cards.
  • 2010 08 19: Intel implemented ATI1n/ATI2n support for G45+ GPUs in the latest drivers!
  • 2010 07 08: Added note on ATI1n/ATI2n texture formats, with a caveat pointed out by Henning Semler (thanks!)
  • 2010 01 06: Hey, shadow map hacks are also supported on Intel 965!
  • 2009 12 09: Shadow map hacks are supported on Intel G45!
  • 2009 11 21: Added instancing on SM2.0 hardware.
  • 2009 11 20: Added Fetch-4, CSAA.
  • 2009 11 20: Initial version.

Original article: http://aras-p.info/texts/D3D9GPUHacks.html

Time: 2024-10-12 02:31:11
