Some personal thoughts on path tracing

My knowledge is limited, so please point out any mistakes~

As mentioned above, path tracing (PT) has clear pros and cons. Its big advantage is that it is very simple to implement, which is probably why so many popular renderers on the market today have adopted path tracing as their core algorithm. Its biggest drawback is slow convergence: the global-illumination integral has to be estimated over a continuous hemisphere of directions, so each pixel needs n sample rays, and n directly determines the final image quality (for PT, quality is essentially a matter of how much noise remains; a noisy result looks a bit like a photo shot on old film). In general, the larger n is, the smoother the image. (A minimal sketch of this per-pixel sampling loop appears after the list below.) Researchers spotted PT's slow convergence long ago, and several families of noise-reduction techniques have emerged:

1) Improve the PT algorithm itself, e.g. the later classics bidirectional path tracing (BDPT) and Metropolis light transport.

2) Keep the PT algorithm unchanged, but improve its internal sampling machinery. For example:

  a. Importance sampling according to the BRDF (see the sketch after this list);

  b. The uniform random numbers used for sampling usually come from C's rand function or a uniform distribution from the C++ random library. For PT itself, though, plain uniform pseudo-random sampling is not great for convergence; many modern renderers instead draw samples from a "Low Discrepancy Sequence" (see https://zhuanlan.zhihu.com/p/20197323?columnSlug=graphics, and the sketch after this list).

  c. A big advantage of PT is that every pixel's samples are independent, and so is every iteration, which makes the algorithm easy to accelerate with parallelism. On the CPU there is OpenMP, which ships built into most popular compilers; on the GPU, NVIDIA cards have CUDA and AMD cards can use OpenCL, and render speed grows roughly linearly with the number of GPUs, which feels a bit like brute force. The biggest difference in a GPU implementation is that you cannot really recurse (NVIDIA hardware has supported recursion for a while, but with a large workload each thread's stack is shallow, and a PT path may bounce to considerable depth, so I would not dare rely on it). The recursion mainly occurs when intersecting rays with the scene, because the scene is organized as a Bounding Volume Hierarchy (BVH), which is inherently recursive. This is not hard to work around: design the data structure sensibly, write the information the recursion needs into the nodes when the BVH is built, and the recursion demotes to a plain loop fairly easily (see the sketch after this list).

  (ps: no wonder so many progressive renderers use path tracing these days... it is simple to implement, simple to develop, and when you need more speed you just lean on parallelism and plug in a few more GPUs, which saves a pile of engineering effort and keeps costs down...)
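First, the per-pixel loop referenced above. This is only a minimal sketch: Vec3, Ray, radiance() and camera_ray() are hypothetical stand-ins assumed to be defined elsewhere, not code from any particular renderer.

```cpp
#include <random>

// Hypothetical minimal types; stand-ins, not from a real renderer.
struct Vec3 { double x = 0, y = 0, z = 0; };
struct Ray  { Vec3 origin, dir; };

Vec3 radiance(const Ray& r, std::mt19937& rng);        // traces one full light path (assumed elsewhere)
Ray  camera_ray(int px, int py, double jx, double jy); // jittered primary ray (assumed elsewhere)

// Average n independent path samples for one pixel.  The estimator's
// variance falls off as 1/n, so visible noise shrinks like 1/sqrt(n):
// quadrupling n roughly halves the noise, which is why convergence is slow.
Vec3 render_pixel(int px, int py, int n, std::mt19937& rng) {
    std::uniform_real_distribution<double> u01(0.0, 1.0);
    Vec3 sum;
    for (int s = 0; s < n; ++s) {
        Ray r = camera_ray(px, py, u01(rng), u01(rng));
        Vec3 L = radiance(r, rng);
        sum.x += L.x; sum.y += L.y; sum.z += L.z;
    }
    return {sum.x / n, sum.y / n, sum.z / n};
}
```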
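For item a, the textbook case is a Lambertian (diffuse) BRDF: importance sampling there means drawing directions with probability proportional to cos θ, so the cosine factor in the rendering equation cancels against the pdf. A sketch (Vec3 again a hypothetical stand-in):

```cpp
#include <cmath>
#include <random>

struct Vec3 { double x, y, z; };  // hypothetical stand-in type

// Cosine-weighted direction in the local frame with the surface normal
// along +z (Malley's method: sample a unit disk, project up).  The pdf
// is cos(theta)/pi, which cancels the cos(theta) in the rendering
// equation for a Lambertian BRDF, so every sample carries equal weight.
Vec3 sample_cosine_hemisphere(std::mt19937& rng) {
    const double kPi = 3.14159265358979323846;
    std::uniform_real_distribution<double> u01(0.0, 1.0);
    double u1 = u01(rng), u2 = u01(rng);
    double r   = std::sqrt(u1);       // radius on the unit disk
    double phi = 2.0 * kPi * u2;      // angle on the unit disk
    return { r * std::cos(phi),
             r * std::sin(phi),
             std::sqrt(1.0 - u1) };   // lift the disk point onto the hemisphere
}
```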
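For item b, one of the simplest low-discrepancy sequences to drop in is the Halton sequence, built from the van der Corput radical inverse:

```cpp
// Van der Corput radical inverse in a prime base: mirror the digits of
// `index` around the radix point.  Successive indices fill [0,1) far
// more evenly than rand() or uniform_real_distribution, which speeds
// up convergence of the pixel estimate.
double radical_inverse(unsigned index, unsigned base) {
    double inv_base = 1.0 / base, f = inv_base, result = 0.0;
    while (index > 0) {
        result += f * (index % base);
        index /= base;
        f *= inv_base;
    }
    return result;
}

// 2D Halton point for sample i (bases 2 and 3), usable for sub-pixel
// jitter or as the (u1, u2) pair fed to hemisphere sampling.
void halton_2d(unsigned i, double& u, double& v) {
    u = radical_inverse(i, 2);
    v = radical_inverse(i, 3);
}
```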
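And for item c, demoting the BVH traversal recursion to a loop can look like the following. The node layout is a hypothetical sketch of the idea in the text (write what the recursion needs into the nodes at build time); the box and primitive tests are assumed to exist elsewhere, and this version only answers "hit or miss" rather than tracking the nearest hit.

```cpp
#include <cstdint>

// Hypothetical flat node layout: child/leaf information is baked in at
// BVH build time, exactly so that traversal needs no call stack.
struct BVHNode {
    float   bounds_min[3], bounds_max[3];
    int32_t left;     // index of left child, or first primitive index if leaf
    int32_t right;    // index of right child, or primitive count if leaf
    bool    is_leaf;
};

bool ray_hits_box(const BVHNode& n /*, ray params */);      // slab test, assumed elsewhere
bool intersect_primitives(int first, int count /*, ... */); // leaf test, assumed elsewhere

// The recursion demoted to a loop: child indices go onto a small
// explicit array instead of the thread's call stack.  A depth of 64
// is ample for any reasonably balanced BVH.
bool traverse(const BVHNode* nodes /*, ray params */) {
    int stack[64];
    int top = 0;
    stack[top++] = 0;                    // start at the root
    while (top > 0) {
        const BVHNode& n = nodes[stack[--top]];
        if (!ray_hits_box(n)) continue;  // prune this whole subtree
        if (n.is_leaf) {
            if (intersect_primitives(n.left, n.right)) return true;
        } else {
            stack[top++] = n.left;       // defer both children
            stack[top++] = n.right;
        }
    }
    return false;
}
```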

Browsing the web recently, I also noticed that many people and companies lean toward rendering on the CPU; see http://stackoverflow.com/questions/38029698/why-do-we-use-cpus-for-ray-tracing-instead-of-gpus

One of the answers there is quite good:

I'm one of the rendering software architects at a large VFX and animated feature studio with a proprietary renderer (not Pixar, though I was once the rendering software architect there as well, long, long ago).

Almost all high-quality rendering for film (at all the big studios, with all the major renderers) is CPU only. There are a bunch of reasons why this is the case. In no particular order, some of the really compelling ones to give you the flavor of the issues:

  • GPUs only go fast when everything is in memory. The biggest GPU cards have, what, 12GB or so, and it has to hold everything. Well, we routinely render scenes with 30GB of geometry and that reference 1TB or more of texture. Can't load that into GPU memory, it's literally two orders of magnitude too big. So GPUs are simply unable to deal with our biggest (or even average) scenes. (With CPU renderers, we can page stuff from disk whenever we need. GPUs aren't good at that.)
  • Don't believe the hype, ray tracing with GPUs is not an obvious win over CPU. GPUs are great at highly coherent work (doing the same things to lots of data at once). Ray tracing is very incoherent (each ray can go a different direction, intersect different objects, shade different materials, access different textures), and so this access pattern degrades GPU performance very severely. It's only very recently that GPU ray tracing could match the best CPU-based ray tracing code, and even though it has surpassed it, it's not by much, not enough to throw out all the old code and start fresh with buggy fragile code for GPUs. And the biggest, most expensive scenes are the ones where GPUs are only marginally faster. Being lots faster on the easy scenes is not really important to us.
  • If you have 50 or 100 man years of production-hardened code in your CPU-based renderer, you just don't throw it out and start over in order to get a 2x speedup. Software engineering effort, stability, and so on, is more important and a bigger cost factor.
  • Similarly, if your studio has an investment in a data center holding 20,000 CPU cores, all in the smallest, most power and heat-efficient form factor you can, that's also a sunk cost investment you don't just throw away. Replacing them with new machines containing top of the line GPUs vastly increases the cost of your render farm, and they are bigger and produce more heat, so it literally might not fit in your building.
  • Amdahl's Law: The actual "rendering" per se is only one stage in generating the scenes, and GPUs don't help with it. Let's say that it takes 1 hour to fully generate and export the scene to the renderer, and 9 hours to "render", and out of that 9 hours, an hour is reading texture, volumes, and other data from disk. So out of the total 10 hours of how the user experiences rendering (push button until final image is ready), 8 hours is potentially sped up with GPUs. So, even if GPU was 10x as fast as CPU for that part, you go from 10 hours to 1+1+0.8 = nearly 3 hours. So 10x GPU speedup only translates to 3x actual gain. If GPU was 1,000,000x faster than CPU for ray tracing, you still have 1+1+tiny, which is only a 5x speedup.

But what's different about games? Why are GPUs good for games but not film?

First of all, when you make a game, remember that it's got to render in real time -- that means your most important constraint is the 60Hz (or whatever) frame rate, and you sacrifice quality or features where necessary to achieve that. In contrast, with film, the unbreakable constraint is making the director and VFX supervisor happy with the quality and look he or she wants, and how long it takes you to get that is (to a degree) secondary.

Also, with a game, you render frame after frame after frame, live in front of every user. But with film, you effectively are rendering ONCE, and what's delivered to theaters is a movie file -- so moviegoers will never know or care if it took you 10 hours per frame, but they will notice if it doesn't look good. So again, there is less of a penalty placed on those renders taking a long time, as long as they look fabulous.

With a game, you don't really know what frames you are going to render, since the player may wander all around the world, view from just about anywhere. You can't and shouldn't try to make it all perfect, you just want it to be good enough all the time. But for a film, the shots are all hand-crafted! A tremendous amount of human time goes into composing, animating, lighting, and compositing every shot, and then you only need to render it once. Think about the economics -- once 10 days of calendar (and salary) has gone into lighting and compositing the shot just right, the advantage of rendering it in an hour (or even a minute) versus overnight, is pretty small, and not worth any sacrifice of quality or achievable complexity of the image.

Posted: 2024-10-13 23:32:41
