Analysis and translation of "Vulkan in 30 minutes"

This article offers a brief analysis and translation of "Vulkan in 30 minutes". Corrections are welcome.

I've written this post with a specific target audience in mind, namely those who have a good grounding in existing APIs (e.g. D3D11 and GL) and understand the concepts of multithreading, staging resources, synchronisation and so on, but want to know specifically how they are implemented in Vulkan. So we end up with a whirlwind tour of what the main Vulkan concepts look like.

This isn't intended to be comprehensive (for that you should read the spec or a more in-depth tutorial), nor is it heavy in background or justification. Hopefully by the end of this you should be able to read specs or headers and have a sketched idea of how a simple Vulkan application is implemented, but you will need to do additional reading.

Mostly, this is the document I wish had already been written when I first encountered Vulkan - so for the most part it is tuned to what I would have wanted to know. I'll reference the spec whenever you should do more reading to get a precise understanding, but you'll at least know what to look for.

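Scope of this article:

1) It is aimed at readers who are familiar with existing APIs (e.g. D3D11 and GL) and want to understand the basic execution flow of Vulkan, to help them get to grips with the basic Vulkan concepts.

2) For a deeper understanding of Vulkan, read the spec or a tutorial.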

General

At the end of the post I've included a heavily abbreviated pseudocode program showing the rough steps to a hello world triangle, to match up to the explanations.

A few simple things that don't fit any of the other sections:

  • Vulkan is a C API, i.e. free function entry points. This is the same as GL.
  • The API is quite heavily typed - unlike GL. Each enum is a separate type, and handles that are returned are opaque 64-bit handles, so they are distinct types on 64-bit platforms (on 32-bit platforms they are not, although you can make them typed if you use C++).
  • A lot of functions (most, even) take extensible structures as parameters instead of basic types.
  • A VkAllocationCallbacks * is passed into creation/destruction functions, which lets you pass custom malloc/free functions for CPU memory. For more details read the spec; in simple applications you can just pass NULL and let the implementation do its own CPU-side allocation.

Warning: I'm not considering any error handling, nor do I talk much about querying for implementation limits and respecting them. While I'm not intentionally getting anything outright wrong, I am skipping over many details that a real application needs to respect. This post is just to get a grasp of the API; it's not a tutorial!

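In plain terms:

1) A hello world triangle Vulkan example is attached at the end of this article, matching the steps below.

2) This article does not explain specialist terminology; for that, see the spec or a tutorial.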

First steps

You initialise Vulkan by creating an instance (VkInstance). The instance is an entirely isolated silo of Vulkan - instances do not know about each other in any way. At this point you specify some simple information including which layers and extensions you want to activate - there are query functions that let you enumerate what layers and extensions are available.

With a VkInstance, you can now examine the GPUs available. A given Vulkan implementation might not be running on a GPU, but let's keep things simple. Each GPU gives you a handle - VkPhysicalDevice. You can query the GPUs' names, properties, capabilities, etc. For example see vkGetPhysicalDeviceProperties and vkGetPhysicalDeviceFeatures.

With a VkPhysicalDevice, you can create a VkDevice. The VkDevice is your main handle and it represents a logical connection - i.e. 'I am running Vulkan on this GPU'. VkDevice is used for pretty much everything else. This is the equivalent of a GL context or D3D11 device.

N.B. Each of these is a 1:many relationship. A VkInstance can have many VkPhysicalDevices, a VkPhysicalDevice can have many VkDevices. In Vulkan 1.0, there is no cross-GPU activity, but you can bet this will come in the future.

I'm hand waving some book-keeping details; Vulkan in general is quite lengthy in setup due to its explicit nature, and this is a summary not an implementation guide. The overall picture is that your initialisation mostly looks like vkCreateInstance() → vkEnumeratePhysicalDevices() → vkCreateDevice(). For a quick and dirty hello world triangle program, you can do just that and pick the first physical device, then come back to it once you want error reporting & validation, enabling optional device features, etc.

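Step one

1) A VkInstance is a Vulkan instance and must be created first. VkInstances are unrelated to each other, so each can be given different settings, e.g. layers and extensions.

2) Once you have a VkInstance, examine the available GPUs (Vulkan does not necessarily run on a GPU; a CPU implementation is also possible). Each one is a VkPhysicalDevice, which you can query via vkGetPhysicalDeviceProperties and vkGetPhysicalDeviceFeatures.

3) Once you have a VkPhysicalDevice, create the Vulkan device. Anyone who has learned GL knows that GL is a state machine with a context that manages its state; a VkDevice plays a similar role.

The part of the pseudocode at the end of this article that corresponds to "Step one" is listed below: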

const char *extensionNames[] = { "VK_KHR_surface", "VK_KHR_win32_surface" };

  // future structs will not be detailed, but this one is for illustration.
  // Application info is optional (you can specify application/engine name and version)
  // Note we activate the WSI instance extensions, provided by the ICD to
  // allow us to create a surface (win32 is an example, there's also xcb/xlib/etc)
  VkInstanceCreateInfo instanceCreateInfo = {
    VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO, // VkStructureType sType;
    NULL,                                   // const void* pNext;

    0,                                      // VkInstanceCreateFlags flags;

    NULL,                                   // const VkApplicationInfo* pApplicationInfo;

    0,                                      // uint32_t enabledLayerCount;
    NULL,                                   // const char* const* ppEnabledLayerNames;

    2,                                      // uint32_t enabledExtensionCount;
    extensionNames,                         // const char* const* ppEnabledExtensionNames;
  };

  VkInstance inst;
  vkCreateInstance(&instanceCreateInfo, NULL, &inst);

  // The enumeration pattern SHOULD be to call with last parameter NULL to
  // get the count, then call again to get the handles. For brevity, omitted
  VkPhysicalDevice phys[4]; uint32_t physCount = 4;
  vkEnumeratePhysicalDevices(inst, &physCount, phys);

  VkDeviceCreateInfo deviceCreateInfo = {
    // I said I was going to start omitting things!
  };

  VkDevice dev;
  vkCreateDevice(phys[0], &deviceCreateInfo, NULL, &dev);
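
The comment in the listing above mentions the count-then-fetch enumeration pattern but omits it for brevity. The following is a minimal, self-contained sketch of that idiom. It does not call Vulkan: enumerate_gpus() is a hypothetical stand-in that only imitates the calling convention of vkEnumeratePhysicalDevices (pass NULL to get the count, pass an array to get the items), and the integer "handles" are made up.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a Vulkan-style enumeration call (NOT a real
 * Vulkan entry point). With items == NULL it writes the available count;
 * otherwise it writes up to *count items and updates *count. */
static int enumerate_gpus(uint32_t *count, int *items)
{
    static const int available[] = { 101, 102, 103 };
    const uint32_t total = 3;

    if (items == NULL) {
        *count = total;
        return 0;
    }
    uint32_t n = (*count < total) ? *count : total;
    for (uint32_t i = 0; i < n; i++)
        items[i] = available[i];
    *count = n;
    return (n < total) ? 1 : 0; /* 1 plays the role of VK_INCOMPLETE */
}

/* Two-call idiom: ask for the count, allocate, then fetch the items.
 * Returns a malloc'd array (the caller frees it) and stores its length
 * in *out_count; returns NULL with *out_count == 0 on failure. */
static int *get_all_gpus(uint32_t *out_count)
{
    assert(out_count != NULL);
    *out_count = 0;

    uint32_t count = 0;
    enumerate_gpus(&count, NULL);          /* first call: count only */
    if (count == 0)
        return NULL;

    int *items = malloc(count * sizeof *items);
    if (items == NULL)
        return NULL;

    enumerate_gpus(&count, items);         /* second call: fill the array */
    *out_count = count;
    return items;
}
```

With the real API the same shape applies, with VkPhysicalDevice handles in place of the integers; the fixed-size phys[4] array in the pseudocode is a shortcut that works only while at most four devices are present.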


Images and Buffers

Now that we have a VkDevice we can start creating pretty much every other resource type (a few have further dependencies on other objects), for example VkImage and VkBuffer.

For GL people, one kind of new concept is that you must declare at creation time how an image will be used. You provide a bit field, with each bit indicating a certain type of usage - color attachment, or sampled image in shader, or image load/store, etc.

You also specify the tiling for the image - LINEAR or OPTIMAL. This specifies the tiling/swizzling layout for the image data in memory. OPTIMAL tiled images are opaquely tiled, LINEAR are laid out just as you expect. This affects whether the image data is directly readable/writable, as well as format support - drivers report image support in terms of 'what image types are supported in OPTIMAL tiling, and what image types are supported in LINEAR'. Be prepared for very limited LINEAR support.

Buffers are similar and more straightforward: you give them a size and a usage and that's about it.

Images aren't used directly, so you will have to create a VkImageView - this is familiar to D3D11 people. Unlike GL texture views, image views are mandatory but are the same idea - a description of what array slices or mip levels are visible to wherever the image view is used, and optionally a different (but compatible) format (like aliasing a UNORM texture as UINT).

Buffers are usually used directly as they're just a block of memory, but if you want to use them as a texel buffer in a shader, you need to provide a VkBufferView.

Images & Buffers

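1) In Vulkan an Image represents any array with a pixel structure, and can be used as a texture, a render target and so on. Like other objects, an Image must declare how it will be used when it is created. For example, a parameter specifies the Image's memory layout, which can be LINEAR or OPTIMAL (tiled), the latter being better suited to texture filtering. Using a LINEAR-layout Image as a texture may cause a severe performance loss on some platforms. As in traditional APIs, the texture itself is not bound directly to the pipeline; reading and using an Image goes through an ImageView.

2) Buffer is the collective term for the familiar vertex buffers, index buffers, uniform buffers and so on, and a single Buffer can serve many purposes. In Vulkan you need to pay particular attention to the type of memory a Buffer is allocated from: some types are accessible to the CPU and some are not, and some are cached on the CPU. These memory types are now significant functional properties, no longer just a hint to the driver.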

Allocating GPU Memory

Those buffers and images can't be used immediately after creation as no memory has been allocated for them. This step is up to you.

Available memory is exposed to applications by vkGetPhysicalDeviceMemoryProperties(). It reports one or more memory heaps of given sizes, and one or more memory types with given properties. Each memory type comes from one heap - so a typical example for a discrete GPU on a PC would be two heaps - one for system RAM, and one for GPU RAM, and multiple memory types from each.

The memory types have different properties. Some will be CPU visible or not, coherent between GPU and CPU access, cached or uncached, etc. You can find out all of these properties by querying from the physical device. This allows you to choose the memory type you want. E.g. staging resources will need to be in host visible memory, but your images you render to will want to be in device local memory for optimal use. However there is an additional restriction on memory selection that we'll get to in the next section.

To allocate memory you call vkAllocateMemory() which requires your VkDevice handle and a description structure. The structure dictates which type of memory to allocate (and therefore which heap it comes from) and how much to allocate, and the call returns a VkDeviceMemory handle.

Host visible memory can be mapped for update - vkMapMemory()/vkUnmapMemory() are familiar functions. All maps are by definition persistent, and as long as you synchronise it's legal to have memory mapped while in use by the GPU.

GL people will be familiar with the concept, but to explain for D3D11 people - the pointers returned by vkMapMemory() can be held and even written to by the CPU while the GPU is using them. These 'persistent' maps are perfectly valid as long as you obey the rules and make sure to synchronise access so that the CPU isn't writing to parts of the memory allocation that the GPU is using (see later).

This is a little outside the scope of this guide but I'm going to mention it any chance I get - for the purposes of debugging, persistent maps of non-coherent memory with explicit region flushes will be much more efficient/fast than coherent memory. The reason being that for coherent memory the debugger must jump through hoops to detect and track changes, but the explicit flushes of non-coherent memory provide nice markup of modifications.

In RenderDoc to help out with this, if you flush a memory region then the tool assumes you will flush for every write, and turns off the expensive hoop-jumping to track coherent memory. That way, even if the only memory available is coherent, you can get efficient debugging.

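Allocating GPU memory

1) A freshly created Buffer or Image has no memory behind it; as with a GL texture, memory has to be allocated for it. vkGetPhysicalDeviceMemoryProperties() reports the sizes, types and properties of the available memory. On a PC there is typically one heap for system memory and one for video memory; on mobile there is typically only system memory.

2) Memory types have different properties: accessible to the CPU or not, coherent between GPU and CPU access, cached or uncached, and so on. These properties can be found by querying the physical device.

3) vkAllocateMemory() allocates memory. Its main inputs are a VkDevice and a description structure (which gives the size to allocate and the memory type, and through it the heap), and it returns a VkDeviceMemory.

4) vkMapMemory()/vkUnmapMemory() are similar to the glMap*/glUnmap* functions in GL.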

         

Binding Memory

Each VkBuffer or VkImage, depending on its properties like usage flags and tiling mode (remember that one?) will report its memory requirements to you via vkGetBufferMemoryRequirements or vkGetImageMemoryRequirements.

The reported size requirement will account for padding for alignment between mips, hidden meta-data, and anything else needed for the total allocation. The requirements also include a bitmask of the memory types that are compatible with this particular resource. The obvious restrictions kick in here: that OPTIMAL tiling color attachment image will report that only DEVICE_LOCAL memory types are compatible, and it will be invalid to try to bind some HOST_VISIBLE memory.

The memory type requirements generally won't vary if you have the same kind of image or buffer. For example if you know that optimally tiled images can go in memory type 3, you can allocate all of them from the same place. You will only have to check the size and alignment requirements per-image. Read the spec for the exact guarantee here!

Note the memory allocation is by no means 1:1. You can allocate a large amount of memory and as long as you obey the above restrictions you can place several images or buffers in it at different offsets. The requirements include an alignment if you are placing the resource at a non-zero offset. In fact you will definitely want to do this in any real application, as there are limits on the total number of allocations allowed.
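
Placing several resources in one allocation comes down to rounding each offset up to the resource's reported alignment. The sketch below shows one simple way to do that with a linear ("bump") placement scheme. LinearBlock and place_resource() are hypothetical names, the block only tracks offsets rather than real VkDeviceMemory, and it assumes the alignment is a power of two. It deliberately ignores bufferImageGranularity, so it is only adequate when a block holds a single kind of resource.

```c
#include <assert.h>
#include <stdint.h>

#define PLACE_FAILED UINT64_MAX

/* Round offset up to the next multiple of alignment (a power of two). */
static uint64_t align_up(uint64_t offset, uint64_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}

/* Book-keeping for one large allocation that is handed out front to back. */
typedef struct {
    uint64_t capacity; /* total size of the allocation in bytes */
    uint64_t used;     /* bytes consumed so far, including padding */
} LinearBlock;

/* Reserve size bytes at the required alignment. Returns the offset to
 * bind the resource at, or PLACE_FAILED if the block is too full. */
static uint64_t place_resource(LinearBlock *block, uint64_t size, uint64_t alignment)
{
    uint64_t offset = align_up(block->used, alignment);
    if (offset > block->capacity || size > block->capacity - offset)
        return PLACE_FAILED;
    block->used = offset + size;
    return offset;
}
```

The returned offset is what would be passed as the memory offset when binding the buffer or image; freeing individual resources is out of scope for a scheme this simple.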

There is an additional alignment requirement bufferImageGranularity - a minimum separation required between memory used for a VkImage and memory used for a VkBuffer in the same VkDeviceMemory. Read the spec for more details, but this mostly boils down to an effective page size, and requirement that each page is only used for one type of resource.

Once you have the right memory type and size and alignment, you can bind it with vkBindBufferMemory or vkBindImageMemory.
This binding is immutable, and must happen before you start using the buffer or image.

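Binding memory

1) Binding memory, as the name suggests, attaches memory to a Buffer/Image that has been created.

2) vkGetBufferMemoryRequirements or vkGetImageMemoryRequirements reports the memory requirements of the Buffer/Image.

3) The reported size is the total allocation the Buffer/Image needs, including padding between mips, hidden meta-data and anything else required.

4) An OPTIMAL-tiled color attachment Image can only use DEVICE_LOCAL memory; binding HOST_VISIBLE memory to it is invalid.

5) Memory allocation is not 1:1: several resources can share one allocation, each bound at a different offset.

Here is a short example of buffer creation, memory allocation and memory binding: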

	VkBufferCreateInfo bufferCreateInfo;
	bufferCreateInfo.sType = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO;
	bufferCreateInfo.pNext = NULL;
	bufferCreateInfo.flags = 0;
	bufferCreateInfo.size = dataSize;
	bufferCreateInfo.usage = VK_BUFFER_USAGE_TRANSFER_SRC_BIT;
	bufferCreateInfo.sharingMode = VK_SHARING_MODE_EXCLUSIVE;
	bufferCreateInfo.queueFamilyIndexCount = 0;
	bufferCreateInfo.pQueueFamilyIndices = NULL;

	VkBuffer srcBuffer;
	vkCreateBuffer( context->device->device, &bufferCreateInfo, VK_ALLOCATOR, &srcBuffer );

	VkMemoryRequirements memoryRequirements;
	vkGetBufferMemoryRequirements( context->device->device, srcBuffer, &memoryRequirements );

	VkMemoryAllocateInfo memoryAllocateInfo;
	memoryAllocateInfo.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
	memoryAllocateInfo.pNext = NULL;
	memoryAllocateInfo.allocationSize = memoryRequirements.size;
	memoryAllocateInfo.memoryTypeIndex = GpuDevice_GetMemoryTypeIndex( context->device, memoryRequirements.memoryTypeBits, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT );

	VkDeviceMemory srcMemory;
	vkAllocateMemory( context->device->device, &memoryAllocateInfo, VK_ALLOCATOR, &srcMemory );
	vkBindBufferMemory( context->device->device, srcBuffer, srcMemory, 0 );

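	// Copy the source data into the buffer through a temporary CPU mapping.
	// The memory type was only required to be HOST_VISIBLE; if it is not also
	// HOST_COHERENT, the write must be flushed with vkFlushMappedMemoryRanges()
	// before the GPU reads it.
	void * mapped;
	vkMapMemory( context->device->device, srcMemory, 0, memoryRequirements.size, 0, &mapped );
	memcpy( mapped, data, dataSize );
	vkUnmapMemory( context->device->device, srcMemory );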

Command buffers and submission

Work is explicitly recorded to and submitted from a VkCommandBuffer.

VkCommandBuffer isn't created directly, it is allocated from a VkCommandPool. This allows for better threading behaviour since command buffers and command pools must be externally synchronised (see later). You can have a pool per thread and allocate and free command buffers from it with vkAllocateCommandBuffers()/vkFreeCommandBuffers() without heavy locking.

Once you have a VkCommandBuffer you begin recording, issue all your GPU commands into it *hand waving goes here* and end recording.

Command buffers are submitted to a VkQueue. Queues are how work becomes serialised to be passed to the GPU. A VkPhysicalDevice (remember way back? The GPU handle) can report a number of queue families with different capabilities, e.g. a graphics queue family and a compute-only queue family. When you create your device you ask for a certain number of queues from each family, and then you can fetch them from the device after creation with vkGetDeviceQueue().

I'm going to focus on having just a single do-everything VkQueue as the simple case, since multiple queues must be synchronised against each other as they can run out of order or in parallel to each other. Be aware that some implementations might require you to use a separate queue for swapchain presentation - I think chances are that most won't, but you have to account for this. Again, read the spec for details!

You can vkQueueSubmit() several command buffers at once to the queue and they will be executed in turn. Nominally this defines the order of execution but remember that Vulkan has very specific ordering guarantees - mostly about what work can overlap rather than wholesale rearrangement - so take care to read the spec to make sure you synchronise everything correctly.

Command buffers

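1) Inside a GPU driver there are command buffers and queues, which manage the rendering work that is finally submitted to the GPU for execution. Vulkan exposes concepts similar to those driver-level command buffers and queues. Readers who know the history of Vulkan will know that it aims to give application programmers more control over the GPU, reducing the work done by the driver and improving GPU efficiency. In addition, to make it easier for the driver to optimise these commands, no render state is inherited between command buffers. Each command buffer must explicitly bind all of the render state, shaders, descriptor sets and so on that it needs. This is very different from traditional APIs, where a piece of state stays as it is until you change it.

2) Command buffers are allocated and freed with vkAllocateCommandBuffers()/vkFreeCommandBuffers(), and must be allocated from a VkCommandPool. Once a command buffer has been recorded it is submitted to a queue (the appropriate graphics or compute queue can be fetched from the device), which hands it to the GPU for execution.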

Shaders and Pipeline State Objects

The reasoning behind moving to monolithic PSOs is well trodden by now so I won't go over it.

A Vulkan VkPipeline bakes in a lot of state, but allows specific parts of the fixed function pipeline to be set dynamically: things like viewport, stencil masks and refs, blend constants, etc. A full list as ever is in the spec. When you call vkCreateGraphicsPipelines(), you choose which states will be dynamic, and the others are taken from values specified in the PSO creation info.

You can optionally specify a VkPipelineCache at creation time. This allows you to compile a whole bunch of pipelines and then call vkGetPipelineCacheData() to save the blob of data to disk. Next time you can prepopulate the cache to save on PSO creation time. The expected caveats apply - there is versioning to be aware of so you can't load out of date or incorrect caches.
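
As an illustration of the versioning caveat, the sketch below checks the header of a saved cache blob before handing it back to the driver. It assumes the version-one header layout as described in the Vulkan 1.0 spec (header length, header version, vendorID, deviceID as little-endian 32-bit values, followed by a 16-byte pipelineCacheUUID); verify that against the spec before relying on it. cache_blob_matches() is a hypothetical helper, and the expected IDs and UUID are assumed to come from the physical device properties.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_UUID_SIZE          16u /* mirrors VK_UUID_SIZE */
#define CACHE_HEADER_VERSION_ONE 1u  /* mirrors VK_PIPELINE_CACHE_HEADER_VERSION_ONE */
#define CACHE_HEADER_SIZE        (16u + CACHE_UUID_SIZE)

/* Read a 32-bit value stored least significant byte first. */
static uint32_t read_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 if the blob's header says it was written for the given
 * vendor, device and cache UUID, and 0 otherwise. uuid must point to
 * CACHE_UUID_SIZE bytes. */
static int cache_blob_matches(const uint8_t *blob, size_t size,
                              uint32_t vendor_id, uint32_t device_id,
                              const uint8_t *uuid)
{
    assert(uuid != NULL);

    if (blob == NULL || size < CACHE_HEADER_SIZE)
        return 0;

    uint32_t header_len = read_u32_le(blob + 0);
    if (header_len < CACHE_HEADER_SIZE || header_len > size)
        return 0;
    if (read_u32_le(blob + 4) != CACHE_HEADER_VERSION_ONE)
        return 0;
    if (read_u32_le(blob + 8) != vendor_id || read_u32_le(blob + 12) != device_id)
        return 0;

    return memcmp(blob + 16, uuid, CACHE_UUID_SIZE) == 0;
}
```

If the check fails, the simplest response is to discard the file and start with an empty cache.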

Shaders are specified as SPIR-V. This has already been discussed much better elsewhere, so I will just say that you create a VkShaderModule from a SPIR-V module, which could contain several entry points, and at pipeline creation time you choose one particular entry point.

The easiest way to get some SPIR-V for testing is with the reference compiler glslang,
but other front-ends are available, as well as LLVM → SPIR-V support.

Binding Model

To establish a point of reference, let's roughly outline D3D11's binding model. GL's is quite similar.

  • Each shader stage has its own namespace, so pixel shader texture binding 0 is not vertex shader texture binding 0.
  • Each resource type is namespaced apart, so constant buffer binding 0 is definitely not the same as texture binding 0.
  • Resources are individually bound and unbound to slots (or at best in contiguous batches).

In Vulkan, the base binding unit is a descriptor. A descriptor is an opaque representation that stores 'one bind'. This could be an image, a sampler, a uniform/constant buffer, etc. It could also be arrayed - so you can have an array of images that can be different sizes etc, as long as they are all 2D floating point images.

Descriptors aren't bound individually, they are bound in blocks in a VkDescriptorSet which each have a particular VkDescriptorSetLayout. The VkDescriptorSetLayout describes the types of the individual bindings in each VkDescriptorSet.

The easiest way I find to think about this is to consider VkDescriptorSetLayout as being like a C struct type - it describes some members, each member having an opaque type (constant buffer, load/store image, etc). The VkDescriptorSet is a specific instance of that type - and each member in the VkDescriptorSet is a binding you can update with whichever resource you want it to contain.

This is roughly how you create the objects too. You pass a list of the types, array sizes and bindings to Vulkan to create a VkDescriptorSetLayout, then you can allocate VkDescriptorSets with that layout from a VkDescriptorPool. The pool acts the same way as VkCommandPool, to let you allocate descriptors on different threads more efficiently by having a pool per thread.

VkDescriptorSetLayoutBinding bindings[] = {
	// binding 0 is a UBO, array size 1, visible to all stages
	{ 0, VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER, 1, VK_SHADER_STAGE_ALL_GRAPHICS, NULL },
	// binding 1 is a sampler, array size 1, visible to all stages
	{ 1, VK_DESCRIPTOR_TYPE_SAMPLER,        1, VK_SHADER_STAGE_ALL_GRAPHICS, NULL },
	// binding 5 is an image, array size 10, visible only to fragment shader
	{ 5, VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE, 10, VK_SHADER_STAGE_FRAGMENT_BIT, NULL },
};

Example C++ outlining creation of a descriptor set layout

Once you have a descriptor set, you can update it directly to put specific values in the bindings, and also copy between different descriptor sets.

When creating a pipeline, you specify N VkDescriptorSetLayouts for use in a VkPipelineLayout. Then when binding, you have to bind matching VkDescriptorSets of those layouts. The sets can be updated and bound at different frequencies, which allows grouping all resources by frequency of update.

To extend the above analogy, this defines the pipeline as something like a function, and it can take some number of structs as arguments. When creating the pipeline you declare the types (VkDescriptorSetLayouts)
of each argument, and when binding the pipeline you pass specific instances of those types (VkDescriptorSets).

The other side of the equation is fairly simple - instead of having shader or type namespaced bindings in your shader code, each resource in the shader simply says which descriptor set and binding it pulls from. This matches
the descriptor set layout you created.

#version 430

layout(set = 0, binding = 0) uniform MyUniformBufferType {
// ...
} MyUniformBufferInstance;

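// note in the C++ sample above, this is just a sampler - not a combined image+sampler
// as is typical in GL.
layout(set = 0, binding = 1) uniform sampler MySampler;

// binding 5 is a VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE in the C++ sample above,
// which is declared as a texture2D (an image2D would be a storage image).
layout(set = 0, binding = 5) uniform texture2D MyImages[10];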

Example GLSL showing bindings

Synchronisation

I'm going to hand wave a lot in this section because the specific things you need to synchronise get complicated and long-winded fast, and I'm just going to focus on what synchronisation is available and leave the details of what you need to synchronise to reading of specs or more in-depth documents.

This is probably the hardest part of Vulkan to get right, especially since missing synchronisation might not necessarily break anything when you run it!

Several types of objects must be 'externally synchronised'. In fact I've used that phrase before in this post. The meaning is basically that if you try to use the same VkQueue on two different threads, there's no internal locking so it will crash - it's up to you to 'externally synchronise' access to that VkQueue.

For the exact requirements of which objects must be externally synchronised, and when, you should check the spec, but as a rule you can use VkDevice for creation functions freely - it is locked for allocation sake - but things like recording and submitting commands must be synchronised.

N.B. There is no explicit or implicit ref counting of any object - you can't destroy anything until you are sure it is never going to be used again by either the CPU or the GPU.

Vulkan has VkEvent, VkSemaphore and VkFence which can be used for efficient CPU-GPU and GPU-GPU synchronisation. They work as you expect so you can look up the precise use etc yourself, but there are no surprises here. Be careful that you do use synchronisation though, as there are few ordering guarantees in the spec itself.

Pipeline barriers are a new concept, used in general terms for ensuring ordering of GPU-side operations where necessary, for example ensuring that results from one operation are complete before another operation starts, or that all work of one type finishes on a resource before it's used for work of another type.

There are three types of barrier - VkMemoryBarrier, VkBufferMemoryBarrier and VkImageMemoryBarrier. A VkMemoryBarrier applies to memory globally, and the other two apply to specific resources (and subsections of those resources).

The barrier takes a bit field of different memory access types to specify what operations on each side of the barrier should be synchronised against the other. A simple example of this would be "this VkImageMemoryBarrier has srcAccessMask = ACCESS_COLOR_ATTACHMENT_WRITE and dstAccessMask = ACCESS_SHADER_READ", which indicates that all color writes should finish before any shader reads begin - without this barrier in place, you could read stale data.

Image layouts

Image barriers have one additional property - images exist in states called image layouts. VkImageMemoryBarrier can specify a transition from one layout to another. The layout must match how the image is used at any time. There is a GENERAL layout which is legal to use for anything but might not be optimal, and there are optimal layouts for color attachment, depth attachment, shader sampling, etc.

Images begin in either the UNDEFINED or PREINITIALIZED state (you can choose). The latter is useful for populating an image with data before use, as the UNDEFINED layout has undefined contents - a transition from UNDEFINED to GENERAL may lose the contents, but PREINITIALIZED to GENERAL won't. Neither initial layout is valid for use by the GPU, so at minimum after creation an image needs to be transitioned into some appropriate state.

Usually you have to specify the previous and new layouts accurately, but it is always valid to transition from UNDEFINED to another layout. This basically means 'I don't care what the image was like before, throw it away and use it like this'.

Render passes

VkRenderPass is Vulkan's way of more explicitly denoting how your rendering happens, rather than letting you render into then sample images at will. More information about how the frame is structured will aid everyone, but primarily this is to aid tile based renderers so that they have a direct notion of where rendering on a given target happens and what dependencies there are between passes, to avoid leaving tile memory as much as possible.

N.B. Because I primarily work on desktops (and for brevity & simplicity) I'm not mentioning a couple of optional things you can do that aren't commonly suited to desktop GPUs like input and transient attachments. As always, read the spec :).

The first building block is a VkFramebuffer, which is a set of VkImageViews. This is not necessarily the same as the classic idea of a framebuffer as the particular images you are rendering to at any given point, as it can contain potentially more images than you ever render to at once.

VkRenderPass consists of a series of subpasses. In your simple triangle case and possibly in many other cases, this will just be one subpass. For now, let's just consider that case. The subpass selects some of the framebuffer attachments as color attachments and maybe one as a depth-stencil attachment. If you have multiple subpasses, this is where you might have different subsets used in each subpass - sometimes as output and sometimes as input.

Drawing commands can only happen inside a VkRenderPass, and some commands such as copies and clears can only happen outside a VkRenderPass. Some commands such as state binding can happen inside or outside at will. Consult the spec to see which commands are which.

Subpasses do not inherit state at all, so each time you start a VkRenderPass or move to a new subpass you have to bind/set all of the state. Subpasses also specify an action both for loading and storing each attachment. This allows you to say 'the depth should be cleared to 1.0, but the color can be initialised to garbage for all I care - I'm going to fully overwrite the screen in this pass'. Again, this can provide useful optimisation information that the driver no longer has to guess.

The last consideration is compatibility between these different objects. When you create a VkRenderPass (and all of its subpasses) you don't reference anything else, but you do specify both the format and use of all attachments. Then when you create a VkFramebuffer you must choose a VkRenderPass that it will be used with. This doesn't have to be the exact instance that you will later use, but it does have to be compatible - the same number and format of attachments. Similarly when creating a VkPipeline you have to specify the VkRenderPass and subpass that it will be used with, again not having to be identical but required to be compatible.

There are more complexities to consider if you have multiple subpasses within your render pass, as you have to declare barriers and dependencies between them, and annotate which attachments must be used for what. Again, if you're looking into that read the spec.

Backbuffers and presentation

I'm only going to talk about this fairly briefly because not only is it platform-specific but it's fairly straightforward.

Note that Vulkan exposes native window system integration via extensions, so you will have to request them explicitly when you create your VkInstance and VkDevice.

To start with, you create a VkSurfaceKHR from whatever native windowing information is needed.

Once you have a surface you can create a VkSwapchainKHR for that surface. You'll need to query for things like what formats are supported on that surface, how many backbuffers you can have in the chain, etc.

You can then obtain the actual images in the VkSwapchainKHR via vkGetSwapchainImagesKHR(). These are normal VkImage handles, but you don't control their creation or memory binding - that's all done for you. You will have to create a VkImageView for each, though.

When you want to render to one of the images in the swapchain, you can call vkAcquireNextImageKHR(), which will return to you the index of the next image in the chain. You can render to it and then call vkQueuePresentKHR() with the same index to have it presented to the display.

There are many more subtleties and details if you want to get really optimal use out of the swapchain, but for the dead-simple hello world case, the above suffices.

Conclusion

Hopefully you're still with me after that rather break-neck pace.

As promised I've skipped a lot of details and skimmed over some complexities, for example I have completely failed to mention sparse resources support, primary and secondary command buffers, and I've probably missed some other cool things.

With any luck though you have the broad-strokes impression of how a simple Vulkan application is put together, and you're in a better place to go look at some documentation and figure the rest out for yourself.

Any questions or comments, let me know on twitter or email. In particular if anything is actually wrong I will correct it, as I don't want to mislead with this document - just set up a basic understanding that can be expanded on with further reading.

Also just to plug myself a little, if you need a graphics debugger for Vulkan consider giving RenderDoc a try, and let me know if you have any problems.

Appendix: Sample Pseudocode

#include <vulkan/vulkan.h>

// Pseudocode of what an application looks like. I've omitted most creation structures,
// almost all synchronisation and all error checking. This is not a copy-paste guide!
void DoVulkanRendering()
{
  const char *extensionNames[] = { "VK_KHR_surface", "VK_KHR_win32_surface" };

  // future structs will not be detailed, but this one is for illustration.
  // Application info is optional (you can specify application/engine name and version)
  // Note we activate the WSI instance extensions, provided by the ICD to
  // allow us to create a surface (win32 is an example, there's also xcb/xlib/etc)
  VkInstanceCreateInfo instanceCreateInfo = {
    VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO, // VkStructureType sType;
    NULL,                                   // const void* pNext;

    0,                                      // VkInstanceCreateFlags flags;

    NULL,                                   // const VkApplicationInfo* pApplicationInfo;

    0,                                      // uint32_t enabledLayerCount;
    NULL,                                   // const char* const* ppEnabledLayerNames;

    2,                                      // uint32_t enabledExtensionCount;
    extensionNames,                         // const char* const* ppEnabledExtensionNames;
  };

  VkInstance inst;
  vkCreateInstance(&instanceCreateInfo, NULL, &inst);

  // The enumeration pattern SHOULD be to call with last parameter NULL to
  // get the count, then call again to get the handles. For brevity, omitted
  VkPhysicalDevice phys[4]; uint32_t physCount = 4;
  vkEnumeratePhysicalDevices(inst, &physCount, phys);

  VkDeviceCreateInfo deviceCreateInfo = {
    // I said I was going to start omitting things!
  };

  VkDevice dev;
  vkCreateDevice(phys[0], &deviceCreateInfo, NULL, &dev);

  // fetch vkCreateWin32SurfaceKHR extension function pointer via vkGetInstanceProcAddr
  VkWin32SurfaceCreateInfoKHR surfaceCreateInfo = {
    // HINSTANCE, HWND, etc
  };
  VkSurfaceKHR surf;
  vkCreateWin32SurfaceKHR(inst, &surfaceCreateInfo, NULL, &surf);

  VkSwapchainCreateInfoKHR swapCreateInfo = {
    // surf goes in here
  };
  VkSwapchainKHR swap;
  vkCreateSwapchainKHR(dev, &swapCreateInfo, NULL, &swap);

  // Again this should be properly enumerated. swapCount must hold the
  // capacity of the array on input, and receives the number written.
  VkImage images[4]; uint32_t swapCount = 4;
  vkGetSwapchainImagesKHR(dev, swap, &swapCount, images);

  // Synchronisation is needed here!
  uint32_t currentSwapImage;
  vkAcquireNextImageKHR(dev, swap, UINT64_MAX, presentCompleteSemaphore, VK_NULL_HANDLE, &currentSwapImage);

  // pass appropriate creation info to create view of image
  VkImageView backbufferView;
  vkCreateImageView(dev, &backbufferViewCreateInfo, NULL, &backbufferView);

  VkQueue queue;
  vkGetDeviceQueue(dev, 0, 0, &queue);

  VkRenderPassCreateInfo renderpassCreateInfo = {
    // here you will specify the total list of attachments
    // (which in this case is just one, that's e.g. R8G8B8A8_UNORM)
    // as well as describe a single subpass, using that attachment
    // for color and with no depth-stencil attachment
  };

  VkRenderPass renderpass;
  vkCreateRenderPass(dev, &renderpassCreateInfo, NULL, &renderpass);

  VkFramebufferCreateInfo framebufferCreateInfo = {
    // include backbufferView here to render to, and renderpass to be
    // compatible with.
  };

  VkFramebuffer framebuffer;
  vkCreateFramebuffer(dev, &framebufferCreateInfo, NULL, &framebuffer);

  VkDescriptorSetLayoutCreateInfo descSetLayoutCreateInfo = {
    // whatever we want to match our shader. e.g. Binding 0 = UBO for a simple
    // case with just a vertex shader UBO with transform data.
  };

  VkDescriptorSetLayout descSetLayout;
  vkCreateDescriptorSetLayout(dev, &descSetLayoutCreateInfo, NULL, &descSetLayout);

  VkPipelineLayoutCreateInfo pipeLayoutCreateInfo = {
    // one descriptor set, with layout descSetLayout
  };

  VkPipelineLayout pipeLayout;
  vkCreatePipelineLayout(dev, &pipeLayoutCreateInfo, NULL, &pipeLayout);

  // upload the SPIR-V shaders
  VkShaderModule vertModule, fragModule;
  vkCreateShaderModule(dev, &vertModuleInfoWithSPIRV, NULL, &vertModule);
  vkCreateShaderModule(dev, &fragModuleInfoWithSPIRV, NULL, &fragModule);

  VkGraphicsPipelineCreateInfo pipeCreateInfo = {
    // there are a LOT of sub-structures under here to fully specify
    // the PSO state. It will reference vertModule, fragModule and pipeLayout
    // as well as renderpass for compatibility
  };

  VkPipeline pipeline;
  vkCreateGraphicsPipelines(dev, VK_NULL_HANDLE, 1, &pipeCreateInfo, NULL, &pipeline);

  VkDescriptorPoolCreateInfo descPoolCreateInfo = {
    // the creation info states how many descriptor sets are in this pool
  };

  VkDescriptorPool descPool;
  vkCreateDescriptorPool(dev, &descPoolCreateInfo, NULL, &descPool);

  VkDescriptorSetAllocateInfo descAllocInfo = {
    // from pool descPool, with layout descSetLayout
  };

  VkDescriptorSet descSet;
  vkAllocateDescriptorSets(dev, &descAllocInfo, &descSet);

  VkBufferCreateInfo bufferCreateInfo = {
    // buffer for uniform usage, of appropriate size
  };

  VkMemoryAllocateInfo memAllocInfo = {
    // skipping querying for memory requirements. Let's assume the buffer
    // can be placed in host visible memory.
  };
  VkBuffer buffer;
  VkDeviceMemory memory;
  vkCreateBuffer(dev, &bufferCreateInfo, NULL, &buffer);
  vkAllocateMemory(dev, &memAllocInfo, NULL, &memory);
  vkBindBufferMemory(dev, buffer, memory, 0);

  void *data = NULL;
  vkMapMemory(dev, memory, 0, VK_WHOLE_SIZE, 0, &data);
  // fill data pointer with lovely transform goodness
  vkUnmapMemory(dev, memory);

  VkWriteDescriptorSet descriptorWrite = {
    // write the details of our UBO buffer into binding 0
  };

  vkUpdateDescriptorSets(dev, 1, &descriptorWrite, 0, NULL);

  // finally we can render something!
  // ...
  // Almost.

  VkCommandPoolCreateInfo commandPoolCreateInfo = {
    // nothing interesting
  };

  VkCommandPool commandPool;
  vkCreateCommandPool(dev, &commandPoolCreateInfo, NULL, &commandPool);

  VkCommandBufferAllocateInfo commandAllocInfo = {
    // allocate from commandPool
  };
  VkCommandBuffer cmd;
  vkAllocateCommandBuffers(dev, &commandAllocInfo, &cmd);

  // Now we can render!

  vkBeginCommandBuffer(cmd, &cmdBeginInfo);
  vkCmdBeginRenderPass(cmd, &renderpassBeginInfo, VK_SUBPASS_CONTENTS_INLINE);
  // bind the pipeline
  vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, pipeline);
  // bind the descriptor set
  // (takes the pipeline layout, then the first set index and the set count)
  vkCmdBindDescriptorSets(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS,
                          pipeLayout, 0, 1, &descSet, 0, NULL);
  // set the viewport
  vkCmdSetViewport(cmd, 0, 1, &viewport);
  // draw the triangle
  vkCmdDraw(cmd, 3, 1, 0, 0);
  vkCmdEndRenderPass(cmd);
  vkEndCommandBuffer(cmd);

  VkSubmitInfo submitInfo = {
    // this contains a reference to the above cmd to submit
  };

  vkQueueSubmit(queue, 1, &submitInfo, VK_NULL_HANDLE);

  // now we can present
  VkPresentInfoKHR presentInfo = {
    // swap and currentSwapImage are used here
  };
  vkQueuePresentKHR(queue, &presentInfo);

  // Wait for everything to be done, and destroy objects
}

https://renderdoc.org/vulkan-in-30-minutes.html

Time: 2024-11-08 10:01:09
