The CPU's physical memory address space is interleaved across memory channels/controllers;
Restrictions NVMe devices place on the use of the physical memory address space:
The NVMe 1.0 specification requires all physical memory to be describable by what is called a PRP list. To be described by a PRP list, memory must have the following properties:
NVMe devices transfer data via DMA:
NVMe devices transfer data to and from system memory using Direct Memory Access (DMA). Specifically, they send messages across the PCI bus requesting data transfers. In the absence of an IOMMU, these messages contain physical memory addresses. These data transfers happen without involving the CPU, and the MMU is responsible for making access to memory coherent.
The memory is broken into physical 4KiB pages, which we'll call device pages.
The first device page can be a partial page starting at any 4-byte aligned address. It may extend up to the end of the current physical page, but not beyond.
If there is more than one device page, the first device page must end on a physical 4KiB page boundary.
The last device page begins on a physical 4KiB page boundary, but is not required to end on a physical 4KiB page boundary.
The specification allows for device pages to be other sizes than 4KiB, but all known devices as of this writing use 4KiB.
A user-space program (here, SPDK) works with user-space virtual addresses, while the NVMe device needs physical addresses, so a translation (mapping) between the two must be implemented.
Approaches worth considering:
- Inspect /proc/self to discover the virtual-to-physical mapping.
  However, pages can be swapped out and back in, which changes that mapping, so this cannot satisfy the pinned-page requirement during an NVMe DMA transfer.
- Call mlock.
  mlock forces a virtual page to stay backed by a physical page, which disables swapping. But it still cannot guarantee a static mapping, because POSIX defines no API for pinning memory; the mechanisms for allocating pinned memory are OS-specific.
- Use huge pages.
  Although this was never an intentional design decision, the kernel treats huge pages differently from ordinary 4KiB pages: it never changes the physical memory backing them.
Without an IOMMU, the virtual addresses obtained from such huge-page allocations still have to be translated into physical addresses.
MMU:   CPU virtual address <----> physical memory address
IOMMU: PCI bus address (the I/O virtual address the NVMe device issues for DMA) <----> physical memory address
Data exchange: CPU virtual address <=====> PCI bus address, with both sides resolved to the same physical memory by the MMU and the IOMMU respectively
Original article: http://blog.51cto.com/xiamachao/2349968