[Original] Graphics Systems on Linux and AMD R600 GPU Programming (8): The AMD GPU DRM Driver Initialization Process

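  The previous posts described the DRM driver, the GPU's video memory management mechanism, and its interrupt mechanism, so reading through the initialization process of the AMD DRM driver should now be much easier.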

  Below is an article written by an AMD developer (I am placing it here for now and will add my own comments later when I have time).

Understanding GPUs from the ground up

I get asked a lot about learning how to program GPUs. Bringing up evergreen kms support seems like a good place to start, so I figured I’d write a series of articles detailing the process based on the actual evergreen patches. First, to get a better understanding of how GPUs work, take a look at the radeon drm. This article assumes a basic understanding of C and computer architectures. The basic process is that the driver loads, initializes the hardware, sets up non-hw specific things like the memory manager, and sets up the displays. This first article describes the basic driver flow when the drm loads in kms mode.

radeon_driver_load_kms() (in radeon_kms.c) is where everything starts.  It calls radeon_device_init() to initialize the non-display hardware and radeon_modeset_init() (in radeon_display.c) to initialize the display hardware.

The main workhorse of the driver initialization is radeon_device_init(), found in radeon_device.c. First we initialize a bunch of the structs used in the driver. Then radeon_asic_init() is called. This function sets up the asic specific function pointers for various things such as suspend/resume callbacks, asic reset, set/process irqs, set/get engine clocks, etc. The common code then uses these callbacks to call the asic specific code to achieve the requested functionality. For example, enabling and processing interrupts works differently on an RV100 vs. an RV770. Since functionality changes in stages, some routines are used for multiple asic families. This lets us mix and match the appropriate functions for the specifics of how the chip is programmed. For example, R1xx and R3xx chips both use the same interrupt scheme (as defined in r100_irq_set()/r100_irq_process()), but they have different initialization routines (r100_init() vs. r300_init()).
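
The callback scheme described above can be sketched in plain user-space C. This is only an illustration, not the driver's actual code: struct asic_ops, the *_stub functions, enum family and asic_ops_for() are hypothetical names, and the stubs just return a tag so that a caller can see which one ran. The point it shows is that two families can share one callback (the interrupt setup) while differing in another (init).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a per-ASIC callback table. */
struct asic_ops {
    int (*init)(void);     /* asic specific initialization */
    int (*irq_set)(void);  /* asic specific interrupt setup */
};

/* Stub callbacks: each returns a tag identifying which one was called. */
static int r100_init_stub(void)    { return 100; }
static int r300_init_stub(void)    { return 300; }
static int r100_irq_set_stub(void) { return 1; }

enum family { FAMILY_R100, FAMILY_R300 };

/* Both tables share the interrupt callback but use different init routines. */
static const struct asic_ops r100_ops = { r100_init_stub, r100_irq_set_stub };
static const struct asic_ops r300_ops = { r300_init_stub, r100_irq_set_stub };

/* Select the callback table for a chip family (hypothetical helper). */
static const struct asic_ops *asic_ops_for(enum family f)
{
    switch (f) {
    case FAMILY_R100: return &r100_ops;
    case FAMILY_R300: return &r300_ops;
    }
    return NULL;
}
```

Common code would then call asic_ops_for(family)->init() without knowing which chip it is running on.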

Next we set up the DMA masks for the driver. These let the kernel know what size address space the card is able to address. In the case of radeons, it’s used for GPU access to graphics buffers stored in system memory which are accessed via a GART (Graphics Address Remapping Table). AGP and the older on-chip GART mechanisms are limited to 32 bits. Newer on-chip GART mechanisms have larger address spaces.
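
To make the idea of a DMA mask concrete, here is a small user-space sketch. The helper names dma_bit_mask() and dma_addr_reachable() are made up for this example (the kernel has its own macro and API for this); the sketch only shows the arithmetic: an N-bit mask has the low N bits set, and a system address is reachable by the device only if it has no bits set above the mask.

```c
#include <assert.h>
#include <stdint.h>

/* Build a mask with the low `bits` bits set (hypothetical helper). */
static uint64_t dma_bit_mask(unsigned bits)
{
    assert(bits >= 1 && bits <= 64);
    /* Shifting a 64-bit value by 64 is undefined, so handle it separately. */
    return bits == 64 ? ~UINT64_C(0) : (UINT64_C(1) << bits) - 1;
}

/* An address is reachable if it fits entirely inside the mask. */
static int dma_addr_reachable(uint64_t addr, uint64_t mask)
{
    return (addr & ~mask) == 0;
}
```

With a 32-bit mask, a page located at 4 GiB or above cannot be handed to the device directly, while a 40-bit mask would allow it.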

After DMA masks, we set up the MMIO aperture. PCI/PCIE/AGP devices are programmed via apertures called BARs (Base Address Registers). These apertures provide access to resources on the card such as registers, framebuffers, and roms. GPUs are configured via registers; if you want to access those registers, you’d map the register BAR. If you want to write to the framebuffer (some of which may be displayed on your screen), you would map the framebuffer BAR. In this case we map the register BAR; this register mapping is then used by the driver to configure the card.
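
Once the register BAR is mapped, register access is just a read or write at an offset from the mapped base. The sketch below imitates that in user space: an ordinary array stands in for the mapped aperture, and struct fake_dev, fake_map_registers(), rreg32() and wreg32() are hypothetical names chosen for this example. A real driver would obtain the mapping from the kernel and use the kernel's MMIO accessors rather than plain pointer dereferences.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_MMIO_SIZE 0x100u  /* aperture size in bytes (made up) */

struct fake_dev {
    volatile uint32_t *rmmio;  /* base of the mapped register aperture */
    size_t rmmio_size;         /* size of the aperture in bytes */
};

/* Plain memory standing in for the card's register BAR. */
static uint32_t fake_regs[FAKE_MMIO_SIZE / 4];

/* "Map" the register BAR: here we just point at the array above. */
static void fake_map_registers(struct fake_dev *dev)
{
    dev->rmmio = fake_regs;
    dev->rmmio_size = FAKE_MMIO_SIZE;
}

/* Read a 32-bit register at a byte offset inside the aperture. */
static uint32_t rreg32(struct fake_dev *dev, uint32_t offset)
{
    assert(offset % 4 == 0 && offset < dev->rmmio_size);
    return dev->rmmio[offset / 4];
}

/* Write a 32-bit register at a byte offset inside the aperture. */
static void wreg32(struct fake_dev *dev, uint32_t offset, uint32_t value)
{
    assert(offset % 4 == 0 && offset < dev->rmmio_size);
    dev->rmmio[offset / 4] = value;
}
```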

vga_client_register() comes next, and is beyond the scope of this article.  It’s basically a way to work around the limitations of VGA on PCI buses with multiple VGA devices.

Next up is radeon_init(). This is actually a macro defined in radeon.h that references the asic init callback we initialized in radeon_asic_init() several steps ago, so this is where the asic specific init function is called. For an RV100, it would be r100_init() defined in r100.c; for an RV770, it’s rv770_init().

That’s pretty much it for radeon_device_init(). Next let’s look at what happens in the asic specific init functions. They all follow the same pattern, although some asics may do more or less depending on the functionality. Let’s take a look at r100_init() in r100.c. First we initialize debugfs; this is a kernel debugging framework and outside the scope of this article. Next we call r100_vga_render_disable(); this disables the VGA engine on the card. The VGA engine provides VGA compatibility; since we are going to be programming the card directly, we disable it.

Following that, we set up the GPU scratch registers (radeon_scratch_init() defined in radeon_device.c). These are scratch registers used by the CP (Command Processor) to signal graphics events. In general they are used for what we call fences. A write to one of these scratch registers can be added to the command stream sent to the GPU. When the GPU encounters that command, it writes the value specified to that scratch register. The driver can then check the value of the scratch register to determine whether that fence has been reached or not. For example, if you want to know if the GPU is done rendering to a buffer, you’d insert a fence after the rendering commands. You can then check the scratch register to determine if that fence has passed (and hence the rendering is done).
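
The fence idea can be modelled with a single integer standing in for the scratch register. In this sketch all names (struct fake_gpu, struct fence_driver, fence_emit(), gpu_process_fence(), fence_signaled()) are hypothetical, and the use of increasing sequence numbers with a wraparound-tolerant comparison is an assumption made for the example, not something stated in the article.

```c
#include <assert.h>
#include <stdint.h>

/* The "GPU": one scratch register that fence commands write to. */
struct fake_gpu {
    uint32_t scratch;
};

/* Driver-side bookkeeping: the last sequence number handed out. */
struct fence_driver {
    uint32_t last_emitted;
};

/* Driver: allocate the next sequence number for a fence that will be
 * placed in the command stream after some rendering commands. */
static uint32_t fence_emit(struct fence_driver *fd)
{
    return ++fd->last_emitted;
}

/* GPU: on reaching the fence command, write its value to the register. */
static void gpu_process_fence(struct fake_gpu *gpu, uint32_t seq)
{
    gpu->scratch = seq;
}

/* Driver: a fence has passed once the register has reached its value.
 * The signed difference keeps the comparison valid across wraparound. */
static int fence_signaled(const struct fake_gpu *gpu, uint32_t seq)
{
    return (int32_t)(gpu->scratch - seq) >= 0;
}
```

A later fence passing implies that every earlier fence has passed as well, which is why a single register is enough for a stream of fences.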

radeon_get_bios() loads the video bios from the PCI ROM BAR.  The video bios contains data and command tables.  The data tables define things like the number and type of connectors on the card and how those connectors are mapped to encoders, the GPIO registers and bitfields used for DDC and other i2c buses, LVDS panel information for laptops, display and engine PLL limits, etc.  The command tables are used for initializing the hardware (normally done by the system bios during post, but required for things like suspend/resume and initializing secondary cards), and on systems with ATOM bios the command tables are used for setting up the displays and changing things like engine and memory clocks.

Next, we initialize the bios scratch registers (radeon_combios_initialize_bios_scratch_regs() via radeon_combios_init()).  These registers are a way for the firmware on the system to communicate state to the graphics driver.  They contain things like connected outputs, whether the driver or the firmware will handle things like lid or mode change events, etc.

radeon_boot_test_post_card() checks to see whether the system bios has posted the card or not. This is used to determine whether the card needs to be initialized by the driver using the bios command tables or if the system bios has already done it.

radeon_get_clock_info() gets the PLL (Phase Locked Loop, used to generate clocks) information from the bios tables.  This includes the display PLLs, engine and memory PLLs and the reference clock that the PLLs use to generate their final clocks.

radeon_pm_init() initializes the power management features of the chip.

Next the MC (Memory Controller) is initialized (r100_mc_init()). The GPU has its own address space similar to the CPU. Within that address space you map VRAM and GART. The blocks on the chip (2D, 3D engines, display controllers, etc.) access these resources via the GPU’s address space. VRAM is mapped at one offset and GART at another. If you want to read from a texture located in GART memory, you’d point the texture base address at some offset in the GART aperture in the GPU’s address space. If you want to display a buffer in VRAM on your monitor, you’d point one of your crtc base addresses to an address in the VRAM aperture in the GPU’s address space. The MC init function determines how much VRAM is on the card and where to place VRAM and GART in the GPU’s address space.
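
As a rough picture of what "placing VRAM and GART in the GPU's address space" means, the following sketch lays out two non-overlapping apertures and tests which one an address falls into. The placement policy (VRAM at address 0, GART immediately after it) and the names struct aperture, struct mc_layout, mc_place() and in_aperture() are assumptions made for illustration; the real driver chooses the locations based on the chip and the configuration.

```c
#include <assert.h>
#include <stdint.h>

/* A contiguous range in the GPU's own address space. */
struct aperture {
    uint64_t start;
    uint64_t size;
};

/* Where VRAM and GART (GTT) live in the GPU's address space. */
struct mc_layout {
    struct aperture vram;
    struct aperture gtt;
};

/* Hypothetical placement policy: VRAM at 0, GART right after VRAM. */
static struct mc_layout mc_place(uint64_t vram_size, uint64_t gtt_size)
{
    struct mc_layout l;

    assert(vram_size > 0 && gtt_size > 0);
    l.vram.start = 0;
    l.vram.size = vram_size;
    l.gtt.start = l.vram.start + l.vram.size;
    l.gtt.size = gtt_size;
    return l;
}

/* Does a GPU address fall inside the given aperture? */
static int in_aperture(const struct aperture *a, uint64_t addr)
{
    return addr >= a->start && addr - a->start < a->size;
}
```

A texture base address inside l.gtt would be fetched from system memory through the GART, while a crtc base address inside l.vram would scan out from on-card memory.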

radeon_fence_driver_init() initializes the common code used for fences.  See above for more on fences.

radeon_irq_kms_init() initializes the common code used for irqs.

radeon_bo_init() initializes the memory manager.

r100_pci_gart_init() sets up the on-board GART mechanism and radeon_agp_init() initializes AGP GART. This allows the GPU to access buffers in system memory. Since system memory is paged, large allocations are not contiguous. The GART provides a way to make many disparate pages look like one contiguous block by using address remapping. With AGP, the northbridge provides the address remapping, and you just point the GPU’s AGP aperture at the one provided by the northbridge. The on-board GART provides the same functionality for non-AGP systems (PCI or PCIE).
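
The remapping can be sketched as a small page table indexed by the page number inside the GART aperture. Everything here is a toy model under stated assumptions: the 4 KiB page size, the 8-entry table, and the names struct gart, gart_bind() and gart_translate() are invented for the example, and the hardware table format is not modelled at all.

```c
#include <assert.h>
#include <stdint.h>

#define GART_PAGE_SIZE 4096u  /* assumed page size */
#define GART_NUM_PAGES 8u     /* tiny table, for illustration only */

struct gart {
    uint64_t gpu_base;               /* start of the aperture in GPU address space */
    uint64_t pages[GART_NUM_PAGES];  /* system address backing each GART page */
    int valid[GART_NUM_PAGES];       /* nonzero if the entry is bound */
};

/* Bind n scattered system pages to consecutive GART entries. */
static void gart_bind(struct gart *g, unsigned first,
                      const uint64_t *sys_pages, unsigned n)
{
    unsigned i;

    assert(first + n <= GART_NUM_PAGES);
    for (i = 0; i < n; i++) {
        g->pages[first + i] = sys_pages[i];
        g->valid[first + i] = 1;
    }
}

/* Translate a GPU address in the aperture to a system address.
 * Returns 0 on success, -1 if the address is outside the aperture
 * or the page is not bound. */
static int gart_translate(const struct gart *g, uint64_t gpu_addr,
                          uint64_t *sys_addr)
{
    uint64_t off, idx;

    if (gpu_addr < g->gpu_base)
        return -1;
    off = gpu_addr - g->gpu_base;
    idx = off / GART_PAGE_SIZE;
    if (idx >= GART_NUM_PAGES || !g->valid[idx])
        return -1;
    *sys_addr = g->pages[idx] + off % GART_PAGE_SIZE;
    return 0;
}
```

From the GPU's side, the three bound pages in a usage such as gart_bind(&g, 0, pages, 3) appear as one contiguous 12 KiB block starting at gpu_base, even though the backing pages are scattered.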

Next up we have  r100_set_safe_registers().  This function sets the list of registers that command buffers from userspace are allowed to access.  When a userspace driver like the ddx (2D) or mesa (3D) sends commands to the GPU, the drm checks those command buffers to prevent access to unauthorized registers or memory.
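
One way to picture this kind of check is a bitmap with one bit per register, consulted for every register write found in a userspace command buffer. The sketch below is an assumption-laden illustration: the encoding (a set bit means "allowed"), the register count, and the names struct reg_whitelist, allow_reg(), reg_allowed(), struct reg_write and cs_check() are all hypothetical, and the encoding used by the real driver's tables may well differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_REGS 256u  /* number of 32-bit registers covered (made up) */

/* One bit per register; in this sketch a set bit means "allowed". */
struct reg_whitelist {
    uint32_t bm[NUM_REGS / 32];
};

/* Mark the register at a byte offset as safe for userspace to write. */
static void allow_reg(struct reg_whitelist *w, uint32_t offset)
{
    uint32_t idx = offset >> 2;

    assert(offset % 4 == 0 && idx < NUM_REGS);
    w->bm[idx / 32] |= 1u << (idx % 32);
}

/* Is userspace allowed to write the register at this byte offset? */
static int reg_allowed(const struct reg_whitelist *w, uint32_t offset)
{
    uint32_t idx = offset >> 2;

    if (offset % 4 != 0 || idx >= NUM_REGS)
        return 0;
    return (w->bm[idx / 32] >> (idx % 32)) & 1u;
}

/* A register write as it might appear in a command buffer. */
struct reg_write {
    uint32_t offset;
    uint32_t value;
};

/* Reject the whole command buffer if any write targets a register
 * that is not on the list. Returns 0 if the buffer is acceptable. */
static int cs_check(const struct reg_whitelist *w,
                    const struct reg_write *cmds, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (!reg_allowed(w, cmds[i].offset))
            return -1;
    return 0;
}
```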

Finally, r100_startup() programs the hardware with everything set up in r100_init(). It’s a separate function since it’s also called when resuming from suspend, as the current hardware configuration needs to be restored in that case as well. The VRAM and GART setup is programmed in r100_mc_program() and r100_pci_gart_enable(); irqs are set up in r100_irq_set().

r100_cp_init() initializes the CP and sets up the ring buffer. The CP is the part of the chip that feeds acceleration commands to the GPU. It’s fed by a ring buffer that the driver (CPU) writes to and the GPU reads from. Besides commands, you can also write pointers to command buffers stored elsewhere in the GPU’s address space (called indirect buffers). For example, the 3D driver might send a command buffer to the drm; after checking it, the drm would put a pointer to that command buffer on the ring, followed by a fence. When the CP gets to the pointer in the ring, it fetches the command buffer and processes the commands in it, then returns to where it left off in the ring. Buffers referenced by the command buffer are “locked” until the fence passes, since the GPU is accessing them in the execution of those commands.
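
The producer/consumer relationship between the driver and the CP can be modelled with a write pointer, a read pointer and a power-of-two buffer. This is a toy model, not the hardware packet format: the two-dword packets, the PKT_IB/PKT_FENCE codes, the use of a table index in place of a GPU address for the indirect buffer, and the names struct ib, struct cp, ring_free(), ring_emit() and cp_run() are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 16u              /* ring size in dwords, power of two */
#define RING_MASK (RING_SIZE - 1u)

enum { PKT_IB = 1, PKT_FENCE = 2 };  /* hypothetical packet codes */

/* An indirect buffer: commands stored outside the ring. */
struct ib {
    const uint32_t *cmds;
    uint32_t ndw;  /* number of dwords in the buffer */
};

struct cp {
    uint32_t ring[RING_SIZE];
    uint32_t wptr;         /* advanced by the driver (CPU) */
    uint32_t rptr;         /* advanced by the CP (GPU) */
    const struct ib *ibs;  /* stands in for IBs in GPU address space */
    uint32_t executed;     /* IB dwords processed so far */
    uint32_t fence;        /* last fence value written by the CP */
};

/* Free space in dwords; the pointers are free-running counters. */
static uint32_t ring_free(const struct cp *cp)
{
    return RING_SIZE - (cp->wptr - cp->rptr);
}

/* Driver side: append one two-dword packet (opcode, argument).
 * Returns 0 on success, -1 if the ring is full. */
static int ring_emit(struct cp *cp, uint32_t op, uint32_t arg)
{
    if (ring_free(cp) < 2)
        return -1;
    cp->ring[cp->wptr++ & RING_MASK] = op;
    cp->ring[cp->wptr++ & RING_MASK] = arg;
    return 0;
}

/* CP side: consume packets until the read pointer catches up. */
static void cp_run(struct cp *cp)
{
    while (cp->rptr != cp->wptr) {
        uint32_t op = cp->ring[cp->rptr++ & RING_MASK];
        uint32_t arg = cp->ring[cp->rptr++ & RING_MASK];

        if (op == PKT_IB)
            cp->executed += cp->ibs[arg].ndw;  /* fetch and process the IB */
        else if (op == PKT_FENCE)
            cp->fence = arg;                   /* signal the fence */
    }
}
```

Emitting an indirect buffer packet followed by a fence packet mirrors the sequence in the text: once cp.fence holds the fence value, the buffers referenced by that command buffer are no longer in use by the GPU.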

r100_wb_init() initializes scratch register writeback which is a feature that lets the GPU update copies of the scratch registers in GART memory.  This allows the driver (running on the CPU) to access the content of those registers without having to read them from the MMIO register aperture which requires a trip across the bus.

r100_ib_init() initializes the indirect buffers used for feeding command buffers to the CP from userspace drivers like the 3D driver.

The display side is set up in  radeon_modeset_init().  First we set up the display limits and mode callbacks, then we set up the output properties (radeon_modeset_create_props()) that are exposed via xrandr properties when X is running.

Next, we initialize the crtcs in radeon_crtc_init(). crtcs (also called display controllers) are the blocks on the chip that provide the display timing and determine where in the framebuffer a particular monitor reads from. A crtc provides an independent “head.” Most radeon asics have two crtcs; the new evergreen chips have six.

radeon_setup_enc_conn() sets up the connector and encoder mappings based on video bios data tables.  Encoders are things like DACs for analog outputs like VGA and TV, and TMDS or LVDS encoders for things like digital DVI or LVDS panels.  An encoder can be tied to one or more connectors (e.g., the TV DAC is often tied to both the S-video and a VGA port or the analog portion of a DVI-I port).  The mapping is important as you need to know what encoders are in use and what they are tied to in order to program the displays properly.

radeon_hpd_init() is a macro that points to the asic specific function that initializes the HPD (Hot Plug Detect) hardware for digital monitors. HPD allows you to get an interrupt when a digital monitor is connected or disconnected. When this happens the driver will take appropriate action and generate an event which userspace apps can listen for. The app can then display a message asking the user what they want to do, etc.

Finally,  radeon_fbdev_init() sets up the drm kernel fb interface.  This provides a kernel fb interface on top of the drm for the console or other kernel fb apps.

When the driver is unloaded the whole process happens in reverse; this time all the *_fini() functions are called to tear down the driver.
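
The "reverse order" rule can be sketched with a table of stages that is walked forwards on load and backwards on unload. The stage ids and the names struct stage, driver_load(), driver_unload() and log_step() are hypothetical; the log records a positive id for each init step and a negative id for each fini step so the ordering can be inspected.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LOG 8

/* Record of the steps taken: +id for init, -id for fini. */
static int log_buf[MAX_LOG];
static int log_len;

static void log_step(int id)
{
    assert(log_len < MAX_LOG);
    log_buf[log_len++] = id;
}

/* Hypothetical stages, e.g. 1 = device, 2 = memory manager, 3 = modeset. */
struct stage {
    int id;
};

static const struct stage stages[] = { { 1 }, { 2 }, { 3 } };
#define NUM_STAGES (sizeof(stages) / sizeof(stages[0]))

/* Load: run every stage's init step in table order. */
static void driver_load(void)
{
    size_t i;

    for (i = 0; i < NUM_STAGES; i++)
        log_step(stages[i].id);
}

/* Unload: run every stage's fini step in the opposite order, so that
 * nothing is torn down while a later stage still depends on it. */
static void driver_unload(void)
{
    size_t i;

    for (i = NUM_STAGES; i > 0; i--)
        log_step(-stages[i - 1].id);
}
```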

The next set of articles will walk through the evergreen patches available here which have already been applied upstream and explain what each patch does to bring up support for evergreen chips.

Time: 2024-08-01 08:30:47
