A Brief Analysis of the PPC Device Tree Mechanism

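At the end of the year I finished porting our company's device from ARM to PPC. There is a lot worth summarizing, and since things are quiet after the new year, I am writing it down.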

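This is my first time working with a PPC kernel (version 3.4.55), and my understanding of many things is still shallow. I will only outline the overall framework without digging into details; corrections are welcome.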

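Today I will start with the PPC Device Tree mechanism. When I previously ported U-Boot and the kernel on ARM, the parameter-passing mechanism between U-Boot and the kernel was selectable on that architecture: either tags or FDT (flattened device tree). I chose tags and summarized that approach in another article: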

http://blog.csdn.net/skyflying2012/article/details/35787971

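After reading the PPC kernel boot code, however, I found that the PPC kernel only supports FDT for passing boot parameters, so I took the opportunity to learn the FDT mechanism.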

1 Why use FDT, and what are its advantages?

The standard explanation found online is as follows:

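Servers from vendors such as IBM and Sun originally used firmware (a program embedded in a hardware device that provides an interface between software and hardware) to initialize the system configuration, provide the interface between operating system software and hardware, and boot and run the system. Later, for standardization and compatibility, IBM, Sun, and others jointly introduced the IEEE 1275 firmware interface standard, so that their machines, such as IBM PowerPC pSeries, Apple PowerPC, and Sun SPARC, all use Open Firmware. Open Firmware builds a device tree describing the system hardware at run time and passes it to the kernel, which then boots and runs the system. The benefits are that the kernel depends less heavily on the system hardware, board support packages can be developed faster, the changes and costs caused by hardware changes are reduced, and the requirements on kernel design and compilation are lowered.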

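In embedded PowerPC systems, boot code such as U-Boot is generally used instead of Open Firmware. Early U-Boot versions passed basic board information to the kernel through the static data structure struct bd_t in include/asm-ppc/u-boot.h, and the kernel handled the rest. This interface is not flexible enough: any hardware change requires re-customizing, rebuilding, and reflashing both the boot code and the kernel, and it no longer suits current kernels. To keep up with kernel development and the great variety of embedded PowerPC platforms, and to borrow the strengths of standard Open Firmware, U-Boot introduced the flattened device tree (FDT) as a dynamic interface. A separate FDT blob (binary large object, a container for binary data) stores the parameters passed to the kernel. Fixed information, such as cache sizes and interrupt routing, is provided directly by the device tree, while other information, such as the eTSEC MAC address, frequencies, and the number of PCI buses, is filled in by U-Boot at run time.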

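My understanding is that, to suit flexible embedded platforms, FDT takes the fixed parameters that have to be edited by hand (such as bd_t in U-Boot) out of U-Boot and the kernel. After a hardware change there is no need to modify and reflash U-Boot and the kernel; modifying the FDT file is enough to support the new hardware. Some dynamically changing information still has to be handled by U-Boot and the kernel, such as cmdline and the devices enumerated on USB and PCI.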

By comparison, with the tags approach on ARM you have to modify the tags in U-Boot (such as the memory size) to support new hardware.

2 How FDT is used, and what its format is.

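An FDT can be viewed as a linearized tree data structure that describes the device's hardware configuration. Developers write the device tree according to the hardware configuration in a human-readable text form called dts (device tree source), and then use dtc (device tree compiler) to compile it into the device tree image dtb that the kernel needs. dtc checks the syntax and semantics of the input file, checks the nodes and properties against the Linux kernel's requirements, and compiles the device tree source file (.dts) into a binary file (.dtb) so that the kernel can boot properly. A simple example: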

/ {
    #address-cells = <1>;
    #size-cells = <1>;
    model = "test";
    compatible = "test";
    dcr-parent = <&{/cpus/cpu@0}>;

    cpus {
        #address-cells = <1>;
        #size-cells = <0>;

        cpu@0 {
            device_type = "cpu";
            model = "PowerPC,460EX";
            reg = <0x00000000>;
            i-cache-line-size = <32>;
            d-cache-line-size = <32>;
            i-cache-size = <32768>;
            d-cache-size = <32768>;
            dcr-controller;
            dcr-access-method = "native";
        };
    };

    memory {
        device_type = "memory";
        reg = <0x80000000 0x40000000>;
    };

    chosen {
        name = "chosen";
        bootargs = "console=ttyS0,115200 mem=512M rdinit=/sbin/init";
    };
};

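I adapted this from the dts files shipped with the kernel during my port. The kernel already contains dts files for many devices under arch/powerpc/boot/dts and also bundles the dtc compiler. The dts file above is arch/powerpc/boot/dts/test.dts, so I can run the following command in the kernel tree: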

make test.dtb

This generates the corresponding dtb image.

What developers deal with directly is the dts file, so next let's look at its format:

(There are many detailed explanations of the dts format online, and the kernel also ships a detailed document, Documentation/devicetree/booting-without-of.txt.)

1 Root node

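The starting point of the device tree is called the root node, "/". The model property names the target board platform or module, and the compatible property names compatible boards in the same family as the target board. On most 32-bit platforms, #address-cells and #size-cells are usually 1; they define the width, in 32-bit cells, of the addresses and sizes used by child nodes.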

2 CPU node

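The /cpus node is a child of the root node, and it has one child node for each CPU in the system. /cpus has no mandatory properties, but specifying #address-cells = <1> and #size-cells = <0> is good practice; this also defines the reg property format of each CPU node, which makes it easy to number the physical CPUs. A CPU node's unit name should be of the form cpu@0. This node generally specifies device_type (always "cpu"), the L1 data/instruction cache line sizes, the L1 data/instruction cache sizes, the core and bus clock frequencies, and so on. In the example above, the clock-frequency entries are left to be filled in dynamically by the boot code.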

3 System memory node

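This node describes the physical memory ranges on the target board and is usually called the /memory node. There can be one or more. When there are several, each name must be followed by a unit address to tell them apart; when there is only one, the unit address can be omitted and defaults to 0.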

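This node holds the properties of the on-board physical memory and generally specifies device_type (always "memory") and reg. The value of reg is given as <start-address size>; in the example above, the board's memory starts at 0x80000000 and is 1 GB in size (0x40000000 bytes).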
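To make the reg encoding concrete, here is a minimal sketch of how a <start-address size> pair could be decoded from the big-endian cells stored in the dtb. The names read_be32, read_cells, decode_reg, and struct mem_range are hypothetical helpers written for this illustration; they are not kernel functions.

```c
#include <stdint.h>
#include <stddef.h>

/* Read one big-endian 32-bit cell from a byte buffer. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Combine `cells` consecutive 32-bit cells (1 or 2) into one value. */
static uint64_t read_cells(const uint8_t *p, int cells)
{
    uint64_t v = 0;
    for (int i = 0; i < cells; i++)
        v = (v << 32) | read_be32(p + 4 * i);
    return v;
}

/* Hypothetical result type for this sketch. */
struct mem_range {
    uint64_t base;
    uint64_t size;
};

/* Decode the first <address size> pair of a reg property.
 * addr_cells and size_cells come from the parent's
 * #address-cells and #size-cells properties. */
static struct mem_range decode_reg(const uint8_t *reg,
                                   int addr_cells, int size_cells)
{
    struct mem_range r;
    r.base = read_cells(reg, addr_cells);
    r.size = read_cells(reg + 4 * addr_cells, size_cells);
    return r;
}

/* reg = <0x80000000 0x40000000> as it is laid out in the dtb. */
static const uint8_t example_reg[8] = {
    0x80, 0x00, 0x00, 0x00,   /* start address */
    0x40, 0x00, 0x00, 0x00    /* size: 1 GB    */
};
```

With #address-cells = <1> and #size-cells = <1>, as in the root node of the example dts, decode_reg(example_reg, 1, 1) gives base 0x80000000 and size 0x40000000.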

4 /chosen node

This node is a bit special. Normally it is where Open Firmware stores variable environment information, such as arguments and the default input and output devices.

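This node generally specifies the bootargs and linux,stdout-path properties. bootargs is the argument string passed to the kernel command line. linux,stdout-path is usually the node path of the standard console device, which the kernel uses as its default console. U-Boot added support for the flattened device tree (FDT) after version 1.3.0. After U-Boot loads the Linux kernel, the ramdisk file system (if one is used), and the device tree binary image into physical memory, and before it starts the Linux kernel, it modifies the device tree binary. It fills in necessary information such as MAC addresses and the number of PCI buses. U-Boot also fills in the "/chosen" node, which carries information such as the serial port and the root device (ramdisk, hard disk, or NFS boot). In the U-Boot source, common/cmd_bootm.c calls the ft_setup function to fill in the device tree before the kernel code is executed.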

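Most of a dts normally consists of the hardware configuration of the SoC peripherals. In my port, to keep the code that originally depended on the ARM framework (which did not use FDT) unchanged, the module drivers avoid the device tree as much as possible, so my dts contains no peripheral configuration. I will study that part carefully when I have time.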

3 How the kernel parses the FDT

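When studying code these days, I no longer pore over every detail as I did right after graduating. Instead I try to get the big picture and understand the framework, and study the details carefully only when I need to. I think this is a kind of progress, and it lets me move through the vast kernel a little more calmly.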

When studying code, I always try to understand the reason (why it is done this way) and the method (how it is done).

First, let's look at the format of the dtb image that dtc produces from the dts.

1 A device tree consists of three main parts: the header, the structure block, and the strings block.

[Figure: layout of the device tree blob in memory]

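The header describes basic information about the device tree, such as the magic number, the size of the device tree block, and the offset of the structure block. Its structure, boot_param_header, is shown below. All values in this structure are big-endian, and the offsets are relative to the start address of the device tree header.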

/*
 * This is what gets passed to the kernel by prom_init or kexec
 *
 * The dt struct contains the device tree structure, full pathes and
 * property contents. The dt strings contain a separate block with just
 * the strings for the property names, and is fully page aligned and
 * self contained in a page, so that it can be kept around by the kernel,
 * each property name appears only once in this page (cheap compression)
 *
 * the mem_rsvmap contains a map of reserved ranges of physical memory,
 * passing it here instead of in the device-tree itself greatly simplifies
 * the job of everybody. It's just a list of u64 pairs (base/size) that
 * ends when size is 0
 */
struct boot_param_header {
    __be32  magic;          /* magic word OF_DT_HEADER */
    __be32  totalsize;      /* total size of DT block */
    __be32  off_dt_struct;      /* offset to structure */
    __be32  off_dt_strings;     /* offset to strings */
    __be32  off_mem_rsvmap;     /* offset to memory reserve map */
    __be32  version;        /* format version */
    __be32  last_comp_version;  /* last compatible version */
    /* version 2 fields below */
    __be32  boot_cpuid_phys;    /* Physical CPU id we're booting on */
    /* version 3 fields below */
    __be32  dt_strings_size;    /* size of the DT strings block */
    /* version 17 fields below */
    __be32  dt_struct_size;     /* size of the DT structure block */
};
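As a rough illustration of how such a header might be checked before anything else in the blob is trusted, the sketch below reads the fields at their byte offsets (each field is 4 bytes, so the whole header is 40 bytes) and verifies the magic value 0xd00dfeed. The names be32_at, struct fdt_header_info, and parse_fdt_header are hypothetical and exist only for this example; the kernel's own checks live elsewhere and differ in detail.

```c
#include <stdint.h>
#include <stddef.h>

#define OF_DT_HEADER 0xd00dfeedu   /* device tree magic number */

/* Subset of header fields kept by this sketch (host byte order). */
struct fdt_header_info {
    uint32_t totalsize;
    uint32_t off_dt_struct;
    uint32_t off_dt_strings;
    uint32_t version;
};

/* Read a big-endian 32-bit value at byte offset `off`. */
static uint32_t be32_at(const uint8_t *blob, size_t off)
{
    const uint8_t *p = blob + off;
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns 0 on success, -1 if the blob is too short, the magic is
 * wrong, or an offset points outside the blob. */
static int parse_fdt_header(const uint8_t *blob, size_t len,
                            struct fdt_header_info *out)
{
    if (len < 40)                           /* ten 32-bit fields */
        return -1;
    if (be32_at(blob, 0) != OF_DT_HEADER)   /* magic */
        return -1;

    out->totalsize      = be32_at(blob, 4);
    out->off_dt_struct  = be32_at(blob, 8);
    out->off_dt_strings = be32_at(blob, 12);
    out->version        = be32_at(blob, 20);

    /* Offsets are relative to the start of the header. */
    if (out->totalsize > len ||
        out->off_dt_struct >= out->totalsize ||
        out->off_dt_strings >= out->totalsize)
        return -1;
    return 0;
}
```

Reading byte by byte keeps the sketch independent of the host's endianness, which matters when the same check is run on a little-endian development machine.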

2 Structure block

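The structure block of a flattened device tree is a linearized tree. Together with the strings block it forms the body of the device tree, and it stores the target board's device information as nodes. In the structure block, a node starts with the 32-bit constant macro OF_DT_BEGIN_NODE and ends with the macro OF_DT_END_NODE; child nodes are defined before the end marker. The basic structure of a node is: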

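1. The node start marker OF_DT_BEGIN_NODE (0x00000001);

2. The node path or node unit name (the full node path for versions 1 to 3, the node unit name only for version 16 and later);

3. Padding bytes to ensure 4-byte alignment;

4. The node's properties. Each property starts with the constant macro OF_DT_PROP, followed in order by the byte length of the property value, the offset of the property name in the strings block, the property value, and alignment padding;

5. The child node definitions, if there are child nodes;

6. The node end marker OF_DT_END_NODE (0x00000002).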

In summary, a node is a sequence that starts with OF_DT_BEGIN_NODE, continues with the node path, the property list, and the child node list, and ends with OF_DT_END_NODE. Each child node has the same structure.
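The node layout described above can be sketched as a small token walker. This is only an illustration under a few assumptions: it handles the version 16 and later format (unit names rather than full paths), it only counts nodes and properties, and walk_struct_block, struct walk_stats, and example_struct are names invented for this sketch. The token values are the ones used by the flattened device tree format.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define OF_DT_BEGIN_NODE 0x1
#define OF_DT_END_NODE   0x2
#define OF_DT_PROP       0x3
#define OF_DT_NOP        0x4
#define OF_DT_END        0x9

static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

struct walk_stats {
    int nodes;
    int props;
    int max_depth;
};

/* Walk a structure block. Returns 0 when OF_DT_END is reached with
 * all nodes closed, -1 if the block is malformed or truncated. */
static int walk_struct_block(const uint8_t *p, size_t len,
                             struct walk_stats *st)
{
    size_t off = 0;
    int depth = 0;

    memset(st, 0, sizeof *st);
    while (off + 4 <= len) {
        uint32_t tag = be32(p + off);
        off += 4;
        switch (tag) {
        case OF_DT_BEGIN_NODE: {
            /* NUL-terminated unit name, padded to 4 bytes. */
            const uint8_t *nul = memchr(p + off, 0, len - off);
            if (!nul)
                return -1;
            size_t name_len = (size_t)(nul - (p + off)) + 1;
            off += (name_len + 3) & ~(size_t)3;
            st->nodes++;
            if (++depth > st->max_depth)
                st->max_depth = depth;
            break;
        }
        case OF_DT_PROP: {
            /* value length, then name offset into the strings block */
            if (off + 8 > len)
                return -1;
            uint32_t vlen = be32(p + off);
            off += 8 + (((size_t)vlen + 3) & ~(size_t)3);
            st->props++;
            break;
        }
        case OF_DT_END_NODE:
            if (--depth < 0)
                return -1;
            break;
        case OF_DT_NOP:
            break;
        case OF_DT_END:
            return depth == 0 ? 0 : -1;
        default:
            return -1;
        }
    }
    return -1;   /* ran off the end without seeing OF_DT_END */
}

/* Hand-built block: root node with one 4-byte property and an
 * empty child node named "cpus". */
static const uint8_t example_struct[] = {
    0, 0, 0, 1,  0, 0, 0, 0,                    /* BEGIN_NODE ""     */
    0, 0, 0, 3,  0, 0, 0, 4,  0, 0, 0, 0,       /* PROP len=4 off=0  */
    0, 0, 0, 1,                                 /*   value <1>       */
    0, 0, 0, 1,  'c', 'p', 'u', 's', 0, 0, 0, 0,/* BEGIN_NODE "cpus" */
    0, 0, 0, 2,                                 /* END_NODE (cpus)   */
    0, 0, 0, 2,                                 /* END_NODE (root)   */
    0, 0, 0, 9                                  /* END               */
};
```

Walking example_struct visits two nodes and one property, with a maximum depth of 2.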

3 Strings block

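To save space, property names, many of which appear repeatedly, are pulled out and stored separately in the strings block. This block contains many NUL-terminated property name strings. The structure block stores the offsets of these strings, so a property name string is easy to look up. The strings block saves storage space, which tends to be tight on embedded systems.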
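A minimal sketch of the lookup: given the name offset stored in a property record, the name is simply the NUL-terminated string at that offset in the strings block. The function prop_name and the array example_strings are hypothetical, written only for this illustration.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Return the property name stored at `nameoff` in the strings block,
 * or NULL if the offset is out of range or the string is unterminated. */
static const char *prop_name(const char *strings, size_t strings_size,
                             uint32_t nameoff)
{
    if (nameoff >= strings_size)
        return NULL;
    if (!memchr(strings + nameoff, '\0', strings_size - nameoff))
        return NULL;
    return strings + nameoff;
}

/* Three names packed back to back: offsets 0, 12 and 16. */
static const char example_strings[] = "device_type\0reg\0model";
```

Every node that has a reg property stores the same offset (12 in this example), so the name itself is kept only once no matter how many nodes use it.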

4 How the kernel parses the FDT

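We compile the dts into a dtb with dtc, and the kernel then "disassembles" the dtb to obtain the configuration information in it. The dtb storage format described above therefore shows up in the kernel's parsing code.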

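The dtb file exists independently of the bootloader and the kernel. Normally U-Boot fills in the chosen node of the dtb and passes the address of the dtb image to the kernel in register r3. In my port, however, the chosen node is filled in by hand and the kernel is not started by U-Boot, so I modified the kernel startup code to hard-code the start address of the dtb, as follows: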

/* As with the other PowerPC ports, it is expected that when code
 * execution begins here, the following registers contain valid, yet
 * optional, information:
 *
 *   r3 - Board info structure pointer (DRAM, frequency, MAC address, etc.)
 *   r4 - Starting address of the init RAM disk
 *   r5 - Ending address of the init RAM disk
 *   r6 - Start of kernel command line string (e.g. "mem=128")
 *   r7 - End of kernel command line string
 *
 */
    __HEAD
_ENTRY(_stext);
_ENTRY(_start);
    /*
     * Reserve a word at a fixed location to store the address
     * of abatron_pteptrs
     */
    nop

    /* device tree physical address (hard-coded) */
    lis r3, 0x81000000@h
    ori r3, r3, 0x81000000@l

    mr  r31,r3      /* save device tree ptr */
    li  r24,0       /* CPU number */

The PPC kernel parses the FDT in two stages:

The first stage is early parsing, which obtains the information the kernel needs in order to boot, such as cmdline, cpu, and mem.

The second stage is the later, full parsing, which drivers use to obtain their configuration when they are loaded.

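Because my port keeps drivers away from FDT as much as possible, today I mainly analyze the early parsing stage. Before start_kernel is entered, machine_init (in arch/powerpc/kernel/setup_32.c) is called, and machine_init in turn calls early_init_devtree (in arch/powerpc/kernel/prom.c) to do the early device tree parsing. The code is as follows: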

void __init early_init_devtree(void *params)
{
    phys_addr_t limit;

    /* Setup flat device-tree pointer */
    initial_boot_params = params;

#ifdef CONFIG_PPC_RTAS
    /* Some machines might need RTAS info for debugging, grab it now. */
    of_scan_flat_dt(early_init_dt_scan_rtas, NULL);
#endif

#ifdef CONFIG_PPC_POWERNV
    /* Some machines might need OPAL info for debugging, grab it now. */
    of_scan_flat_dt(early_init_dt_scan_opal, NULL);
#endif

#ifdef CONFIG_FA_DUMP
    /* scan tree to see if dump is active during last boot */
    of_scan_flat_dt(early_init_dt_scan_fw_dump, NULL);
#endif

    /* Pre-initialize the cmd_line with the content of boot_commmand_line,
     * which will be empty except when the content of the variable has
     * been overriden by a bootloading mechanism. This happens typically
     * with HAL takeover
     */
    strlcpy(cmd_line, boot_command_line, COMMAND_LINE_SIZE);

    /* Retrieve various informations from the /chosen node of the
     * device-tree, including the platform type, initrd location and
     * size, TCE reserve, and more ...
     */

    of_scan_flat_dt(early_init_dt_scan_chosen_ppc, cmd_line);

    /* Scan memory nodes and rebuild MEMBLOCKs */
    of_scan_flat_dt(early_init_dt_scan_root, NULL);
    of_scan_flat_dt(early_init_dt_scan_memory_ppc, NULL);

    /* Save command line for /proc/cmdline and then parse parameters */
    strlcpy(boot_command_line, cmd_line, COMMAND_LINE_SIZE);
    parse_early_param();

    /* make sure we've parsed cmdline for mem= before this */
    if (memory_limit)
        first_memblock_size = min(first_memblock_size, memory_limit);
    setup_initial_memory_limit(memstart_addr, first_memblock_size);
    /* Reserve MEMBLOCK regions used by kernel, initrd, dt, etc... */
    memblock_reserve(PHYSICAL_START, __pa(klimit) - PHYSICAL_START);
    /* If relocatable, reserve first 32k for interrupt vectors etc. */
    if (PHYSICAL_START > MEMORY_START)
        memblock_reserve(MEMORY_START, 0x8000);
    reserve_kdump_trampoline();
#ifdef CONFIG_FA_DUMP
    /*
     * If we fail to reserve memory for firmware-assisted dump then
     * fallback to kexec based kdump.
     */
    if (fadump_reserve_mem() == 0)
#endif
        reserve_crashkernel();
    early_reserve_mem();

    /*
     * Ensure that total memory size is page-aligned, because otherwise
     * mark_bootmem() gets upset.
     */
    limit = ALIGN(memory_limit ?: memblock_phys_mem_size(), PAGE_SIZE);
    memblock_enforce_memory_limit(limit);

    memblock_allow_resize();
    memblock_dump_all();

    DBG("Phys. mem: %llx\n", memblock_phys_mem_size());

    /* We may need to relocate the flat tree, do it now.
     * FIXME .. and the initrd too? */
    move_device_tree();

    allocate_pacas();

    DBG("Scanning CPUs ...\n");

    /* Retrieve CPU related informations from the flat tree
     * (altivec support, boot CPU ID, ...)
     */
    of_scan_flat_dt(early_init_dt_scan_cpus, NULL);

#if defined(CONFIG_SMP) && defined(CONFIG_PPC64)
    /* We'll later wait for secondaries to check in; there are
     * NCPUS-1 non-boot CPUs  :-)
     */
    spinning_secondaries = boot_cpu_count - 1;
#endif

    DBG(" <- early_init_devtree()\n");
}

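of_scan_flat_dt is called to traverse all nodes in the dtb, invoking the parsing functions early_init_dt_scan_chosen_ppc, early_init_dt_scan_memory_ppc, early_init_dt_scan_root, and early_init_dt_scan_cpus, which read the chosen, memory, root, and cpus nodes respectively and complete the early cmdline, mem, and cpu setup. Let's look at the chosen parsing function: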

int __init early_init_dt_scan_chosen(unsigned long node, const char *uname,
                     int depth, void *data)
{
    unsigned long l;
    char *p;

    pr_debug("search \"chosen\", depth: %d, uname: %s\n", depth, uname);

    if (depth != 1 || !data ||
        (strcmp(uname, "chosen") != 0 && strcmp(uname, "chosen@0") != 0))
        return 0;

    early_init_dt_check_for_initrd(node);

    /* Retrieve command line */
    p = of_get_flat_dt_prop(node, "bootargs", &l);
    if (p != NULL && l > 0)
        strlcpy(data, p, min((int)l, COMMAND_LINE_SIZE));

    /*
     * CONFIG_CMDLINE is meant to be a default in case nothing else
     * managed to set the command line, unless CONFIG_CMDLINE_FORCE
     * is set in which case we override whatever was found earlier.
     */
#ifdef CONFIG_CMDLINE
#ifndef CONFIG_CMDLINE_FORCE
    if (!((char *)data)[0])
#endif
        strlcpy(data, CONFIG_CMDLINE, COMMAND_LINE_SIZE);
#endif /* CONFIG_CMDLINE */

    pr_debug("Command line is: %s\n", (char*)data);

    /* break now */
    return 1;
}

The FDT handling functions are mainly in arch/powerpc/kernel/prom.c and drivers/of/fdt.c.

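Compared with the tags parsing analyzed in my earlier article, FDT parsing differs as follows:

With tags, handler functions are registered as callbacks; whichever tag type is being parsed, the handler registered for that type is called.

With FDT, the whole device tree is traversed, and each handler function decides for itself whether the current node is something it needs to parse before processing it.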
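The contrast between the two styles can be sketched in a few lines of C. This is a simplified model, not kernel code: struct tag, struct node, dispatch_tag, scan_nodes, and the tag type values are all hypothetical and stand in for the real tag table and for of_scan_flat_dt.

```c
#include <stddef.h>
#include <string.h>

/* --- tags style: a table maps each tag type to one handler --- */
enum { TAG_MEM = 1, TAG_CMDLINE = 2 };   /* hypothetical tag types */

struct tag { int type; int value; };
struct tag_handler { int type; void (*parse)(const struct tag *); };

static int seen_mem, seen_cmdline;
static void parse_mem(const struct tag *t)     { seen_mem = t->value; }
static void parse_cmdline(const struct tag *t) { seen_cmdline = t->value; }

static const struct tag_handler handlers[] = {
    { TAG_MEM,     parse_mem },
    { TAG_CMDLINE, parse_cmdline },
};

/* The dispatcher picks the handler by tag type. */
static void dispatch_tag(const struct tag *t)
{
    for (size_t i = 0; i < sizeof handlers / sizeof handlers[0]; i++)
        if (handlers[i].type == t->type)
            handlers[i].parse(t);
}

/* --- fdt style: every node is offered to the callback --- */
struct node { const char *uname; int depth; };
typedef int (*scan_fn)(const struct node *n, void *data);

/* Call fn on each node until it returns non-zero. */
static int scan_nodes(const struct node *nodes, size_t count,
                      scan_fn fn, void *data)
{
    int rc = 0;
    for (size_t i = 0; i < count && rc == 0; i++)
        rc = fn(&nodes[i], data);
    return rc;
}

/* The callback itself filters for the node it cares about. */
static int scan_chosen(const struct node *n, void *data)
{
    if (n->depth != 1 || strcmp(n->uname, "chosen") != 0)
        return 0;            /* not ours, keep scanning */
    *(int *)data = 1;        /* record that /chosen was found */
    return 1;                /* stop the scan */
}
```

In the tags model the filtering lives in the dispatcher, while in the FDT model it lives in each callback, which is why early_init_devtree can simply call the scan function once per callback.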

Time: 2024-08-03 21:20:35
