CS:APP Chapter 7, Linking: Notes

Linking

Linking is the process of collecting and combining various pieces of code and data into a single file that can be loaded (copied) into memory and executed.

7.1 Compiler Drivers

Most compilation systems provide a compiler driver that invokes the language preprocessor, compiler, assembler, and linker as needed on behalf of the user.

The driver first runs the C preprocessor (cpp), which translates the C source file main.c into an ASCII intermediate file main.i:

This is quite interesting.

cpp [other arguments] main.c /tmp/main.i

Next, the driver runs the C compiler (cc1), which translates main.i into an ASCII assembly language file main.s:

cc1 /tmp/main.i main.c -O2 [other arguments] -o /tmp/main.s

Then, the driver runs the assembler (as), which translates main.s into a relocatable object file main.o:

as [other arguments] -o /tmp/main.o /tmp/main.s

The driver goes through the same process to generate swap.o. Finally, it runs the linker program ld, which combines main.o and swap.o, along with the necessary system object files, to create the executable object file p:

ld -o p [system object files and args] /tmp/main.o /tmp/swap.o
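
For reference, the two source files that these commands process look roughly like this in the book's running example. This is a reconstruction shown as one listing, so treat the details as an approximation: in the real example the two halves live in separate files, and main.c also contains a main function that simply calls swap and returns 0 (left out here so the listing can be dropped into any test program).

```c
/* ---- main.c (the book's main() just calls swap() and returns 0) ---- */
void swap(void);          /* reference to a symbol defined in swap.c */

int buf[2] = {1, 2};      /* initialized global, defined here and used by swap.c */

/* ---- swap.c ---- */
extern int buf[];         /* reference to the symbol defined in main.c */

int *bufp0 = &buf[0];     /* initialized global pointer */
int *bufp1;               /* uninitialized global pointer */

void swap(void)
{
    int temp;             /* ordinary local variable, lives on the stack */

    bufp1 = &buf[1];
    temp = *bufp0;
    *bufp0 = *bufp1;
    *bufp1 = temp;
}
```

After one call to swap, the two elements of buf have exchanged values. The names buf, bufp0, bufp1 and swap are the symbols that the later sections on symbol resolution and relocation talk about.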

7.2 Static Linking

Static linkers such as the Unix ld program take as input a collection of relocatable object files and command-line arguments and generate as output a fully linked executable object file that can be loaded and run.

To build the executable, the linker must perform two main tasks:

Symbol resolution. Object files define and reference symbols. The purpose of symbol resolution is to associate each symbol reference with exactly one symbol definition.

Relocation. Compilers and assemblers generate code and data sections that start at address 0. The linker relocates these sections by associating a memory location with each symbol definition, and then modifying all of the references to those symbols so that they point to this memory location.

7.3 Object Files

Object files come in three forms:

Relocatable object file. Contains binary code and data in a form that can be combined with other relocatable object files at compile time to create an executable object file.

Executable object file. Contains binary code and data in a form that can be copied directly into memory and executed.

Shared object file. A special type of relocatable object file that can be loaded into memory and linked dynamically, at either load time or run time.

Object file formats vary from system to system. Modern Unix systems such as Linux use the Unix Executable and Linkable Format (ELF). Although our discussion will focus on ELF, the basic concepts are similar, regardless of the particular format.

7.4 Relocatable Object Files

Figure 7.3 shows the format of a typical ELF relocatable object file. The ELF header begins with a 16-byte sequence that describes the word size and byte ordering of the system that generated the file. The rest of the ELF header contains information that allows a linker to parse and interpret the object file. This includes the size of the ELF header and the object file type, among other fields.
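
To make the 16-byte identification sequence concrete, here is a minimal sketch that decodes the word size and byte order from it. The byte positions and values (magic 0x7f 'E' 'L' 'F', class at index 4, data encoding at index 5) come from the ELF specification; the struct ElfIdentInfo and the function parse_elf_ident are names made up for this note, not part of any library.

```c
#include <stddef.h>
#include <string.h>

/* Positions and values of the identification bytes, per the ELF spec
 * (the same numbers appear as EI_CLASS, ELFCLASS64, etc. in <elf.h>). */
enum { IDENT_SIZE = 16, IDX_CLASS = 4, IDX_DATA = 5 };
enum { CLASS_32 = 1, CLASS_64 = 2 };
enum { DATA_LITTLE = 1, DATA_BIG = 2 };

/* Hypothetical result type for this sketch. */
typedef struct {
    int word_bits;       /* 32 or 64, or 0 if the class byte is unrecognized */
    int little_endian;   /* 1 = little, 0 = big, -1 if unrecognized */
} ElfIdentInfo;

/* Decode the first 16 bytes of a file.
 * Returns 0 on success, -1 if the bytes do not start with the ELF magic. */
int parse_elf_ident(const unsigned char *bytes, size_t len, ElfIdentInfo *out)
{
    static const unsigned char magic[4] = {0x7f, 'E', 'L', 'F'};

    if (len < IDENT_SIZE || memcmp(bytes, magic, sizeof magic) != 0)
        return -1;

    out->word_bits = bytes[IDX_CLASS] == CLASS_32 ? 32
                   : bytes[IDX_CLASS] == CLASS_64 ? 64 : 0;
    out->little_endian = bytes[IDX_DATA] == DATA_LITTLE ? 1
                       : bytes[IDX_DATA] == DATA_BIG ? 0 : -1;
    return 0;
}
```

Feeding it the first 16 bytes of an x86-64 Linux executable should report 64-bit, little-endian; anything that does not begin with the magic bytes is rejected.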

On Linux, the readelf command reads information from ELF files.

typedef struct {
    int name;        /* String table offset */
    int value;       /* Section offset, or VM address */
    int size;        /* Object size in bytes */
    char type:4,     /* Data, func, section, or src file name (4 bits) */
         binding:4;  /* Local or global (4 bits) */
    char reserved;   /* Unused */
    char section;    /* Section header index, ABS, UNDEF, */
                     /* Or COMMON */
} Elf_Symbol;

ELF section header information read from a hello world program:

There are 30 section headers, starting at offset 0x1178:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .interp           PROGBITS         0000000000400238  00000238
       000000000000001c  0000000000000000   A       0     0     1
  [ 2] .note.ABI-tag     NOTE             0000000000400254  00000254
       0000000000000020  0000000000000000   A       0     0     4
  [ 3] .note.gnu.build-i NOTE             0000000000400274  00000274
       0000000000000024  0000000000000000   A       0     0     4
  [ 4] .gnu.hash         GNU_HASH         0000000000400298  00000298
       000000000000001c  0000000000000000   A       5     0     8
  [ 5] .dynsym           DYNSYM           00000000004002b8  000002b8
       0000000000000060  0000000000000018   A       6     1     8
  [ 6] .dynstr           STRTAB           0000000000400318  00000318
       000000000000003d  0000000000000000   A       0     0     1
  [ 7] .gnu.version      VERSYM           0000000000400356  00000356
       0000000000000008  0000000000000002   A       5     0     2
  [ 8] .gnu.version_r    VERNEED          0000000000400360  00000360
       0000000000000020  0000000000000000   A       6     1     8
  [ 9] .rela.dyn         RELA             0000000000400380  00000380
       0000000000000018  0000000000000018   A       5     0     8
  [10] .rela.plt         RELA             0000000000400398  00000398
       0000000000000048  0000000000000018   A       5    12     8
  [11] .init             PROGBITS         00000000004003e0  000003e0
       000000000000001a  0000000000000000  AX       0     0     4
  [12] .plt              PROGBITS         0000000000400400  00000400
       0000000000000040  0000000000000010  AX       0     0     16
  [13] .text             PROGBITS         0000000000400440  00000440
       00000000000001a4  0000000000000000  AX       0     0     16
  [14] .fini             PROGBITS         00000000004005e4  000005e4
       0000000000000009  0000000000000000  AX       0     0     4
  [15] .rodata           PROGBITS         00000000004005f0  000005f0
       0000000000000011  0000000000000000   A       0     0     4
  [16] .eh_frame_hdr     PROGBITS         0000000000400604  00000604
       0000000000000034  0000000000000000   A       0     0     4
  [17] .eh_frame         PROGBITS         0000000000400638  00000638
       00000000000000d4  0000000000000000   A       0     0     8
  [18] .init_array       INIT_ARRAY       0000000000600e10  00000e10
       0000000000000008  0000000000000000  WA       0     0     8
  [19] .fini_array       FINI_ARRAY       0000000000600e18  00000e18
       0000000000000008  0000000000000000  WA       0     0     8
  [20] .jcr              PROGBITS         0000000000600e20  00000e20
       0000000000000008  0000000000000000  WA       0     0     8
  [21] .dynamic          DYNAMIC          0000000000600e28  00000e28
       00000000000001d0  0000000000000010  WA       6     0     8
  [22] .got              PROGBITS         0000000000600ff8  00000ff8
       0000000000000008  0000000000000008  WA       0     0     8
  [23] .got.plt          PROGBITS         0000000000601000  00001000
       0000000000000030  0000000000000008  WA       0     0     8
  [24] .data             PROGBITS         0000000000601030  00001030
       0000000000000010  0000000000000000  WA       0     0     8
  [25] .bss              NOBITS           0000000000601040  00001040
       0000000000000008  0000000000000000  WA       0     0     4
  [26] .comment          PROGBITS         0000000000000000  00001040
       000000000000002a  0000000000000001  MS       0     0     1
  [27] .shstrtab         STRTAB           0000000000000000  0000106a
       0000000000000108  0000000000000000           0     0     1
  [28] .symtab           SYMTAB           0000000000000000  000018f8
       0000000000000618  0000000000000018          29    45     8
  [29] .strtab           STRTAB           0000000000000000  00001f10
       0000000000000236  0000000000000000           0     0     1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
  I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)

.data: Initialized global C variables. Local C variables are maintained at run time on the stack, and do not appear in either the .data or .bss sections.

.bss: Uninitialized global C variables. This section occupies no actual space in the object file; it is merely a place holder. Object file formats distinguish between initialized and uninitialized variables for space efficiency: uninitialized variables do not have to occupy any actual disk space in the object file.
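
A small sketch of where typical C declarations end up, following the two descriptions above. The variable and function names are invented for illustration, and the section noted in each comment is what a typical compiler does, not a guarantee.

```c
int initialized_global = 42;   /* initialized global: .data */
int uninitialized_global;      /* uninitialized global: .bss, takes no space on disk */

int next_id(void)
{
    static int id = 100;       /* initialized static local: also stored in .data */
    int step = 1;              /* ordinary local: on the stack, in neither section */

    id += step;
    return id;
}
```

Because id lives in .data rather than on the stack, it keeps its value between calls, so successive calls to next_id return increasing numbers. The uninitialized global reads as zero at startup even though the object file stores no bytes for it.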

Only these two descriptions are pasted here; for the other sections, see the book or a wiki.

7.5 Symbols and Symbol Tables

Each relocatable object module, m, has a symbol table that contains information about the symbols that are defined and referenced by m. In the context of a linker, there are three different kinds of symbols:

Global symbols that are defined by module m and that can be referenced by other modules. Global linker symbols correspond to nonstatic C functions and global variables that are defined without the C static attribute.

Global symbols that are referenced by module m but defined by some other module. Such symbols are called externals and correspond to C functions and variables that are defined in other modules.

Local symbols that are defined and referenced exclusively by module m. Some local linker symbols correspond to C functions and global variables that are defined with the static attribute. These symbols are visible anywhere within module m, but cannot be referenced by other modules. The sections in an object file and the name of the source file that corresponds to module m also get local symbols.

It is important to realize that local linker symbols are not the same as local program variables. The symbol table in .symtab does not contain any symbols that correspond to local nonstatic program variables.
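
The three kinds of symbols can be seen in one small module. This is only an illustrative sketch: scale, call_count, clamp, parse_scaled and calls_so_far are made-up names, and an optimizing compiler may inline the static function so that it never shows up in .symtab at all.

```c
#include <stdlib.h>   /* declares strtol, which is defined in another module (the C library) */

int scale = 3;                      /* global symbol: defined here, visible to other modules */

static int call_count;              /* local symbol: static, visible only inside this module */

static long clamp(long x)           /* local symbol: static function */
{
    return x < 0 ? 0 : x;
}

long parse_scaled(const char *text) /* global symbol: nonstatic function */
{
    /* strtol is an external: referenced here, defined elsewhere.
     * value is a local program variable: it lives on the stack and
     * gets no symbol table entry. */
    long value = strtol(text, NULL, 10);

    call_count++;
    return clamp(scale * value);
}

int calls_so_far(void)              /* global symbol */
{
    return call_count;
}
```

Compiling this file to a relocatable object and listing its symbol table should show scale, parse_scaled and calls_so_far as global, call_count (and clamp, if not inlined) as local, and strtol as undefined.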

7.6 Symbol Resolution

When the compiler encounters a symbol (either a variable or function name) that is not defined in the current module, it assumes that it is defined in some other module, generates a linker symbol table entry, and leaves it for the linker to handle.

7.6.1 How Linkers Resolve Multiply Defined Global Symbols

Functions and initialized global variables get strong symbols. Uninitialized global variables get weak symbols.

Rule 1: Multiple strong symbols are not allowed.

Rule 2: Given a strong symbol and multiple weak symbols, choose the strong symbol.

Rule 3: Given multiple weak symbols, choose any of the weak symbols.

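Only the main rules are given here. The book explains the demos very well, and the accompanying practice problems cover them too, so I won't paste them all here.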
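
The three rules can be written down as a tiny decision function. This is a model of the rules only, not how any real linker is implemented: the Strength type, the result codes, and the resolve function are hypothetical, and where Rule 3 allows any weak symbol this sketch simply picks the first one.

```c
#include <stddef.h>

typedef enum { WEAK, STRONG } Strength;   /* hypothetical: kind of each definition */

enum {
    RESOLVE_DUPLICATE = -1,   /* Rule 1 violated: more than one strong definition */
    RESOLVE_UNDEFINED = -2    /* no definition at all */
};

/* Given the definitions of one symbol name across all modules, return the
 * index of the definition the rules select, or a negative error code. */
int resolve(const Strength *defs, size_t n)
{
    int chosen = RESOLVE_UNDEFINED;
    size_t strong_count = 0;

    for (size_t i = 0; i < n; i++) {
        if (defs[i] == STRONG) {
            if (++strong_count > 1)
                return RESOLVE_DUPLICATE;   /* Rule 1 */
            chosen = (int)i;                /* Rule 2: strong beats any weak */
        } else if (chosen == RESOLVE_UNDEFINED) {
            chosen = (int)i;                /* Rule 3: any weak will do; take the first */
        }
    }
    return chosen;
}
```

For example, the sequence weak, strong, weak selects the strong definition at index 1, two weak definitions select index 0, and two strong definitions produce the duplicate error.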

7.6.2 Linking with Static Libraries

In practice, all compilation systems provide a mechanism for packaging related object modules into a single file called a static library.

To see why, consider the alternative of putting all of the standard functions in a single relocatable object module that is linked into every executable. A big disadvantage of that approach is that every executable file in a system would contain a complete copy of the collection of standard functions, which would be extremely wasteful of disk space.

Another big disadvantage is that any change to any standard function, no matter how small, would require the library developer to recompile the entire source file, a time-consuming operation that would complicate the development and maintenance of the standard functions.

Figure 7.7 summarizes the activity of the linker. The -static argument tells the compiler driver that the linker should build a fully linked executable object file that can be loaded into memory and run without any further linking at load time.

7.7 Relocation

Relocating sections and symbol definitions. In this step, the linker merges all sections of the same type into a new aggregate section of the same type.

Relocating symbol references within sections. In this step, the linker modifies every symbol reference in the bodies of the code and data sections so that they point to the correct run-time addresses.

7.7.1 Relocation Entries

When an assembler generates an object module, it does not know where the code and data will ultimately be stored in memory. Nor does it know the locations of any externally defined functions or global variables that are referenced by the module. So whenever the assembler encounters a reference to an object whose ultimate location is unknown, it generates a relocation entry that tells the linker how to modify the reference when it merges the object file into an executable.

typedef struct {
    int offset;     /* Offset of the reference to relocate */
    int symbol:24,  /* Symbol the reference should point to */
        type:8;     /* Relocation type */
} Elf32_Rel;
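
To see what the linker does with such an entry, here is a rough sketch of applying one relocation to a section held in memory. It covers the two basic IA32 types, R_386_32 (absolute) and R_386_PC32 (PC-relative), whose numeric values are taken from the ELF definitions. ResolvedReloc and apply_reloc are hypothetical names for this note; unlike Elf32_Rel, the struct carries the symbol's final address directly instead of a symbol table index, to keep the example short.

```c
#include <stdint.h>
#include <string.h>

enum { R_386_32 = 1, R_386_PC32 = 2 };   /* relocation type codes from the ELF definitions */

/* Hypothetical simplified entry: the symbol is already resolved to an address. */
typedef struct {
    uint32_t offset;        /* offset of the reference within the section */
    uint32_t symbol_addr;   /* run-time address chosen for the symbol */
    int      type;          /* R_386_32 or R_386_PC32 */
} ResolvedReloc;

/* Patch the 32-bit reference at section[offset].
 * section_addr is the run-time address chosen for the start of the section. */
void apply_reloc(uint8_t *section, uint32_t section_addr, const ResolvedReloc *r)
{
    uint32_t addend, value;
    uint32_t refaddr = section_addr + r->offset;   /* run-time address of the reference */

    /* The value already stored at the reference acts as the addend. */
    memcpy(&addend, section + r->offset, sizeof addend);

    if (r->type == R_386_PC32)
        value = r->symbol_addr + addend - refaddr; /* distance from the reference */
    else
        value = r->symbol_addr + addend;           /* absolute address */

    memcpy(section + r->offset, &value, sizeof value);
}
```

As a worked case: with the section placed at 0x80483b4, a PC-relative reference at offset 0x7 whose stored value is -4, and the target symbol at 0x80483c8, the reference sits at 0x80483bb and the patched value is 0x80483c8 - 4 - 0x80483bb = 0x9.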

7.9 Loading Executable Object Files

To run an executable object file p, we can type its name to the Unix shell’s command line:

unix> ./p

How exactly does a user-space program start, and how does it end?

When the loader runs, it creates the memory image shown in Figure 7.13. Guided by the segment header table in the executable, it copies chunks of the executable into the code and data segments. Next, the loader jumps to the program’s entry point, which is always the address of the _start symbol. The startup code at the _start address is defined in the object file crt1.o and is the same for all C programs. Figure 7.14 shows the specific sequence of calls in the startup code. After calling initialization routines from the .text and .init sections, the startup code calls the atexit routine, which appends a list of routines that should be called when the application terminates normally. The exit function runs the functions registered by atexit, and then returns control to the operating system by calling _exit. Next, the startup code calls the application’s main routine, which begins executing our C code. After the application returns, the startup code calls the _exit routine, which returns control to the operating system.
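
The atexit mechanism mentioned above is also available to application code, which makes the "how does it end" half easy to observe. A minimal sketch, with invented handler names; atexit itself is the standard C library function, and registered handlers run in reverse order of registration when the program exits normally.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical cleanup handlers for illustration. */
static void flush_log(void)   { puts("flush_log: registered first, runs last"); }
static void close_files(void) { puts("close_files: registered last, runs first"); }

/* Register both handlers. Returns 0 on success, -1 if registration failed. */
int register_cleanup(void)
{
    if (atexit(flush_log) != 0)
        return -1;
    if (atexit(close_files) != 0)
        return -1;
    return 0;
}
```

If main calls register_cleanup and then returns (or calls exit), the two messages are printed after main has finished, close_files first. Calling _exit directly would skip them, since it returns to the operating system without running the registered functions.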

7.12 Position-Independent Code (PIC)

A key purpose of shared libraries is to allow multiple running processes to share the same library code in memory and thus save precious memory resources.

Time: 2024-10-30 12:13:13
