程序的机器级表示(五)

Procedure Example

准备调用swap_add之前的代码:

此时,ebp指向顶部,esp指向中部,call之后push return address

要改变ebp之前必须保存,以便之后恢复:

setup code for swap_add:

此时,ebp指向中部,esp指向中部

body code for swap_code:

finishing code for swap_code:

关于这一句ret的影响:
resetting the stack pointer so that it points to the stored return address, so that the ret instruction transfers control back to caller.

code for caller:

leave的影响:

Observe the use of the leave instruction to reset both the stack and the frame pointer prior to return. We have seen in our code examples that the code generated by gcc sometimes uses a leave instruction to deallocate a stack frame, and sometimes it uses one or two popl instructions. Either approach is acceptable。

leave =

总结:

We can see from this example that the compiler generates code to manage the stack structure according to a simple set of conventions. Arguments are passed to a function on the stack, where they can be retrieved using positive offsets (+8, +12, . . .) relative to %ebp. Space can be allocated on the stack either by using push instructions or by subtracting offsets from the stack pointer. Before returning, a function must restore the stack to its original condition by restoring any callee-saved registers and %ebp, and by resetting %esp so that it points to the return address.

Recursive Procedures :

We can see that calling a function recursively proceeds just like any other function call. Our stack discipline provides a mechanism where each invocation of a function has its own private storage for state information (saved values of the return location, frame pointer, and callee-save registers). If need be, it can also provide storage for local variables. The stack discipline of allocation and deallocation naturally matches the call-return ordering of functions. This method of implementing function calls and returns even works for more complex patterns, including mutual recursion (for example, when procedure P calls Q, which in turn calls P).

Data Alignment :

好处:

Many computer systems place restrictions on the allowable addresses for the primitive data types, requiring that the address for some type of object must be a multiple of some value K (typically 2, 4, or 8). Such alignment restrictions simplify the design of the hardware forming the interface between the processor and the memory system. For example, suppose a processor always fetches 8 bytes from memory with an address that must be a multiple of 8. If we can guarantee that any double will be aligned to have its address be a multiple of 8, then the value can be read or written with a single memory operation. Otherwise, we may need to perform two memory accesses, since the object might be split across two 8-byte memory blocks.

传统:

The IA32 hardware will work correctly regardless of the alignment of data. However, Intel recommends that data be aligned to improve memory system performance. Linux follows an alignment policy where 2-byte data types (e.g., short) must have an address that is a multiple of 2, while any larger data types (e.g., int, int *, float, and double) must have an address that is a multiple of 4. Note that this requirement means that the least significant bit of the address of an object of type short must equal zero. Similarly, any object of type int, or any pointer, must be at an address having the low-order 2 bits equal to zero.

A case of mandatory alignment :

some of the SSE instructions for implementing multimedia operations will not work correctly with unaligned data. These instructions operate on 16-byte blocks of data, and the instructions that transfer data between the SSE unit and memory require the memory addresses to be multiples of 16. Any attempt to access memory with an address that does not satisfy this alignment will lead to an exception.

This is the motivation behind the IA32 convention of making sure that every stack frame is a multiple of 16 bytes long

.align 4

This ensures that the data following it (in this case the start of the jump table) will start with an address that is a multiple of 4. Since each table entry is 4 bytes long, the successive elements will obey the 4-byte alignment restriction.

细节:

Library routines that allocate memory, such as malloc, must be designed so that they return a pointer that satisfies the worst-case alignment restriction for the machine it is running on, typically 4 or 8. For code involving structures, the compiler may need to insert gaps in the field allocation to ensure that each structure element satisfies its alignment requirement. The structure then has some required alignment for its starting address.

对于:

struct S1{
    int i;
    char c;
    int j;
};

对齐后应为:

对于

struct S2{
    int I;
    int j;
    char c;
};

对齐后应为:

Machine code generated with higher levels of optimization

In our presentation, we have looked at machine code generated with level-one optimization (specified with the command-line option ‘-O1’). In practice, most heavily used programs are compiled with higher levels of optimization. For example, all of the GNU libraries and packages are compiled with level-two optimization, specified with the command-line option ‘-O2’.

Here are some examples of the optimizations that can be found at level two:

  • The control structures become more entangled. Most procedures have multiple return points, and the stack management code to set up and complete a function is intermixed with the code implementing the operations of the procedure.
  • Procedure calls are often inlined, replacing them by the instructions implementing the procedures. This eliminates much of the overhead involved in calling and returning from a function, and it enables optimizations that are specific to individual function calls. On the other hand, if we try to set a breakpoint for a function in a debugger, we might never encounter a call to this function.

Recursion is often replaced by iteration. (Calling a function is often replaced with a process like ‘while‘)

Out-of-Bounds Memory References and Buffer Overflow

We have seen that C does not perform any bounds checking for array references, and that local variables are stored on the stack along with state information such as saved register values and return addresses. This combination can lead to serious program errors, where the state stored on the stack gets corrupted by a write to an out-of-bounds array element. When the program then tries to reload the register or execute a ret instruction with this corrupted state, things can go seriously wrong.

A particularly common source of state corruption is known as buffer overflow. Typically some character array is allocated on the stack to hold a string, but the size of the string exceeds the space allocated for the array. This is demonstrated by the following program example:

1 /* Sample implementation of library function gets() */
2 char *gets(char *s)
3{
4 int c;
5 char *dest = s;
6 int gotchar = 0; /* Has at least one character been read? */
7 while ((c = getchar()) != ’\n’ && c != EOF) {
8 *dest++ = c; /* No bounds checking! */
9 gotchar = 1;
10 }
11 *dest++ = ’\0’; /* Terminate string */
12 if (c == EOF && !gotchar)
13 return NULL; /* End of file or error */
14 return s;
15 }

/* Read input line and write it back */
void echo() {
    char buf[8];  /* Way too small! */
    gets(buf);
    puts(buf);
}

如图所示:

In conclusion:

  • If the stored value of %ebx is corrupted, then this register will not be restored properly in line 12, and so the caller will not be able to rely on the integrity of this register, even though it should be callee-saved.
  • If the stored value of %ebp is corrupted, then this register will not be restored properly on line 13, and so the caller will not be able to reference its local variables or parameters properly.
  • If the stored value of the return address is corrupted, then the ret instruction (line 14) will cause the program to jump to a totally unexpected location.

A better version involves using the function fgets, which includes as an argument a count on the maximum number of bytes to read.

Unfortunately, a number of commonly used library functions, including strcpy, strcat, and sprintf, have the property that they can generate a byte sequence without being given any indication of the size of the destination buffer [94]. Such conditions can lead to vulnerabilities to buffer overflow.

A more pernicious use of buffer overflow is to get a program to perform a function that it would otherwise be unwilling to do. This is one of the most common methods to attack the security of a system over a computer network. Typically, the program is fed with a string that contains the byte encoding of some executable code, called the exploit code, plus some extra bytes that overwrite the return address with a pointer to the exploit code. The effect of executing the ret instruction is then to jump to the exploit code.

攻击原理:

In one form of attack, the exploit code then uses a system call to start up a shell program, providing the attacker with a range of operating system functions. In another form, the exploit code performs some otherwise unauthorized task, repairs the damage to the stack, and then executes ret a second time, causing an (apparently) normal return to the caller.

对策:

Any interface to the external environment should be made “bullet proof” so that no behavior by an external agent can cause the system to misbehave.

Thwarting Buffer Overflow Attacks

1 Stack Randomization

In order to insert exploit code into a system, the attacker needs to inject both the code as well as a pointer to this code as part of the attack string.

核心与实现方法:

The idea of stack randomization is to make the position of the stack vary from one run of a program to another. Thus, even if many machines are running identical code, they would all be using different stack addresses. This is implemented by allocating a random amount of space between 0 and n bytes on the stack at the start of a program, for example, by using the allocation function alloca, which allocates space for a specified number of bytes on the stack. This allocated space is not used by the program, but it causes all subsequent stack locations to vary from one execution of a program to another. The allocation range n needs to be large enough to get sufficient variations in the stack addresses, yet small enough that it does not waste too much space in the program.

举例:

运行结果解释:

Running the code 10,000 times on a Linux machine in 32-bit mode, the addresses ranged from 0xff7fa7e0 to 0xffffd7e0, a range of around 223. By comparison, running on an older Linux system, the same address occurred every time. Running in 64-bit mode on the newer machine, the addresses ranged from 0x7fff00241914 to 0x7ffffff98664, a range of nearly 232.

关于ASLR:

It is one of a larger class of techniques known as address-space layout randomization, or ASLR [95]. With ASLR, different parts of the program, including program code, library code, stack, global variables, and heap data, are loaded into different regions of memory each time a program is run. That means that a program running on one machine will have very different address mappings than the same program running on other machines. This can thwart some forms of attack.

隐忧:

If we set up a 256-byte nop sled, then the randomization over n = 223 can be cracked by enumerating 215 = 32,768 starting addresses, which is entirely feasible for a determined attacker. For the 64-bit case, trying to enumer- ate 224 = 16,777,216 is a bit more daunting. We can see that stack randomization and other aspects of ASLR can increase the effort required to successfully attack a system, and therefore greatly reduce the rate at which a virus or worm can spread, but it cannot provide a complete safeguard.

Stack Corruption Detection

A second line of defense is to be able to detect when a stack has been corrupted.

实现方法:

Recent versions of gcc incorporate a mechanism known as stack protector into the generated code to detect buffer overruns. The idea is to store a special canary value4 in the stack frame between any local buffer and the rest of the stack state, as illustrated in Figure 3.33 [32, 94]. This canary value, also referred to as a guard value, is generated randomly each time the program is run, and so there is no easy way for an attacker to determine what it is. Before restoring the register state and returning from the function, the program checks if the canary has been altered by some operation of this function or one that it has called. If so, the program aborts with an error.

命令细节:

In fact, for our earlier demonstration of stack overflow, we had to give the command-line option “-fno-stack-protector” to prevent gcc from inserting this code.

Limiting Executable Code Regions

非常关键:

A final step is to eliminate the ability of an attacker to insert executable code into a system. One method is to limit which memory regions hold executable code. In typical programs, only the portion of memory holding the code generated by the compiler need be executable. The other portions can be restricted to allow just reading and writing. As we will see in Chapter 9, the virtual memory space is logically divided into pages, typically with 2048 or 4096 bytes per page. The hardware supports different forms of memory protection, indicating the forms of access allowed by both user programs and by the operating system kernel. Many systems allow control over three forms of access: read (reading data from memory), write (storing data into memory), and execute (treating the memory contents as machine-level code).

Combining assembly code with C programs :

原因:

Although a C compiler does a good job of converting the computations we express in a program into machine code, there are some features of a machine that cannot be accessed by a C program.

方法:

There are two ways to incorporate assembly code into C programs. First, we can write an entire function as a separate assembly-code file and let the assembler and linker combine this with code we have written in C. Second, we can use the inline assembly feature of gcc, where brief sections of assembly code can be incorporated into a C program using the asm directive. This approach has the advantage that it minimizes the amount of machine-specific code.

限制:

Of course, including assembly code in a C program makes the code specific to a particular class of machines (such as IA32), and so it should only be used when the desired feature can only be accessed in this way.

原文地址:https://www.cnblogs.com/geeklove01/p/9177271.html

时间: 2024-10-28 05:23:23

程序的机器级表示(五)的相关文章

深入理解计算机系统之程序的机器级表示部分学习笔记

不论我们是在用C语言还是用JAVA或是其他的语言编程时,我们会被屏蔽了程序的机器级的实现.机器语言不需要被编译,可以直接被CPU执行,其执行速度十分  快.但是机器语言的读写性与移植性较高级语言低.高级语言被编译后便成为了汇编语言,汇编语言十分接近机器语言.之后汇编代码会转化为机器语言.虽然现代  的编译器能帮助我们将高级语言转化为汇编语言,解决了不少问题,但是对于一个严谨的程序员来说,需要做到能够阅读和理解汇编语言.我们主要围绕Intel来讲  解. 一  Intel处理器的历史演变 Inte

第三章 程序的机器级表示

程序的机器级表示 3.1历史观点 8086—〉80286—〉i386—〉i486—〉Pentium—〉PentiumPro—〉Pentium—〉Pentium—〉Pentium4—〉Pentium4e—〉Core 2 Duo —〉Core i7 3.2程序编码 1.gcc -01 –o p p1.c p2.c      使用第一级优化 2.程序计数器(%eip)指示将要执行的下一条指令在存储器中的地址. 3.寄存器文件 4.-S:C语言编译器产生的汇编代码 例:gcc -01 –S code.c

深入理解计算机系统(第二版)----之三:程序的机器级表示

计算机执行机器代码,用字节编码低级的操作,包括处理数据.管理存储器.读写存储设备上的数据,利用网络通信,编译器基于变成语言的原则, 目标机器的指令集合操作系统遵循的原则,经过一系列阶段产生机器代码,gcc c语言编辑器以汇编代码的形式输出,汇编代码是机器代码的文本表示,给出程序的每一条指令.然后gcc调用汇编器和链接器,根据汇编代码生成可执行的机器代码. 本章,近距离观察机器代码和汇编代码. 机器级的实现,被高级语言屏蔽了,用高级语言编写的程序可以在很多不同的机器上编译和执行,而汇编代码则是与特

第三章程序的机器级表示 学习报告

第三章 程序的机器级表示 3.1 历史观点 Intel处理器系列俗称x86,开始时是第一代单芯片.16位微处理器之一. 每个后继处理器的设计都是后向兼容的——较早版本上编译的代码可以在较新的处理器上运行. X86 寻址方式经历三代: 1  DOS时代的平坦模式,不区分用户空间和内核空间,很不安全 2  8086的分段模式 3  IA32的带保护模式的平坦模式 3.2 程序编码 gcc -01 -o p p1.c -01 表示使用第一级优化.优化的级别与编译时间和最终产生代码的形式都有关系,一般认

CSAPP:第三章程序的机器级表示2

CSAPP:程序的机器级表示2 关键点:算术.逻辑操作 算术逻辑操作1.加载有效地址2.一元二元操作3.移位操作 算术逻辑操作 ??如图列出了x86-64的一些整数和逻辑操作,大多数操作分成了指令类(只有leaq没有其他的变种,addb.addw.addl.addq分别是字节加法.字加法.双字加法和四字加法),这些操作通常分为四组:加载有效地址.一元操作.二元操作和移位操作. 1.加载有效地址 leaq S,D;D = &S??加载有效地址指令leag实际上是movq指令的变形,它的指令形式上是

程序的机器级表示——基础

计算机执行的是机器代码,机器代码是二进制文件,既程序.机器代码用字节(1Byte=8bit)序列编码低级的操作,例如数据处理,管理存储器,从存储设备取数据等.使用高级语言(c,c++等)编写的程序(文本形式)最终需要被编译成机器代码才可以被计算机执行.当使用GCC c编译器来编译c语言代码时,会首先将其编译成汇编语言形式的内容,然后c编译器调用汇编器和连接器来最终形成计算机可执行的机器代码.汇编语言代码是机器码的文本形式,是机器码(字节序列)的文本助记形式.现在的要求是可以理解经过编译器优化过的

六星经典CSAPP-笔记(3)程序的机器级表示

1.前言 IA32机器码以及汇编代码都与原始的C代码有很大不同,因为一些状态对于C程序员来说是隐藏的.例如包含下一条要执行代码的内存位置的程序指针(program counter or PC)以及8个寄存器.还要注意的一点是:汇编代码的ATT格式和Intel格式.ATT格式是GCC和objdump等工具的默认格式,在CSAPP中一律使用这种格式.而Intel格式则通常会在Intel的IA32架构文档以及微软的Windows技术文档中碰到.两者的主要区别有: Intel格式忽略指令中暗示操作数长度

C语言程序的机器级表示

过程调用的机器级表示 特别说明该表示是基于IA-32指令系统,x86 64指令系统不同于IA-32 机器级表示 可执行文件的存储器映像 调用过程 IA-32的寄存器使用约定 – 调用者保存寄存器:EAX.EDX.ECX 当过程P调用过程Q时,Q可以直接使用这三个寄存器,不用 将它们的值保存到栈中.如果P在从Q返回后还要用这三个寄 存器的话,P应在转到Q之前先保存,并在从Q返回后先恢复 它们的值再使用. – 被调用者保存寄存器:EBX.ESI.EDI Q必须先将它们的值保存到栈中再使用它们,并在返

程序的机器级表示

P104,p105:     X86,经历了一个长期的的不断发展的过程.开始时他是第一代单芯片和16位微处理器之一,由于当时集成电路技术水平十分有限,其中做了很多妥协.此后,他不断地成长,利用进步的技术满足更高性能和支持更高级的操作系统的需求. 8086(1978) 80286(1982) i386(1985) i486(1989) Pentium(1993) PentiumPro(1995) Pentium II(1997) Pentium III(1999) Pentium 4(2000)