Virtual Memory and Memory Mapping
Virtual memory was introduced in 1959. The main goal was to significantly simplify programming by abstracting the quirks of physical memory‘s hierarchy. Nowadays, virtual memory management is an essential component in almost every operating system‘s kernel. Today, I discuss the main concepts of virtual memory and its advantages.
虚拟内存于1959年引入,通过抽象物理内存层次的精髓使得编程更为简单。现在,虚拟内存管理在每一个操作系统内核上都是必须的。
Real Addressing and Virtual Addressing
真实地址与虚拟地址
In the olden days, a process would access physical memory directly. This model, known as real addressing, is still in use in embedded systems and some PDAs. It‘s also used in DOS. However, real addressing mode is problematic for several reasons:
在古老的混沌纪时代,进程可以直接访问物理内存。这种模式,被称为真实地址,在嵌入式系统与某些PDA中仍然使用;在DOS中也被使用着。只是,真实地址模型在出于以下原因可能出问题
- System stability. A process may accidentally tread on another process‘s memory. This can occur, for instance, when a process tries to write to an uninitialized pointer whose value belongs to the address space of a different process. Worse yet, a process could overwrite critical kernel code, thus freezing the system.
- 系统的稳定性。某一进程可能使用了其它进程的内存,比如一个进程向未初始化的指针地址写东西,但是这个地址属于其它进程。甚至有可能发生 进程操作了关键的内核代码,导致系统死机
- Security. Apart from accidental trespassing, real addressing enables a process to steal and modify other processes‘ memory.
- 安全。除了上述原因,真实地址模式使得一个进程可以修改其它进程的内存
- Performance. In real mode addressing, swapping and memory mapping (I‘ll explain these techniques shortly) are usually unsupported. Consequently, the system isn‘t scalable. Crashes and freezes are common annoyances in such systems.
- 性能。真实地址模式中交换和内存映射通常不被支持,导致系统是不可扩展的。崩溃和死机在这类系统中是常见现象
How Virtual Addressing Works
In virtual addressing mode, every program is allocated a private address space that is completely isolated from other processes‘ address spaces. For example, if address 0x4000aef6 in process A points to a certain variable, the same address in process B may point to some other data variable or it may not exist in B. A virtual memory address is, therefore, an alias; the physical location to which it points is hidden from the process and from the programmer. For example, address 0x4000aef6 can be mapped to the physical address 0x102788ff in the system‘s RAM, or to offset 0 from cluster 1740 on the system‘s hard disk. Under this model, each process sees the system‘s memory as if it were the sole process running on the machine.
在虚拟地址模式中,每一个程序都限制地址空间中,进程间的地址空间是隔绝的。比如,如果进程A中的0x4000aef6指向某一变量;进程B中的0x4000aef6可能指向其他数据变量,也有可能这个地址在B中不存在。所以,虚拟内存地址是一个别称;它所指向的物理级的真实位置对进程与程序人员来说是不可见的。比如地址0x4000aef6可以被映射到RAM的物理地址 0x102788ff或者磁盘的第1740簇上。在这种模型中,每个进程都认为自己掌控了系统的全部内存。
Early virtual memory managers used kernel modules to translate a virtual address to a physical address and vice versa. Today, this mapping is performed by special hardware, so it‘s completely transparent.
早期的虚拟内存管理使用系统模块转换虚拟地址与真实地址。现在,这种映射都是通过特殊的硬件完成,已经完全透明了。
Virtual addressing enables the kernel to load and unload memory dynamically. A frequently-used code section can be stored in the RAM, whereas other sections of the program can be mapped to a swap file. When the process accesses a memory address that is mapped to a swap file, the kernel reloads that portion to the system‘s RAM. From a user‘s point of view, there‘s no difference whatsoever between the two (except performance, of course).
虚拟地址使得内核可以动态的加载/卸载内存。程序中经常被使用的代码部分可以存在在RAM中,其他部分被应映射到交换文件中。当进程访问映射到交换文件中的内存地址时,内核重新加载这一部分到RAM中。从用户的角度来看,一直在RAM中与使用时在加载到RAM中,这两者是没有区别的。
Memory Mapping
Memory mapping is a common concept in POSIX and Windows systems. It enables a process to map disk files to its address space, treating them as memory blocks rather than files. Usually, the mapping consists of translating a disk location to a virtual address and vice versa; there is no copying of disk data into the RAM. Memory mapping has several uses:
内存映射在Windows系统与POSIX系统中是很常见的概念。它使得进程可以映射磁盘文件到进程本身的地址空间,并当作内存块而非文件处理。通常,映射就是对磁盘位置与虚拟内存地址进行转换。映射时,不会有磁盘数据到RAM的拷贝。内存映射有以下用处
- Dynamic loading. By mapping executable files and shared libraries into its address space, a program can load and unload executable code sections dynamically.
- 动态加载。通过映射可执行文件和共享库到地址空间,程序可以动态地加载/卸载可执行的代码部分
- Fast File I/O. When you call file I/O functions, such as read() and write(), the data is copied to a kernel‘s intermediary buffer before it is transferred to the physical file or the process. This intermediary buffering is slow and expensive. Memory mapping eliminates this intermediary buffering, thereby improving performance significantly.
- 快速的I/O。当调用I/O函数时,数据在被传输到进程中前,会先被拷贝到内核的临时缓存中。临时缓存小而慢。内存映射不会使用到临时缓存,因此,极大的提升了程序性能。
- Streamlining file access. Once you map a file to a memory region, you access it via pointers, just as you would access ordinary variables and objects.
- 文件访问流水化。一旦将文件映射到内存区,你可以通过指针访问,如同使用普通的变量或者对象一样
- Memory persistence. Memory mapping enables processes to share memory sections that persist independently of the lifetime of a certain process.
- 内存的持续性。内存映射使得进程可以共享生命期不依赖于特定进程的内存区域。
The POSIX <sys/mman.h> header includes memory mapping syscalls and data structures. Because this interface is more intuitive and simpler than that of Windows, I base my memory mapping example on the POSIX library.
The mmap() system call:
caddr_t mmap(caddress_t map_addr, size_t length, int protection, int flags, int fd, off_t offset);
Let‘s examine what each parameter means.
map_addr is the address to which the memory should be mapped. A NULL value allows the kernel to pick any address (normally you‘d use this value). length contains the number of bytes to map from the file. protection indicates the types of access allowed to the mapped region:
1 PROT_READ //the mapped region may be read 2 PROT_WRITE //the mapped region may be written 3 PROT_EXEC //the mapped region may be executed
flags contains various mapping attributes; for instance, MAP_LOCKED guarantees that the mapped region is never swapped.
fd is the mapped file‘s descriptor.
Finally, offset specifies the position in the file from which mapping should begin, with offset 0 indicating the file‘s beginning.
In the following example, the program maps the first 4 KB of a file passed in command line into its memory and then reads int value from it:
1 #include <errno.h> 2 #include <fcntl.h> 3 #include <sys/mman.h> 4 #include <sys/types.h> 5 6 int main(int argc, char *argv[]) 7 { 8 int fd; 9 void * pregion; 10 if (fd= open(argv[1], O_RDONLY) <0) 11 { 12 perror("failed on open"); 13 return –1; 14 } 15 /*map first 4 kilobytes of fd*/ 16 pregion=mmap(NULL, 4096, PROT_READ,MAP_SHARED,fd,0); 17 if (pregion==(caddr_t)-1) 18 { 19 perror("mmap failed") 20 return –1; 21 } 22 close(fd); //close the physical file because we don‘t need it 23 //access mapped memory; read the first int in the mapped file 24 int val= *((int*) pregion); 25 }
To unmap a mapped region, use the munmap() function:
int munmap(caddr_t addr, int length);
addr is the address of the region being unmapped. length specifies how much of the memory should be unmapped (you may unmap a portion of a previously-mapped region). The following example unmaps the first kilobyte of the previously-mapped file. The remaining three KB still remain mapped to the process‘s RAM after this call:
munmap(pregion, 1024);
Summary
Without virtual addressing, programming would be much more difficult than you‘d imagine. Furthermore, many common programming concepts such as dynamic memory allocation and multiple processing would be harder to implement. Virtual addressing thus proves once more that "there‘s no problem that can‘t be solved by an additional level of indirection."
Dynamic Memory Allocation and Virtual Memory
Every application running on your operating system has its unique address space, which it sees as a continuous block
of memory. In fact the memory is not physically continuous (it is fragmented), this is just the impression the operating
system gives to every program and it‘s called virtual memory. The size of the
virtual memory is the maximum size of the maximum size your computer can
address using pointers
(usually
on a 32-bit processor each process can address 4 GB of memory). The
natural question that arises is what happens when a process wants to
access more memory than your machine
physically has available as RAM? Due to having a virtual address space,
parts of the hard disk can be mapped together with real memory
and the process doesn‘t have to know anything about whether the address
is
physically stored in RAM or on the hard disk. The operating system
maintains a table, where virtual addresses are
mapped with their correspondent physical addresses, which is used
whenever a
request is made to read or write to a memory address.
Typically, in each process, the virtual memory available to that process is
called its address space. Each process‘s address space is
typically organized in 6 sections that are illustrated in the next
picture: environment section - used to store
environment variables and command line
arguments; the stack, used to store memory for function arguments, return values, and
automatic variables; the heap (free store) used for dynamic allocation, two
data sections (for initialized and uninitialized static
and global variables) and a text section where the actual code is
kept.
The Heap
To understand why the dynamic memory
allocation is time consuming let‘s take a closer look at what is actually happening. The
memory area where new gets its blocks of memory for allocation (usually called free store or heap) is illustrated in the
following picture:
When new is invoked, it starts looking for a free memory block that fits the size for your request. Supposing that such a
block of memory is found, it is marked as reserved and a pointer to that location is returned. There are several algorithms to
accomplish this because a compromise has to be made between scanning the whole memory for finding the smallest free block
bigger than the size of your object, or returning the first one where the memory needed fits. In order to improve the speed of
getting a block of memory, the free and reserved areas of memory are maintained in a data structure similar to binary trees
called a heap.
The various algorithms for finding free memory are beyond the
scope of this article and you can find a thorough discussion about them in D. Knuth‘s monograph The Art of Computer Programming -- Vol.1, Fundamental
Algorithms). This overhead combined with the risk for memory leaks makes the use of automatic memory (allocated on the
stack) preferred whenever possible and the allocation is not large.
How much Virtual Memory do you get
Even though every application has its own 4 GB (on 32-bit systems) of virtual
memory, that does not necessarily mean that your program can actually use all of that
memory. For example, on Windows, the upper 2 GB of that memory are allocated
to the operating system kernel, and are unavailable to the process.
(Therefore, any pointer starting with 0x8xxxxxxx is unavailable in user
space.) On Linux, the upper 1 GB is
kernel address space. Typically, operating systems provide means for
changing these defaults (such as the /3GB
switch on Windows. It is rare, however, that you really want or
need to do so.
Address Space Fragmentation
Another concern with memory allocation is that if you allocate memory in
non-contiguous blocks, over time "holes" will develop. For example, if you
allocate 10 KB and it is taken from the middle of a 20 MB chunk of memory,
then you can no longer allocate that 20 MB a one chunk of memory. Doing this
enough times will cause you to no longer be able to allocate 20 MB at once.
This can cause allocation failures even when there is free memory. Note that
this is true even with virtual memory because what matters is that you need a
continuous block of addresses, not a continuous block of physical memory.
One way to address this problem is to avoid doing things that have
problems due to fragmentation, such as avoiding large
allocations--anything more than a few tens of MB is certainly asking for
trouble. Second, many heap implementations help you with this already by
allocating a large chunk of virtual address space and carving it up for you
(usually the heap allocates address space from the operating system and then
provides smaller chunks when requested).
But if you know that you will have a class that has a lot of small instances,
you could overload operator
new and preallocate a large continuous chunk of memory, splitting off
small pieces for each class from that chunk.