Virtual Memory Addresses (Translation)

Virtual Memory and Memory Mapping

Virtual memory was introduced in 1959. The main goal was to significantly simplify programming by abstracting the quirks of physical memory's hierarchy. Nowadays, virtual memory management is an essential component in almost every operating system's kernel. Today, I discuss the main concepts of virtual memory and its advantages.

Real Addressing and Virtual Addressing

In the olden days, a process would access physical memory directly. This model, known as real addressing, is still in use in embedded systems and some PDAs. It's also used in DOS. However, real addressing mode is problematic for several reasons:

  • System stability. A process may accidentally tread on another process's memory. This can occur, for instance, when a process tries to write to an uninitialized pointer whose value belongs to the address space of a different process. Worse yet, a process could overwrite critical kernel code, thus freezing the system.
  • Security. Apart from accidental trespassing, real addressing enables a process to steal and modify other processes' memory.
  • Performance. In real mode addressing, swapping and memory mapping (I'll explain these techniques shortly) are usually unsupported. Consequently, the system isn't scalable. Crashes and freezes are common annoyances in such systems.

How Virtual Addressing Works

In virtual addressing mode, every program is allocated a private address space that is completely isolated from other processes' address spaces. For example, if address 0x4000aef6 in process A points to a certain variable, the same address in process B may point to some other data variable or it may not exist in B. A virtual memory address is, therefore, an alias; the physical location to which it points is hidden from the process and from the programmer. For example, address 0x4000aef6 can be mapped to the physical address 0x102788ff in the system's RAM, or to offset 0 from cluster 1740 on the system's hard disk. Under this model, each process sees the system's memory as if it were the sole process running on the machine.
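
The isolation described above can be observed on a POSIX system with fork(). The following is only a minimal sketch: the variable counter and the function name are made up for illustration. The child overwrites a global variable, yet the parent, which sees that variable at the same virtual address, still reads the old value, because each process has its own mapping behind the address.

```c
#define _XOPEN_SOURCE 700 /* request POSIX declarations */
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int counter = 1; /* same virtual address in parent and child */

/* Forks, lets the child overwrite counter, and returns the value the
   parent still sees afterwards (or -1 if fork or waitpid fails). */
int parent_value_after_child_write(void)
{
    fflush(stdout); /* avoid duplicating buffered output in the child */
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        counter = 2; /* changes only the child's private copy */
        printf("child:  &counter=%p value=%d\n", (void *)&counter, counter);
        fflush(stdout);
        _exit(0);
    }
    if (waitpid(pid, NULL, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    printf("parent: &counter=%p value=%d\n", (void *)&counter, counter);
    return counter;
}
```

Calling parent_value_after_child_write() from main() prints the address of counter as seen by each process and returns the parent's value, which is not affected by the child's write.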

Early virtual memory managers used kernel modules to translate a virtual address to a physical address and vice versa. Today, this mapping is performed by special hardware (the memory management unit, or MMU), so it's completely transparent to the program.
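
To make the translation step concrete, here is a toy model of what the hardware does on every memory access. It is a sketch under stated assumptions, not a description of any real MMU: it assumes 4 KB pages and a single flat table indexed by virtual page number, and the names translate and NOT_PRESENT are invented for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT  12u                 /* assumed 4 KB pages */
#define PAGE_SIZE   (1u << PAGE_SHIFT)
#define NOT_PRESENT UINT32_MAX          /* hypothetical "no frame" marker */

/* Translates vaddr with a toy one-level table indexed by virtual page
   number. Returns 0 and stores the physical address on success, or -1
   when the page is not mapped (a real system would raise a page fault). */
int translate(const uint32_t *page_table, size_t entries,
              uint32_t vaddr, uint32_t *paddr)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;        /* virtual page number */
    uint32_t offset = vaddr & (PAGE_SIZE - 1); /* unchanged by translation */

    if (vpn >= entries || page_table[vpn] == NOT_PRESENT)
        return -1;
    *paddr = (page_table[vpn] << PAGE_SHIFT) | offset;
    return 0;
}
```

Only the page number is replaced; the offset within the page is carried over as is. Real page tables are multi-level and hold permission bits as well, which this sketch leaves out.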

Virtual addressing enables the kernel to load and unload memory dynamically. A frequently-used code section can be stored in the RAM, whereas other sections of the program can be mapped to a swap file. When the process accesses a memory address that is mapped to a swap file, the kernel reloads that portion to the system's RAM. From a user's point of view, there's no difference whatsoever between the two (except performance, of course).

Memory Mapping

Memory mapping is a common concept in POSIX and Windows systems. It enables a process to map disk files to its address space, treating them as memory blocks rather than files. Usually, the mapping consists of translating a disk location to a virtual address and vice versa; no disk data is copied into the RAM at the time of mapping, and pages are loaded only when they are first accessed. Memory mapping has several uses:

  • Dynamic loading. By mapping executable files and shared libraries into its address space, a program can load and unload executable code sections dynamically.
  • Fast File I/O. When you call file I/O functions, such as read() and write(), the data is copied to an intermediary kernel buffer before it is transferred to the physical file or the process. This extra copy costs time. Memory mapping eliminates the intermediary buffering, which can improve performance significantly.
  • Streamlining file access. Once you map a file to a memory region, you access it via pointers, just as you would access ordinary variables and objects.
  • Memory persistence. Memory mapping enables processes to share memory sections that persist independently of the lifetime of a certain process.
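
As a rough illustration of pointer-style file access, the sketch below writes a message into a file through a shared mapping and then reads it back with an ordinary pread() call. The scratch path under /tmp and the function name write_via_mapping are assumptions made for this example; error handling is kept to a minimum.

```c
#define _XOPEN_SOURCE 700 /* request POSIX declarations */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Writes msg into a temporary file through a shared mapping, then reads
   it back with pread(). Returns 0 if the bytes match, -1 on any failure. */
int write_via_mapping(const char *msg)
{
    char path[] = "/tmp/mmap_demo_XXXXXX"; /* hypothetical scratch location */
    size_t len = strlen(msg);
    char buf[256];
    int result = -1;

    if (len == 0 || len > sizeof buf)
        return -1;
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* the file disappears once fd is closed */
    if (ftruncate(fd, (off_t)len) == 0) {
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p != MAP_FAILED) {
            memcpy(p, msg, len); /* plain pointer access, no write() call */
            if (msync(p, len, MS_SYNC) == 0 &&
                pread(fd, buf, len, 0) == (ssize_t)len &&
                memcmp(buf, msg, len) == 0)
                result = 0;
            munmap(p, len);
        }
    }
    close(fd);
    return result;
}
```

MAP_SHARED is what makes the stores visible through the file; with MAP_PRIVATE the changes would stay in a private copy and the read-back would not match.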

The POSIX <sys/mman.h> header declares the memory mapping functions and the constants they use. Because this interface is more intuitive and simpler than that of Windows, I base my memory mapping example on the POSIX library.

The mmap() system call:

void *mmap(void *map_addr,
       size_t length,
       int protection,
       int flags,
       int fd,
       off_t offset);

Let's examine what each parameter means.

map_addr is the address to which the memory should be mapped. A NULL value allows the kernel to pick any address (normally you'd use this value). length contains the number of bytes to map from the file. protection indicates the types of access allowed to the mapped region:

PROT_READ  //the mapped region may be read
PROT_WRITE //the mapped region may be written
PROT_EXEC  //the mapped region may be executed

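flags contains various mapping attributes. It must include either MAP_SHARED (changes are visible to other processes that map the file and are carried through to the file) or MAP_PRIVATE (changes stay in a private copy). Other flags are system-specific; for instance, on Linux, MAP_LOCKED asks the kernel to lock the mapped pages in RAM so that they are not swapped out.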

fd is the mapped file's descriptor.

Finally, offset specifies the position in the file from which mapping should begin, with offset 0 indicating the file's beginning. The offset generally has to be a multiple of the system's page size.
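
Because the offset usually has to fall on a page boundary, mapping data at an arbitrary file position takes one extra step: round the position down to a page boundary and remember the remainder. The sketch below shows one way to do that; the function name read_byte_at is invented for this example, and the page size is queried with sysconf() rather than assumed.

```c
#define _XOPEN_SOURCE 700 /* request POSIX declarations */
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns the byte stored at file position pos (0..255), or -1 if the
   file cannot be opened or mapped, or if pos lies outside the file. */
int read_byte_at(const char *path, off_t pos)
{
    long page = sysconf(_SC_PAGESIZE);
    struct stat st;
    int value = -1;

    if (page <= 0 || pos < 0)
        return -1;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) == 0 && pos < st.st_size) {
        off_t start = pos - (pos % page);      /* round down to a page boundary */
        size_t delta = (size_t)(pos - start);  /* position inside the mapping */
        unsigned char *p = mmap(NULL, delta + 1, PROT_READ, MAP_PRIVATE,
                                fd, start);
        if (p != MAP_FAILED) {
            value = p[delta];
            munmap(p, delta + 1);
        }
    }
    close(fd);
    return value;
}
```

The size check against st_size matters: touching a mapped page that lies entirely beyond the end of the file typically raises SIGBUS instead of returning an error code.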

In the following example, the program maps the first 4 KB of a file named on the command line into its memory and then reads an int value from it:

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    int fd;
    void *pregion;

    if (argc < 2)
    {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 1;
    }
    /* the parentheses matter: < binds tighter than = */
    if ((fd = open(argv[1], O_RDONLY)) < 0)
    {
        perror("failed on open");
        return 1;
    }
    /* map the first 4 kilobytes of fd */
    pregion = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);
    if (pregion == MAP_FAILED)
    {
        perror("mmap failed");
        close(fd);
        return 1;
    }
    close(fd); /* the mapping stays valid after the descriptor is closed */
    /* access mapped memory; read the first int in the mapped file
       (the file must contain at least sizeof(int) bytes) */
    int val = *((int *)pregion);
    printf("%d\n", val);
    munmap(pregion, 4096);
    return 0;
}

To unmap a mapped region, use the munmap() function:

int munmap(void *addr, size_t length);

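addr is the address of the region being unmapped; it should be aligned to a page boundary. length specifies how much of the memory should be unmapped. You may unmap a portion of a previously-mapped region, but only in whole pages: every page that contains any part of the given range is unmapped. On a system with 4 KB pages, the region mapped in the previous example occupies a single page, so asking to unmap only its first kilobyte would remove the entire mapping. The following call releases the whole region explicitly:

munmap(pregion, 4096);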
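
Unmapping works at page granularity, so a partial unmap is easiest to see with a mapping that spans more than one page. The following sketch, which assumes a scratch file under /tmp and uses an invented function name, maps two pages, unmaps the first one, and shows that the second page remains readable.

```c
#define _XOPEN_SOURCE 700 /* request POSIX declarations */
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Maps two pages of a scratch file, unmaps the first page, and returns
   the marker byte read from the second page (or -1 on failure). */
int second_page_after_partial_unmap(void)
{
    char path[] = "/tmp/munmap_demo_XXXXXX"; /* hypothetical scratch location */
    long page = sysconf(_SC_PAGESIZE);
    unsigned char marker = 42;
    int result = -1;

    if (page <= 0)
        return -1;
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    if (ftruncate(fd, 2 * (off_t)page) == 0 &&
        pwrite(fd, &marker, 1, (off_t)page) == 1) {
        unsigned char *p = mmap(NULL, 2 * (size_t)page, PROT_READ,
                                MAP_PRIVATE, fd, 0);
        if (p != MAP_FAILED) {
            if (munmap(p, (size_t)page) == 0) {  /* drop the first page only */
                result = p[page];                /* second page still mapped */
                munmap(p + page, (size_t)page);
            } else {
                munmap(p, 2 * (size_t)page);
            }
        }
    }
    close(fd);
    return result;
}
```

After the first munmap() call, any access to p[0] through p[page - 1] would be invalid, while addresses from p + page onward stay usable until the second call.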

Summary

Without virtual addressing, programming would be much more difficult than you'd imagine. Furthermore, many common programming concepts such as dynamic memory allocation and multiprocessing would be harder to implement. Virtual addressing thus proves once more that "there's no problem that can't be solved by an additional level of indirection."

Dynamic Memory Allocation and Virtual Memory

Every application running on your operating system has its own address space, which it sees as one contiguous block of memory. In fact, the memory is not physically contiguous (it is fragmented); this is just the impression the operating system gives to every program, and it's called virtual memory. The size of the virtual memory is the maximum size your computer can address using pointers (usually, on a 32-bit processor, each process can address 4 GB of memory). The natural question that arises is: what happens when a process wants to access more memory than your machine physically has available as RAM? Because the address space is virtual, parts of the hard disk can be mapped alongside real memory, and the process doesn't have to know whether an address is physically stored in RAM or on the hard disk. The operating system maintains a table in which virtual addresses are mapped to their corresponding physical addresses, and this table is used whenever a request is made to read or write a memory address.

The virtual memory available to a process is called its address space. Each process's address space is typically organized in six sections:

  • the environment section, used to store environment variables and command-line arguments;
  • the stack, used to store function arguments, return values, and automatic variables;
  • the heap (free store), used for dynamic allocation;
  • two data sections, one for initialized and one for uninitialized static and global variables;
  • the text section, where the actual code is kept.
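
One informal way to see these sections is to print an address from each of them. The sketch below does that; the variable and function names are made up for the example, and the exact addresses and their relative order depend on the operating system, the compiler, and address space randomization, so no particular output should be expected.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int initialized_global = 7; /* initialized data section */
int uninitialized_global;   /* uninitialized data section */

/* Prints one sample address per section; returns 0 on success, -1 if
   the heap allocation fails. */
int print_layout(void)
{
    int automatic = 0;                      /* stack */
    int *dynamic = malloc(sizeof *dynamic); /* heap */
    const char *env = getenv("PATH");       /* environment, may be NULL */

    if (dynamic == NULL)
        return -1;
    printf("text:               %p\n", (void *)(uintptr_t)&print_layout);
    printf("initialized data:   %p\n", (void *)&initialized_global);
    printf("uninitialized data: %p\n", (void *)&uninitialized_global);
    printf("heap:               %p\n", (void *)dynamic);
    printf("stack:              %p\n", (void *)&automatic);
    if (env != NULL)
        printf("environment:        %p\n", (const void *)env);
    free(dynamic);
    return 0;
}
```

Running it twice on a system with address space layout randomization usually shows different stack and heap addresses each time, which is another reminder that these are virtual addresses chosen per process.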

The Heap

To understand why dynamic memory allocation is time-consuming, let's take a closer look at what actually happens in the memory area from which new gets its blocks of memory (usually called the free store or heap).

When new is invoked, it starts looking for a free memory block that fits the size of your request. Supposing that such a block of memory is found, it is marked as reserved and a pointer to that location is returned. There are several algorithms to accomplish this, because a compromise has to be made between scanning all of the free memory to find the smallest free block that is big enough for your object (best fit) and returning the first block in which the requested memory fits (first fit). To speed up the search, allocators keep track of the free and reserved areas of memory in bookkeeping structures such as free lists or trees. (Despite the shared name, the memory heap is unrelated to the binary heap data structure.) The various algorithms for finding free memory are beyond the scope of this article; you can find a thorough discussion of them in D. Knuth's monograph The Art of Computer Programming, Vol. 1, Fundamental Algorithms. This overhead, combined with the risk of memory leaks, makes automatic memory (allocated on the stack) preferable whenever possible and the allocation is not large.
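
The trade-off between the two search strategies can be sketched in a few lines. This is a simplified model, not how any particular allocator is implemented: the free blocks are represented only by an array of their sizes, and the names first_fit, best_fit, and NO_BLOCK are invented for the illustration. It is written in C to match the other examples in this article.

```c
#include <stddef.h>

#define NO_BLOCK ((size_t)-1) /* hypothetical "nothing fits" result */

/* Returns the index of the first free block that is large enough. */
size_t first_fit(const size_t *free_sizes, size_t n, size_t request)
{
    for (size_t i = 0; i < n; i++)
        if (free_sizes[i] >= request)
            return i; /* stop at the first match */
    return NO_BLOCK;
}

/* Returns the index of the smallest free block that is large enough. */
size_t best_fit(const size_t *free_sizes, size_t n, size_t request)
{
    size_t best = NO_BLOCK;
    for (size_t i = 0; i < n; i++) /* always scans every block */
        if (free_sizes[i] >= request &&
            (best == NO_BLOCK || free_sizes[i] < free_sizes[best]))
            best = i;
    return best;
}
```

With free blocks of 64, 16, and 32 bytes and a request for 20 bytes, first_fit picks the 64-byte block immediately, while best_fit examines all three and picks the 32-byte block, wasting less space at the cost of a longer search.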

How Much Virtual Memory Do You Get

Even though every application has its own 4 GB (on 32-bit systems) of virtual memory, that does not necessarily mean that your program can actually use all of that memory. For example, on 32-bit Windows, the upper 2 GB of the address space is reserved by default for the operating system kernel and is unavailable to the process. (Therefore, any address from 0x80000000 upward is unavailable in user space.) On 32-bit Linux, the upper 1 GB is typically kernel address space. Operating systems usually provide means for changing these defaults (such as the /3GB switch on Windows). It is rare, however, that you really want or need to do so.

Address Space Fragmentation

Another concern with memory allocation is that if you allocate memory in non-contiguous blocks, over time "holes" will develop. For example, if you allocate 10 KB and it is taken from the middle of a 20 MB chunk of memory, then you can no longer allocate that 20 MB as one chunk of memory. Doing this enough times will leave you unable to allocate 20 MB at once. This can cause allocation failures even when there is free memory. Note that this is true even with virtual memory, because what matters is that you need a contiguous block of addresses, not a contiguous block of physical memory.
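
The effect is easy to reproduce on a toy model. In the sketch below, which uses invented names and treats the address range as an array of equally sized units, a single small allocation in the middle leaves almost all of the range free, yet the largest block that can still be allocated in one piece is only about half of it.

```c
#include <stddef.h>

/* used[i] != 0 means unit i of the address range is allocated. */

/* Counts how many units are free in total. */
size_t total_free(const unsigned char *used, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (!used[i])
            count++;
    return count;
}

/* Returns the length of the longest run of adjacent free units. */
size_t largest_free_run(const unsigned char *used, size_t n)
{
    size_t best = 0, run = 0;
    for (size_t i = 0; i < n; i++) {
        run = used[i] ? 0 : run + 1; /* an allocated unit ends the run */
        if (run > best)
            best = run;
    }
    return best;
}
```

For a range of 20 units with only unit 10 allocated, total_free reports 19 free units, but largest_free_run reports 10, so a request for 15 contiguous units fails even though more than enough memory is free.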

One way to address this problem is to avoid operations that suffer from fragmentation, such as very large allocations; on a 32-bit system, anything more than a few tens of MB is asking for trouble. Second, many heap implementations already help you with this by allocating a large chunk of virtual address space and carving it up for you (usually the heap obtains address space from the operating system and then hands out smaller chunks when requested). But if you know that you will have a class with a lot of small instances, you could overload operator new and preallocate a large contiguous chunk of memory, splitting off small pieces for each instance from that chunk.
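
The preallocation idea can be sketched as a fixed-size pool. The version below is written in C to match the other examples in this article; in C++ the same two functions would sit behind a class-specific operator new and operator delete. The slot size, the slot count, and all names are assumptions chosen for the illustration.

```c
#include <stddef.h>

#define SLOT_SIZE  32 /* assumed object size in bytes */
#define SLOT_COUNT 4  /* assumed number of preallocated objects */

typedef union slot {
    union slot *next;               /* link while the slot is free */
    unsigned char bytes[SLOT_SIZE]; /* payload while the slot is in use */
} slot;

typedef struct {
    slot storage[SLOT_COUNT]; /* one contiguous preallocated chunk */
    slot *free_list;          /* singly linked list of free slots */
} pool;

/* Threads every slot onto the free list, lowest address first. */
void pool_init(pool *p)
{
    p->free_list = NULL;
    for (size_t i = SLOT_COUNT; i-- > 0; ) {
        p->storage[i].next = p->free_list;
        p->free_list = &p->storage[i];
    }
}

/* Pops a slot in constant time; returns NULL when the pool is exhausted. */
void *pool_alloc(pool *p)
{
    slot *s = p->free_list;
    if (s != NULL)
        p->free_list = s->next;
    return s;
}

/* Pushes a slot obtained from pool_alloc back onto the free list. */
void pool_free(pool *p, void *ptr)
{
    slot *s = ptr;
    s->next = p->free_list;
    p->free_list = s;
}
```

Because every slot has the same size and comes from one chunk, allocating and freeing never searches and never creates holes of awkward sizes. The slots are aligned for pointers only, so objects with stricter alignment requirements would need a different slot type.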

Time: 2024-10-19 18:07:50
