How much memory do malloc and free really handle? A look at GNU glibc

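Most of you are probably familiar with this: the memory malloc allocates is always larger than the size the user asked for! And plenty of people have asked: how much larger? How much does malloc actually allocate?

We know this size must be recorded in some "magic" place, but just like your own thoughts, you cannot perceive it directly! That is an illusion, though: we are simply used to calling malloc without digging into the source. Here I will lift the veil and make it less opaque.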

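Note: the source is based on the files under the malloc directory of version 2.7 of the GNU glibc library

Another note: different C libraries do not necessarily implement this the same way. This is glibc; if you want to know about Windows or anything else, please press Alt + F4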

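Summary

The opening comment of malloc.c makes one point: the algorithm here is not necessarily the best, but it should be broadly applicable

The comment also lists the functions this file implements, along with Vital statistics, Alignment, and Minimum/Maximum allocated size, and closes by noting: I am thread-safe, so go ahead and call me!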

/*
* Why use this malloc?

  This is not the fastest, most space-conserving, most portable, or
  most tunable malloc ever written. However it is among the fastest
  while also being among the most space-conserving, portable and tunable.
  Consistent balance across these factors results in a good general-purpose
  allocator for malloc-intensive programs.

  The main properties of the algorithms are:
  * For large (>= 512 bytes) requests, it is a pure best-fit allocator,
    with ties normally decided via FIFO (i.e. least recently used).
  * For small (<= 64 bytes by default) requests, it is a caching
    allocator, that maintains pools of quickly recycled chunks.
  * In between, and for combinations of large and small requests, it does
    the best it can trying to meet both goals at once.
  * For very large requests (>= 128KB by default), it relies on system
    memory mapping facilities, if supported.

  For a longer but slightly out of date high-level description, see
     http://gee.cs.oswego.edu/dl/html/malloc.html

  You may already by default be using a C library containing a malloc
  that is  based on some version of this malloc (for example in
  linux). You might still want to use the one in this file in order to
  customize settings or to avoid overheads associated with library
  versions.

* Contents, described in more detail in "description of public routines" below.

  Standard (ANSI/SVID/...)  functions:
    malloc(size_t n);
    calloc(size_t n_elements, size_t element_size);
    free(void* p);
    realloc(void* p, size_t n);
    memalign(size_t alignment, size_t n);
    valloc(size_t n);
    mallinfo()
    mallopt(int parameter_number, int parameter_value)

  Additional functions:
    independent_calloc(size_t n_elements, size_t size, void* chunks[]);
    independent_comalloc(size_t n_elements, size_t sizes[], void* chunks[]);
    pvalloc(size_t n);
    cfree(void* p);
    malloc_trim(size_t pad);
    malloc_usable_size(void* p);
    malloc_stats();

* Vital statistics:

  Supported pointer representation:       4 or 8 bytes
  Supported size_t  representation:       4 or 8 bytes
       Note that size_t is allowed to be 4 bytes even if pointers are 8.
       You can adjust this by defining INTERNAL_SIZE_T

  Alignment:                              2 * sizeof(size_t) (default)
       (i.e., 8 byte alignment with 4byte size_t). This suffices for
       nearly all current machines and C compilers. However, you can
       define MALLOC_ALIGNMENT to be wider than this if necessary.

  Minimum overhead per allocated chunk:   4 or 8 bytes
       Each malloced chunk has a hidden word of overhead holding size
       and status information.

  Minimum allocated size: 4-byte ptrs:  16 bytes    (including 4 overhead)
			  8-byte ptrs:  24/32 bytes (including, 4/8 overhead)

       When a chunk is freed, 12 (for 4byte ptrs) or 20 (for 8 byte
       ptrs but 4 byte size) or 24 (for 8/8) additional bytes are
       needed; 4 (8) for a trailing size field and 8 (16) bytes for
       free list pointers. Thus, the minimum allocatable size is
       16/24/32 bytes.

       Even a request for zero bytes (i.e., malloc(0)) returns a
       pointer to something of the minimum allocatable size.

       The maximum overhead wastage (i.e., number of extra bytes
       allocated than were requested in malloc) is less than or equal
       to the minimum size, except for requests >= mmap_threshold that
       are serviced via mmap(), where the worst case wastage is 2 *
       sizeof(size_t) bytes plus the remainder from a system page (the
       minimal mmap unit); typically 4096 or 8192 bytes.

  Maximum allocated size:  4-byte size_t: 2^32 minus about two pages
			   8-byte size_t: 2^64 minus about two pages

       It is assumed that (possibly signed) size_t values suffice to
       represent chunk sizes. `Possibly signed' is due to the fact
       that `size_t' may be defined on a system as either a signed or
       an unsigned type. The ISO C standard says that it must be
       unsigned, but a few systems are known not to adhere to this.
       Additionally, even when size_t is unsigned, sbrk (which is by
       default used to obtain memory from system) accepts signed
       arguments, and may not be able to handle size_t-wide arguments
       with negative sign bit.  Generally, values that would
       appear as negative after accounting for overhead and alignment
       are supported only via mmap(), which does not have this
       limitation.

       Requests for sizes outside the allowed range will perform an optional
       failure action and then return null. (Requests may also
       also fail because a system is out of memory.)

  Thread-safety: thread-safe
*/

Implementation of malloc

void * __libc_malloc (size_t bytes)
{
	mstate ar_ptr;
	void *victim;

	void *(*hook) (size_t, const void *)
		= atomic_forced_read (__malloc_hook);
	if (__builtin_expect (hook != NULL, 0))
		return (*hook)(bytes, RETURN_ADDRESS (0));

	arena_get (ar_ptr, bytes);

	if (!ar_ptr)
		return 0;

	victim = _int_malloc (ar_ptr, bytes);
	if (!victim)
	{
		LIBC_PROBE (memory_malloc_retry, 1, bytes);
		ar_ptr = arena_get_retry (ar_ptr, bytes);
		if (__builtin_expect (ar_ptr != NULL, 1))
		{
			victim = _int_malloc (ar_ptr, bytes);
			(void) mutex_unlock (&ar_ptr->mutex);
		}
	}
	else
		(void) mutex_unlock (&ar_ptr->mutex);
	assert (!victim || chunk_is_mmapped (mem2chunk (victim)) ||
			ar_ptr == arena_for_chunk (mem2chunk (victim)));
	return victim;
}
libc_hidden_def (__libc_malloc)

Setting the details aside, only two statements in this function matter:

victim = _int_malloc (ar_ptr, bytes);
return victim;

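In other words, __libc_malloc is just a wrapper; the function that actually does the allocation is _int_malloc.

_int_malloc() is a very large function, well over a hundred lines. If you are curious, go read it yourself!

Its main idea is to pick a different allocation strategy depending on the size the user requested. Never mind exactly how it allocates; let's solve the main problem first. The code is endless, but the key point is the shore ^_^!

Four statements in it matter here: one declaration, plus three statements that the allocation paths share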

mchunkptr victim;
void *p = chunk2mem (victim);
alloc_perturb (p, bytes);
return p;

victim has type mchunkptr, which is a pointer

typedef struct malloc_chunk* mchunkptr;

Definition of struct malloc_chunk

struct malloc_chunk {
	size_t prev_size;			/* Size of previous chunk (if free).  */
	size_t size;				/* Size in bytes, including overhead. */

	struct malloc_chunk *fd;	/* double links -- used only if free. */
	struct malloc_chunk *bk;

	/* Only used for large blocks: pointer to next larger size.  */
	struct malloc_chunk *fd_nextsize;	/* double links -- used only if free. */
	struct malloc_chunk *bk_nextsize;
};

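victim is the address of the chunk chosen by the particular allocation algorithm, and the chunk already carries its size information. chunk2mem is a macro. alloc_perturb calls memset on the new memory only when the perturb byte is nonzero (set through mallopt(M_PERTURB, ...) or the MALLOC_PERTURB_ environment variable); by default it does nothing, so memory returned by malloc is not zeroed!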

#define chunk2mem(p)   ((void*)((char*)(p) + 2*SIZE_SZ))

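SIZE_SZ is itself a macro; expanded all the way it is sizeof(size_t) (INTERNAL_SIZE_T defaults to size_t). Everyone knows size_t, so why add two of them?

Look at the struct malloc_chunk definition above and it clicks: the offset skips the prev_size and size members (see the comments in the struct for what each one means)

There is also this comment in _int_malloc, which is well worth reading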

/*
     Convert request size to internal form by adding SIZE_SZ bytes
     overhead plus possibly more to obtain necessary alignment and/or
     to obtain a size of at least MINSIZE, the smallest allocatable
     size. Also, checked_request2size traps (returning 0) request sizes
     that are so large that they wrap around zero when padded and
     aligned.
  */

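Finally, _int_malloc returns the offset-adjusted p to its caller __libc_malloc, which then executes return victim; and that is the address the user gets from malloc.

So it is plain to see: the size information sits right in front of the address malloc returns, in the size member of struct malloc_chunk

Layout of the structure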

/*
   malloc_chunk details:

    (The following includes lightly edited explanations by Colin Plumb.)

    Chunks of memory are maintained using a `boundary tag' method as
    described in e.g., Knuth or Standish.  (See the paper by Paul
    Wilson ftp://ftp.cs.utexas.edu/pub/garbage/allocsrv.ps for a
    survey of such techniques.)  Sizes of free chunks are stored both
    in the front of each chunk and at the end.  This makes
    consolidating fragmented chunks into bigger chunks very fast.  The
    size fields also hold bits representing whether chunks are free or
    in use.

    An allocated chunk looks like this:

    chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	    |             Size of previous chunk, if allocated            | |
	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	    |             Size of chunk, in bytes                       |M|P|
      mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	    |             User data starts here...                          .
	    .                                                               .
	    .             (malloc_usable_size() bytes)                      .
	    .                                                               |
nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	    |             Size of chunk                                     |
	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    Where "chunk" is the front of the chunk for the purpose of most of
    the malloc code, but "mem" is the pointer that is returned to the
    user.  "Nextchunk" is the beginning of the next contiguous chunk.

    Chunks always begin on even word boundaries, so the mem portion
    (which is returned to the user) is also on an even word boundary, and
    thus at least double-word aligned.

    Free chunks are stored in circular doubly-linked lists, and look like this:

    chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	    |             Size of previous chunk                            |
	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    `head:' |             Size of chunk, in bytes                         |P|
      mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	    |             Forward pointer to next chunk in list             |
	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	    |             Back pointer to previous chunk in list            |
	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	    |             Unused space (may be 0 bytes long)                .
	    .                                                               .
	    .                                                               |
nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    `foot:' |             Size of chunk, in bytes                           |
	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    The P (PREV_INUSE) bit, stored in the unused low-order bit of the
    chunk size (which is always a multiple of two words), is an in-use
    bit for the *previous* chunk.  If that bit is *clear*, then the
    word before the current chunk size contains the previous chunk
    size, and can be used to find the front of the previous chunk.
    The very first chunk allocated always has this bit set,
    preventing access to non-existent (or non-owned) memory. If
    prev_inuse is set for any given chunk, then you CANNOT determine
    the size of the previous chunk, and might even get a memory
    addressing fault when trying to do so.

    Note that the `foot' of the current chunk is actually represented
    as the prev_size of the NEXT chunk. This makes it easier to
    deal with alignments etc but can be very confusing when trying
    to extend or adapt this code.

    The two exceptions to all this are

     1. The special chunk `top' doesn't bother using the
	trailing size field since there is no next contiguous chunk
	that would have to index off it. After initialization, `top'
	is forced to always exist.  If it would become less than
	MINSIZE bytes long, it is replenished.

     2. Chunks allocated via mmap, which have the second-lowest-order
	bit M (IS_MMAPPED) set in their size fields.  Because they are
	allocated one-by-one, each must contain its own trailing size field.

*/

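Implementation of free

free is the reverse of malloc, so let's look at its skeleton. Common sense says free must somehow know how many bytes to release before it can do its job, so the question from the opening gets its final answer here!

First, here is the implementation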

void __libc_free(void *mem)
{
	mstate ar_ptr;
	mchunkptr p;
	void (*hook) (void *, const void *)
		= atomic_forced_read(__free_hook);
	if (__builtin_expect(hook != NULL, 0)) {
		(*hook) (mem, RETURN_ADDRESS(0));
		return;
	}
	if (mem == 0)
		return;
	p = ((mchunkptr) ((char *) (mem) - 2 * (sizeof(size_t))));
	if (chunk_is_mmapped(p)) {
		if (!mp_.no_dyn_threshold
			&& p->size > mp_.mmap_threshold
			&& p->size <= DEFAULT_MMAP_THRESHOLD_MAX) {
			mp_.mmap_threshold = ((p)->size & ~(SIZE_BITS));
			mp_.trim_threshold = 2 * mp_.mmap_threshold;
			LIBC_PROBE(memory_mallopt_free_dyn_thresholds, 2,
					   mp_.mmap_threshold, mp_.trim_threshold);
		}
		munmap_chunk(p);
		return;
	}
	ar_ptr = (((p)-> size & 0x4) ?
			((heap_info *) ((unsigned long) (p) & ~(HEAP_MAX_SIZE - 1)))->ar_ptr : &main_arena);
	_int_free(ar_ptr, p, 0);
}

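Outline:

0. The hook for dynamic allocation: see the GNU documentation (search for __free_hook). We ignore it here; it takes up a fair amount of code but has nothing to do with our question. Moving on.

1. A null pointer (free(NULL)) returns immediately. This one is familiar; the manual pages mention it.

2. p = mem2chunk (mem); this is the key statement

It is a macro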

#define mem2chunk(mem) ((mchunkptr)((char*)(mem) - 2*SIZE_SZ))

Looks familiar? Yes: it is the inverse of chunk2mem from the malloc section above. There it adds (+); here it subtracts (-)

3. If the block is a large one allocated by mmap, it is released by unmapping. Not our concern; moving on.

4. The last two statements

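4.1 The second-to-last statement is a macro whose expansion is again a macro; expanded repeatedly it looks like this (the heap-size constant is left as HEAP_MAX_SIZE)

ar_ptr = (((p)-> size & 0x4) ?
             ((heap_info *) ((unsigned long) (p) & ~(HEAP_MAX_SIZE - 1)))->ar_ptr : &main_arena);

Roughly, following the layout of the structure, it tests whether bit 2 (counting from 0) of size, the NON_MAIN_ARENA flag, is 1. If it is, the chunk belongs to a non-main arena and the arena pointer is read from the heap_info at the start of the chunk's heap; otherwise the chunk belongs to main_arena. I did not dig further, since it does not affect our question. (It is pasted here to save anyone digging deeper from expanding the macros layer by layer.)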

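4.2 The last statement, _int_free, does the actual release, so __libc_free is a wrapper too

_int_free is not a small function either. Skipping what is unrelated to our question and keeping what is related leaves a single statement, and that statement reveals the answer!

Near the top of the function there is this statement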

size = chunksize (p);

chunksize is a macro

#define PREV_INUSE 0x1
#define IS_MMAPPED 0x2
#define NON_MAIN_ARENA 0x4
#define SIZE_BITS (PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)
#define chunksize(p)         ((p)->size & ~(SIZE_BITS))

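These five lines mask off the low 3 bits of the size member of the structure, which yields the size of the chunk. What is a chunk? Call it a memory block for now; it is what malloc really allocated. See the chunk diagrams in the malloc_chunk details comment above!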

Test

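#include <stdio.h>
#include <stdlib.h>

/* Local copy of glibc's chunk header layout; only the size member is read. */
struct malloc_chunk {
	size_t prev_size;
	size_t size;
	struct malloc_chunk *fd;
	struct malloc_chunk *bk;
	struct malloc_chunk *fd_nextsize;
	struct malloc_chunk *bk_nextsize;
};

typedef struct malloc_chunk *mchunkptr;

/* Print the requested size next to the chunk size stored just before mem.
   This peeks at glibc internals and is not portable to other allocators. */
static void show_chunk(size_t request)
{
	void *mem = malloc(request);
	mchunkptr p;

	if (mem == NULL)
		return;

	/* Same arithmetic as mem2chunk: step back over prev_size and size. */
	p = (mchunkptr) ((char *) mem - 2 * sizeof(size_t));
	/* Mask off the three flag bits; %zu is the right format for size_t. */
	printf("malloc size : %zu; chunk size : %zu\n",
	       request, p->size & ~(size_t) 0x7);

	free(mem);
}

int main(void)
{
	int i;

	/* Small requests: 0 to 9 bytes. */
	for (i = 0; i < 10; ++i)
		show_chunk((size_t) i);

	/* Ten pseudo-random requests below 1024 bytes, reseeded each time. */
	for (i = 0; i < 10; ++i) {
		srand(i);
		show_chunk((size_t) (rand() % 1024));
	}
	return 0;
}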

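Results

If malloc and free are the performance bottleneck in your product, you can implement your own malloc and free. Reportedly many companies have done so!

Done!

Finally, one more reminder that this is glibc. As for how Windows does malloc and free, study that yourself if you are interested!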

Time: 2024-07-30 13:35:38
