Update 06/2018: Added thread-local caching (tcache)
Introduction to glibc malloc
What is the heap? If you’ve taken an operating systems class before, you might recall that it is a free-floating region of memory that is managed by a memory allocator. When using a memory allocator to allocate memory, we call it dynamic memory allocation. In a C program, you use the standard library functions malloc(), calloc(), realloc() and free() to invoke the memory allocator.
The function malloc() is not the memory allocator itself – it invokes the memory allocator linked into the application at compile time. By default, a gcc-compiled application uses the ptmalloc2 allocator in glibc.
There are many other memory allocators out there: dlmalloc, jemalloc, tcmalloc, etc., all of them claiming to be memory-efficient, fast, scalable and stable over long periods of time (which is important for server applications, which run for very long periods of time). However, different allocators perform differently under different workloads, and for memory-intensive applications you might see a performance improvement when you switch allocators.
This article will try to give an overview of the structures in glibc’s malloc while covering the essential algorithmic details of the allocation process. We will only be discussing the ptmalloc2 allocator in glibc, which was based off the Doug Lea allocator (better known as dlmalloc) and had fork() support added to it. ptmalloc2 has been integrated into glibc, and any changes since then have been made directly to the glibc source tree.
High-level behavior
malloc makes use of the brk/sbrk and mmap system calls to allocate the program heap. From the manpages:
brk() and sbrk() change the location of the program break, which defines the end of the process’s data segment (i.e., the program break is the first location after the end of the uninitialized data segment). Increasing the program break has the effect of allocating memory to the process; decreasing the break deallocates memory.
brk() sets the end of the data segment to the value specified by addr, when that value is reasonable, the system has enough memory, and the process does not exceed its maximum data size (see setrlimit(2)).
sbrk() increments the program’s data space by increment bytes. Calling sbrk() with an increment of 0 can be used to find the current location of the program break.
(Source: http://man7.org/linux/man-pages/man2/brk.2.html)
So when does malloc decide to use the heap space by incrementing the end-of-heap pointer with brk/sbrk, vs. reserving a region of memory using mmap? The behavior is controlled via the parameter M_MMAP_THRESHOLD, which is set to 128 KiB by default (and can be changed via mallopt() in malloc.h). If the requested size is less than 128 KiB, malloc uses brk/sbrk; otherwise, it uses mmap. To illustrate, consider the following code:
#include <sys/types.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
void print_map()
{
char sz_exec_cmd[64];
sprintf(sz_exec_cmd, "cat /proc/%d/maps", getpid());
printf("Output of /proc/%d/maps:\n", getpid());
system(sz_exec_cmd);
printf("\n");
return;
}
int main()
{
printf("Process started with pid %d\n", getpid());
// Get current brk ptr with sbrk(0)
printf("Before calling brk, end of heap is at %p\n", sbrk(0));
// Increment brk ptr by 1 page size (4K)
brk(sbrk(0) + 4096); // Equiv to sbrk(4096);
printf("After +4096, end of heap is at %p\n", sbrk(0));
sbrk(-4096); // Equiv to brk(sbrk(0)-4096);
printf("After -4096, end of heap is at %p\n\n", sbrk(0));
print_map();
void *p1 = malloc(120*1024);
printf("Invoked malloc(120*1024).\n");
print_map();
free(p1);
void *p2 = malloc(140*1024);
printf("Invoked malloc(140*1024).\n");
print_map();
free(p2);
return 0;
}
Compile and run the program with the command
gcc -w -o malloc_syscall_demo malloc_syscall_demo.c && ./malloc_syscall_demo
The following printouts demonstrate the behavior of brk/sbrk after each invocation:
Process started with pid 2780
Before calling brk, end of heap is at 0x55555557a000
After +4096, end of heap is at 0x55555557b000
After -4096, end of heap is at 0x55555557a000
In the program’s virtual memory layout we can see the allocated heap, which corresponds to the return value of sbrk(0):
Output of /proc/2780/maps:
...
555555559000-55555557a000 rw-p 00000000 00:00 0          [heap]
7ffff7df0000-7ffff7e15000 r--p 00000000 08:01 131130     /usr/lib/x86_64-linux-gnu/libc-2.30.so
...
This tells us that the virtual address (VA) range for the heap is 0x555555559000-0x55555557a000, with the read, write and private flags set. Since no file is mapped here, we have a file offset of 00000000 and an inode number of 0.
After the printout line “Invoked malloc(120*1024).”, we see that the heap VA range has expanded: since the requested size is below the mmap threshold, malloc grew the heap by moving the program break, making room for the malloc-ed block of memory:
Output of /proc/2780/maps:
...
555555559000-5555555b6000 rw-p 00000000 00:00 0          [heap]
7ffff7df0000-7ffff7e15000 r--p 00000000 08:01 131130     /usr/lib/x86_64-linux-gnu/libc-2.30.so
...
After free(p1), the heap shrinks back to its original range. And after the printout line “Invoked malloc(140*1024).”, we see that a memory region has been allocated at a vastly different VA range:
Output of /proc/2780/maps:
...
555555559000-55555557a000 rw-p 00000000 00:00 0          [heap]
7ffff7dcc000-7ffff7df0000 rw-p 00000000 00:00 0
7ffff7df0000-7ffff7e15000 r--p 00000000 08:01 131130     /usr/lib/x86_64-linux-gnu/libc-2.30.so
...
This is because the requested memory is greater than the default M_MMAP_THRESHOLD value.
Seems simple enough, right? Not quite. There are many more things going on under the hood…
Threading and Arenas
One of the main reasons why ptmalloc2 was integrated into glibc as the default memory allocator was its support for threading, which helped improve memory allocator performance (and by extension, application performance).
Sometimes two threads want to malloc memory at the same time. If you’re lucky, they don’t step on each other’s toes, but… no one’s that lucky. So, malloc uses a lock to make the threads “take turns” accessing its internal structures. Still, taking turns means one thread is doing nothing for a while, which can’t be good for performance, and the locks themselves can take a significant amount of time, as they often need to synchronize access across CPU caches and sometimes across physical CPUs. ptmalloc2 mitigates this by creating separate heap segments and freelist data structures for each thread.
Recall that
A free list is a data structure used in a scheme for dynamic memory allocation. It operates by connecting unallocated regions of memory together in a linked list, using the first word of each unallocated region as a pointer to the next.
(Source: https://en.wikipedia.org/wiki/Free_list)
The term “arena” refers to a separate set of freelists and heap segments. Let’s take a look at the following toy example:
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>
#include <sys/types.h>
void print_map()
{
char sz_exec_cmd[64];
sprintf(sz_exec_cmd, "cat /proc/%d/maps", getpid());
register void *rbp asm("rbp");
register void *rsp asm("rsp");
printf("Stack frame is %p-%p\n", rsp, rbp);
printf("Output of /proc/%d/maps:\n", getpid());
system(sz_exec_cmd);
printf("\n");
return;
}
void *runnable_func(void *arg) {
printf("In child thread: before malloc\n");
print_map();
void *addr = malloc(1024);
printf("In child thread: malloc(1024) returned %p\n", addr);
void *addr2 = malloc(140*1024);
printf("In child thread: malloc(140*1024) returned %p\n", addr2);
print_map();
free(addr2);
free(addr);
return NULL;
}
int main() {
pthread_t th;
void* s;
printf("Process started with pid %d\n",getpid());
void *addr = malloc(1024);
printf("In parent thread: after malloc returned %p\n", addr);
print_map();
pthread_create(&th, NULL, &runnable_func, NULL);
pthread_join(th, &s);
free(addr);
return 0;
}
Compile and run it with the following command:
gcc -w -o malloc_pthread_demo -pthread malloc_pthread_demo.c && ./malloc_pthread_demo
Observe the location of the stack and the heap in the parent thread:
Process started with pid 5331
In parent thread: after malloc returned 0x5555555596b0
Stack frame is 0x7fffffffe150-0x7fffffffe190
Output of /proc/5331/maps:
555555554000-555555555000 r--p 00000000 08:01 1967380    /home/kali/heaps/malloc_pthread_demo
...
555555559000-55555557a000 rw-p 00000000 00:00 0          [heap]
7ffff7dcd000-7ffff7dd0000 rw-p 00000000 00:00 0
...
7ffff7ffd000-7ffff7ffe000 rw-p 00028000 08:01 131126     /usr/lib/x86_64-linux-gnu/ld-2.30.so
7ffff7ffe000-7ffff7fff000 rw-p 00000000 00:00 0
7ffffffde000-7ffffffff000 rw-p 00000000 00:00 0          [stack]
After we create and enter the child thread, a new thread-local stack and a new arena are created. Observe the output after malloc is invoked:
In child thread: malloc(1024) returned 0x7ffff0000b60
In child thread: malloc(140*1024) returned 0x7ffff75a8010
Stack frame is 0x7ffff7dcbe80-0x7ffff7dcbec0
Output of /proc/5368/maps:
...
555555559000-55555557a000 rw-p 00000000 00:00 0          [heap] (main arena)
7ffff0000000-7ffff0021000 rw-p 00000000 00:00 0          (thread arena)
7ffff0021000-7ffff4000000 ---p 00000000 00:00 0
7ffff75a8000-7ffff75cc000 rw-p 00000000 00:00 0          (thread mmap region)
7ffff75cc000-7ffff75cd000 ---p 00000000 00:00 0          (main mmap region)
7ffff75cd000-7ffff7dd0000 rw-p 00000000 00:00 0          (thread stack)
7ffff7dd0000-7ffff7df5000 r--p 00000000 08:01 131130     /usr/lib/x86_64-linux-gnu/libc-2.30.so
Note here that the main arena is mapped to the heap segment of the process, and the per-thread arenas are mapped to the memory-mapping segment of the process.
Multiple Arenas
When more thread collisions on arena critical sections happen (in multi-threaded applications), additional arenas are created via mmap.
As it turns out, there is an upper limit to the number of arenas per process. Having per-thread arenas for all threads would be rather expensive (some applications use hundreds if not thousands of threads!). This behavior is defined in malloc.c in the macro NARENAS_FROM_NCORES:
#define NARENAS_FROM_NCORES(n) ((n) * (sizeof (long) == 4 ? 2 : 8))
where n is the number of processor cores. So for 32-bit applications we are allowed 2n arenas, and 8n arenas for 64-bit applications – if no cores are detected, the code assumes n = 2. You can change this limit by setting the MALLOC_ARENA_MAX environment variable to the number of arenas you want.
When contention happens (the number of threads exceeds the number of arenas), on a thread’s first invocation of malloc the arenas are reused: the allocator loops over the available arenas and attempts to lock each one. If an arena is locked successfully, it is assigned to the thread; otherwise, the thread blocks waiting for an arena to become free. Once an arena has been assigned to a thread, malloc will attempt to reuse that same arena on the thread’s subsequent invocations, blocking until the arena is free.
The implication of this design for a heavily threaded application like the Apache Web Server is that there will be some thread contention, with the trade-off of less memory fragmentation.
Arena and Heap Organization
Each arena has a mutex which is used to control access to that arena. All operations (except for fastbin access – more on fastbins later) require that the thread acquire a lock on the arena. Multiple-thread contention on this mutex is the reason why multiple arenas are created, since threads assigned to different arenas need not wait for one another. Threads will automatically switch to unused (unlocked) arenas if contention requires it.
Each arena obtains memory from one or more heaps. The main arena uses the program’s initial heap, while additional arenas allocate memory for their heaps using mmap, adding more heaps to their list of heaps as older heaps are used up. Each arena keeps track of a special “top” chunk, which is typically the biggest available chunk, and also maintains a pointer to the most recently allocated heap.
The heap metadata is organized with the help of 3 data structures as shown in the two figures above:
- malloc_chunk structure (Chunk Header): The heap is sliced into many chunks based on the malloc calls made by the user. The chunk is the smallest unit allocated by malloc, and each chunk has its own malloc_chunk header structure.
- malloc_state structure (Arena Header): An arena may have multiple heaps (due to thread contention), but those heaps are governed by a single malloc_state header structure. This structure tells the allocator where the top chunk (chunk at the highest address), last remainder chunk, bins, etc. are.
- heap_info structure (Heap Header): Every heap has its own header, which has a pointer to the previous heap_info structure. When multiple heaps are mapped to the same arena (e.g. when a heap segment runs out of space), this structure maintains information about the multiple heaps.
(More information about these structures in the later sections)
The main arena does not have a heap_info structure, since only one heap exists. If it runs out of space, brk/sbrk is invoked to extend the end of the segment until the memory-mapping segment is reached. The main arena’s malloc_state structure is a global variable stored in the mapped area of the shared library libc.so.
When new chunks are allocated, top (the contiguous free memory chunk available for allocation), well, always stays on top, and is referred to by the malloc_state struct of the arena. Anything past top (in the higher addresses) is called “the wilderness”. top can grow or shrink, depending on how much space is needed in the arena. Initially the size of top is 0, forcing malloc to allocate space to top when an application first requests memory.
malloc_state
malloc_state is a per-arena struct. The main arena’s malloc_state is a global variable in libc and is not part of any heap segment, while the malloc_state structs of the other arenas are stored within their own (mmap-ed) heap segments. As seen earlier, arenas can own memory from multiple heaps.
struct malloc_state
{
__libc_lock_define (, mutex);
int flags;
// Fast bins
mfastbinptr fastbinsY[NFASTBINS];
// Topmost free chunk
mchunkptr top;
// Pointer to a malloc_chunk structure left over
// from a recent chunk split
mchunkptr last_remainder;
// Non-fastbins
mchunkptr bins[NBINS * 2 - 2];
// Bitmap of bins
unsigned int binmap[BINMAPSIZE];
// Linked list for arenas
struct malloc_state *next;
// Linked list for free arenas
struct malloc_state *next_free;
// Number of threads attached to this arena
// 0 if the arena is on the free list.
INTERNAL_SIZE_T attached_threads;
/* Memory allocated from the system in this arena. */
INTERNAL_SIZE_T system_mem;
INTERNAL_SIZE_T max_system_mem;
};
typedef struct malloc_state *mstate;
Bins
Within each arena, chunks are either in-use by the application or free (available for allocation).
- In-use chunks are not tracked by the arena.
- Free chunks are stored in various freelists based on size and history, so that the allocator can find suitable chunks to satisfy allocation requests quickly.
The freelists are actually called bins. Bins are in-memory linked structures that keep track of all the freed chunks:
- Fast bins (fastbinsY[] in malloc_state): There are 10 fastbins as defined by NFASTBINS, but only the first 7 are used by default, servicing chunks up to 128 B in size (the compile-time maximum is 160 B, as defined by MAX_FAST_SIZE). Each fast bin holds chunks of a single size: on a 64-bit system the n-th fastbin holds chunks of size 32 + 16n bytes (so fastbinsY[0] holds 32 B chunks, fastbinsY[1] holds 48 B chunks, …, fastbinsY[6] holds 128 B chunks). Chunks added to a fast bin are not combined with adjacent chunks, to keep access fast (hence fast bins). Chunks in the fastbins may be moved to other bins as needed. Fastbin chunks are stored in a singly linked list, since all chunks in a bin have identical sizes and chunks in the middle of the list never have to be accessed.
- Normal bins (bins[] in malloc_state): The normal/regular bins are divided into small bins (where all chunks in a bin have identical sizes) and large bins (where chunks in a bin have varied sizes). When a chunk is added to these bins, it is first combined with adjacent free chunks to coalesce them into larger contiguous chunks. Hence, these chunks are never adjacent to other normal free chunks (although they may be adjacent to fast, unsorted or in-use chunks). These bins are doubly-linked lists, so that chunks may be removed from any position (for example, when coalescing chunks).
  - Unsorted bin (bins[1]): When chunks are freed, they are initially stored in the unsorted bin. The allocator sorts them into the proper bins later, in order to give them a chance at being quickly re-used. The unsorted bin is the first bin of the normal bins. Note that bins[0] is unused.
  - Small bins (bins[2:64]): 62 small bins (index 2-63) hold small chunks of identical size, where the bin of index n holds chunks of size 16n bytes on a 64-bit system (so bins[2] holds 32 B chunks, bins[3] holds 48 B chunks, …, bins[63] holds 1008 B chunks). When allocating a small chunk, the head pointer of the corresponding bin is used. The upper bound is defined by the calculated constant MIN_LARGE_SIZE, which is 1024 B on a 64-bit system.
  - Large bins (bins[64:127]): 63 large bins (index 64-126) hold chunks of 1024 B and above, up to 128 KiB (or whatever value M_MMAP_THRESHOLD is set to), since larger requests are serviced by mmap. Each bin holds a range of sizes, so the chunks within a bin have to be sorted (in decreasing size order) and coalesced if necessary. When allocating a large chunk, the best-fitting chunk has to be searched for and possibly split into two chunks (one with the requested size, and one for the remainder).
The large bins are divided into 6 groups, where each group contains bins covering different chunk-size ranges (within a group, there is a fixed size increment between consecutive bins):
Group | # Bins | Increment between bins |
1 | 32 | 64 B |
2 | 16 | 512 B |
3 | 8 | 4096 B (4 KiB ) |
4 | 4 | 32768 B (32 KiB ) |
5 | 2 | 262144 B (256 KiB ) |
6 | 1 | – |
When large chunks are freed, they are stored by size, in their respective bins. Smaller chunks are stored in small bins that hold freed chunks of the same size, sorted by when they were freed.
malloc_chunk
As defined in malloc.c, the malloc_chunk struct contains metadata about a chunk.
struct malloc_chunk {
// If free, holds size of previous chunk
INTERNAL_SIZE_T mchunk_prev_size;
// Holds size of chunk, including data and overhead
INTERNAL_SIZE_T mchunk_size;
// Forward pointer used in normal bins
struct malloc_chunk* fd;
// Backward pointer used in normal bins
struct malloc_chunk* bk;
// Forward pointer to next larger free chunk in large bins
struct malloc_chunk* fd_nextsize;
// Backward pointer to previous smaller free chunk in large bins
struct malloc_chunk* bk_nextsize;
};
typedef struct malloc_chunk* mchunkptr;
The fd, bk, fd_nextsize and bk_nextsize pointers are only used when the chunk is free. Note that fd_nextsize and bk_nextsize are malloc_chunk pointers and not size values (misleading, I know) – they point to the next larger free chunk and the previous smaller free chunk, respectively. These pointers exist because of the need to find the best fit among large chunks, where chunks are sorted by size in descending order. They enable the allocator to quickly find the first chunk that is big enough to satisfy a request (this is known as glibc malloc’s first-fit behavior). However, there is a small quirk: if the first two chunks are of the same size, first-fit usually returns the second chunk (to avoid adjusting the head pointer of the linked list). For the same reason, chunks are inserted after existing chunks of the same size.
Allocated Chunk
The low-order 3 bits of the mchunk_size field are reserved for flag information:
- PREV_INUSE (0x1): Set when the previous chunk is allocated.
- IS_MMAPPED (0x2): Set when the chunk is mapped using mmap.
- NON_MAIN_ARENA (0x4): Set when the chunk is part of a non-main arena.
Notice that the pointers fd and bk are unused here. Also, the address of the chunk header is never returned directly to the application; instead, the macros chunk2mem and mem2chunk are used to convert malloc’s internal chunk pointers to user pointers and vice versa.
#define chunk2mem(p) ((void*)((char*)(p) + 2*SIZE_SZ))
#define mem2chunk(mem) ((mchunkptr)((char*)(mem) - 2*SIZE_SZ))
Free Chunk
Consecutive free chunks are combined into one contiguous free chunk. Hence, no two consecutive free chunks exist in a bin list, and while the previous chunk is allocated, the mchunk_prev_size field of the following chunk may be reused to hold the previous chunk’s user data. (The exception is fast bin chunks, which are treated as allocated chunks and are not consolidated with neighboring free chunks.)
The pointer to the location of fd is what is returned to the caller in the application.
Thread-Local Caching
Every thread has a thread-local variable that remembers the arena it last used. If that arena is in use when the thread needs it again, the thread blocks waiting for the arena to become free. If a thread has never used an arena before, it may reuse an unused arena, create a new arena, or pick the next available arena.
To reduce arena contention, every thread additionally has a per-thread cache called the tcache, introduced in glibc 2.26.
typedef struct tcache_entry
{
struct tcache_entry *next;
} tcache_entry;
typedef struct tcache_perthread_struct
{
char counts[TCACHE_MAX_BINS];
tcache_entry *entries[TCACHE_MAX_BINS];
} tcache_perthread_struct;
static __thread char tcache_shutting_down = 0;
static __thread tcache_perthread_struct *tcache = NULL;
The tcache contains small linked lists of chunks which can be accessed without needing to lock an arena. There are 64 tcache bins per thread by default, with chunk sizes ranging from 24 to 1032 bytes in 16-byte increments on a 64-bit system. The tcache maintains pointers to the application data area (payload) via tcache->entries. Every tcache bin contains chunks of identical size. Similar to the fast bins, each tcache bin is a LIFO structure.
The number of chunks allowed in each tcache bin is constrained by TCACHE_FILL_COUNT (7 by default). If there is a chunk of exactly the requested size in tcache->entries, it is returned to the caller. This takes precedence over the fast bins.
If the tcache bin for a given requested size is empty, the tcache is not used and malloc falls back on the normal routines (locking an arena, possibly blocking). This is to reduce internal fragmentation.
Summary
Now we know the main structures used in malloc() and free(). Putting it all together, we have the allocation and free algorithms detailed in the subsections that follow.
Allocation Algorithm
When malloc(x) is invoked for a request of size x, the overall process looks like this:
1. If there is a free chunk in the tcache with size exactly x, it is returned to the caller.
2. If x > M_MMAP_THRESHOLD, mmap() is invoked to allocate memory.
3. If there is a chunk in the fast bins of size >= x, it is returned to the caller. If more than one suitable chunk is available, the tcache is also pre-filled.
4. If there is a chunk in the small bins of size >= x, it is returned to the caller. If more than one suitable chunk is available, the tcache is also pre-filled.
5. If the request is large, the fast bin chunks are coalesced and moved to the unsorted bin.
6. The unsorted bin is sorted and coalesced, and its chunks are moved to the appropriate small/large bins. If a suitable chunk of size >= x is found during this pass, it is returned to the caller.
7. If the request is large, the large bins are searched (starting from the smallest large bin that fits the request) and the first chunk of size >= x is returned to the caller.
8. If there are remaining chunks in the fast bins, they are consolidated and steps 6-7 are repeated.
9. Finally, the request is serviced from the top chunk, which is split and/or enlarged as necessary.
Free Algorithm
When free() is invoked, a chunk of memory is marked as “free to be reused” by the application, but it is still reserved by the OS for the application. However, if the top chunk becomes large enough, some of that memory may be un-mapped and returned to the OS. The process for free(p), where p points to memory of size x, looks like this:
1. If there is space in the corresponding tcache bin, the chunk is stored there and the function returns.
2. If x is small enough for the chunk to belong in a fast bin, it is placed in the appropriate fast bin.
3. If the chunk was allocated via mmap(), munmap() is invoked to de-allocate it.
4. If p is next to another free chunk, both chunks are consolidated into a contiguous chunk (which may become the top chunk).
5. The new consolidated chunk is placed in the unsorted bin (skipped if the chunk is now top).
6. At some point in the lifecycle of the allocator:
   - If x is large enough, the fast bin chunks are consolidated.
   - If the top chunk is large enough, some memory is split off and returned to the OS.
Main Concepts
To conclude, let’s cover the main concepts in glibc’s memory allocator:
- Every allocation is a memory chunk which is aligned to 16 B, as defined by MALLOC_ALIGNMENT (the larger of the alignment of a long double and twice the size of size_t, both 16 B on x86-64).
- Every allocation contains metadata as well as the requested region of memory.
- There is a main arena in the program heap segment, whose malloc_state struct is a global variable in libc.so.
- Arenas keep track of free chunks. Multiple threads can use the same arena through blocking-wait and locking via the mutex in malloc_state, and arenas are created as they are needed.
- The maximum number of arenas is 8n by default on a 64-bit system, where n is the number of cores; this limit is constrained by MALLOC_ARENA_MAX.
- Each arena can have multiple heaps. Here, heaps refer to spaces allocated via mmap() calls.
- Each arena maintains a single malloc_state structure, which keeps track of the top chunk. The malloc_state structure also defines 10 fast bins and 126 regular bins by default.
- Out of the 10 fast bins defined, 7 are used by default; these singly-linked lists of free chunks service chunks up to 128 B.
- The regular bins are split up into unsorted, small and large bins.
  - 62 small bins indexed 2-63 hold chunks of at least 32 B and below 1024 B.
  - 63 large bins indexed 64-126 hold chunks of 1024 B and above, up to 128 KiB or M_MMAP_THRESHOLD.
  - The unsorted bin (index 1) holds newly freed chunks.
  - Index 0 is unused.
- Each heap in an arena has its own heap_info structure. Multiple heap_info structures form a linked list via the prev field, and the ar_ptr field of a heap_info structure points to the malloc_state structure of its arena.
- The area past the top chunk is called the wilderness.
Memory allocation in modern applications is complex, and hopefully this article distills the main components of glibc’s memory allocator down into a more approachable form.