Malloc internals: chunks

Aug 30, 2022

I have been involved in infosec since last year, and I co-founded the havce CTF team with some of the colleagues who attended the CyberChallenge.IT course in 2021.

I am a big fan of pwn challenges and binary exploitation in general, so after dealing with standard buffer overflows on the stack and the various format string vulnerabilities, I wanted to step up my skills and learn some heap exploitation.

I started with the awesome HeapLAB course by Max Kamper. If you’re serious about heap exploitation, this course is gold: go and buy it. Max Kamper is also the author of the famous ROP Emporium, which is super useful for learning return-oriented programming.

In this post I will cover some basics of malloc’s internal structures, to help me remember and better understand the attacks.

What is `malloc`?

malloc is the name of one of the functions that the standard C library provides to manually manage dynamic memory allocations. It is also the name given to the glibc memory allocator.

Chunks

Chunks are the fundamental unit of work in malloc. They are usually part of the heap, but they can also be allocated as separate entities with a call to mmap.

This is what an in-use chunk looks like.

                        ┌──────┐
malloc ptr->            │ size │
            ┌───────────┴──────┤
programptr->│     user_data    │
            │           ┌──────┘
            │           │
            └───────────┘

While programs usually deal with the programptr (a pointer to user data), malloc usually deals with the malloc ptr (tcache is an exception). On 64-bit systems the malloc ptr points 8 bytes before the size field, that is, 16 bytes before the user data.
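To make the relationship between the two pointers concrete, here is a minimal sketch of the conversion. The helper names `mem_to_chunk`, `chunk_to_mem` and `size_field` are my own (glibc has similar internal macros), and I'm assuming the header is two `size_t` words, which gives 16 bytes on 64-bit systems.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: the chunk header is two size_t words (prev_size and size),
   so user data starts 16 bytes after the chunk pointer on 64-bit systems. */
#define HEADER_SIZE (2 * sizeof(size_t))

/* Hypothetical helper: programptr (user data) -> malloc ptr (chunk). */
static inline void *mem_to_chunk(void *mem) {
    assert(mem != NULL);
    return (char *)mem - HEADER_SIZE;
}

/* Hypothetical helper: malloc ptr (chunk) -> programptr (user data). */
static inline void *chunk_to_mem(void *chunk) {
    assert(chunk != NULL);
    return (char *)chunk + HEADER_SIZE;
}

/* Address of the size field: the word right before the user data. */
static inline size_t *size_field(void *mem) {
    assert(mem != NULL);
    return (size_t *)((char *)mem - sizeof(size_t));
}
```

In a debugger this is the arithmetic you end up doing by hand: take the pointer returned by malloc, subtract 8 to look at the size field, subtract 16 to get the address malloc itself uses for the chunk.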

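The size field of the chunk holds the length of the user data plus the length of the size field itself. Since chunk sizes are always multiples of 16 bytes on 64-bit systems (and of at least 8 bytes on 32-bit ones), the 3 least significant bits (LSBs) of the size field are never needed for the size itself and can be used for flags.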

These flags are (listed from least significant to most significant):

PREV_INUSE: when set, the previous chunk is still in use by the application; when cleared, the previous chunk is free.
IS_MMAPPED: when set, this chunk has been allocated with mmap and is not part of a heap at all.
NON_MAIN_ARENA: when set, it indicates that the chunk does not belong to the main arena.
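Here is a small sketch of how a raw size field can be decoded. The flag values match the order of the list above; the helper names are mine, and `request_to_chunk_size` is a simplified model of how a request gets rounded up, assuming 64-bit glibc defaults (16-byte alignment, 0x20 minimum chunk size) and ignoring overflow checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PREV_INUSE     0x1
#define IS_MMAPPED     0x2
#define NON_MAIN_ARENA 0x4
#define SIZE_BITS      (PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)

static_assert(SIZE_BITS == 0x7, "the flags occupy exactly the 3 LSBs");

/* Assumptions: alignment is 2 * sizeof(size_t), minimum chunk is 4 words. */
#define SIZE_SZ    sizeof(size_t)
#define ALIGN_MASK (2 * SIZE_SZ - 1)
#define MIN_CHUNK  (4 * SIZE_SZ)

/* Real chunk size: the size field with the flag bits masked out. */
static inline size_t chunk_size(size_t field) { return field & ~(size_t)SIZE_BITS; }

static inline bool prev_inuse(size_t field)     { return (field & PREV_INUSE) != 0; }
static inline bool is_mmapped(size_t field)     { return (field & IS_MMAPPED) != 0; }
static inline bool non_main_arena(size_t field) { return (field & NON_MAIN_ARENA) != 0; }

/* Simplified model: add room for the size field, then round up to the
   alignment, never going below the minimum chunk size. No overflow checks. */
static inline size_t request_to_chunk_size(size_t request) {
    size_t padded = request + SIZE_SZ + ALIGN_MASK;
    return padded < MIN_CHUNK ? MIN_CHUNK : padded & ~(size_t)ALIGN_MASK;
}
```

For example, a size field of 0x21 decodes to a 0x20-byte chunk with PREV_INUSE set. Under the assumptions above, on a 64-bit system a request of 24 bytes fits in a 0x20 chunk, while a request of 25 bytes needs a 0x30 one.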

When a chunk is free, up to 5 quadwords of its user data can be repurposed as malloc metadata, and part of that memory can become part of the succeeding chunk.

/*
              ┌─────────────┐
  prev_size   │    size     │
┌─────────────┼─────────────┤
│ fd          │ bk          │
├─────────────┼─────────────┤
│ fd_nextsize │ bk_nextsize │
├─────────────┼─────────────┤
│             │             │
├─────────────┼─────────────┘
│             │             
└─────────────┘
*/

struct malloc_chunk {
 INTERNAL_SIZE_T      mchunk_prev_size;  /* Size of previous chunk (if free).  */
 INTERNAL_SIZE_T      mchunk_size;       /* Size in bytes, including overhead. */

 struct malloc_chunk* fd;         /* double links -- used only if free. */
 struct malloc_chunk* bk;

 /* Only used for large blocks: pointer to next larger size.  */
 struct malloc_chunk* fd_nextsize; /* double links -- used only if free. */
 struct malloc_chunk* bk_nextsize;
};
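To check how these fields line up with the user data, here is a sketch that mirrors the struct. `struct chunk_view` and `user_word_index` are names I made up, and I'm assuming INTERNAL_SIZE_T is `size_t` and that pointers are as wide as `size_t`, which is the usual case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of struct malloc_chunk, assuming INTERNAL_SIZE_T
   is size_t. It is only used to compute offsets. */
struct chunk_view {
    size_t prev_size;
    size_t size;
    struct chunk_view *fd;
    struct chunk_view *bk;
    struct chunk_view *fd_nextsize;
    struct chunk_view *bk_nextsize;
};

/* The user data starts where fd is stored. */
#define USER_DATA_OFFSET offsetof(struct chunk_view, fd)

static_assert(offsetof(struct chunk_view, size) == sizeof(size_t),
              "size comes right after prev_size");
static_assert(USER_DATA_OFFSET == 2 * sizeof(size_t),
              "user data starts right after the two header words");

/* Index of the user-data word that a metadata field overlays. */
static inline size_t user_word_index(size_t field_offset) {
    return (field_offset - USER_DATA_OFFSET) / sizeof(void *);
}
```

With this view, fd overlays word 0 of the user data, bk word 1, and fd_nextsize and bk_nextsize words 2 and 3.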

When a chunk is free, the same memory that was used by the application data before gets repurposed to store malloc internal metadata.

This is probably the main reason why the glibc memory allocator is such a yummy target for exploit developers: tampering with the malloc metadata is really easy, and it makes a great exploit vector.

When a chunk is free, its shape can change. Like we said, some fields can be repurposed, but some memory can actually change ownership, and migrate to another chunk.

In the following lines we’re going to mention the malloc bins, which are some lists used by malloc to keep track of the free’d chunks.

There are many of them and they serve different purposes, but they are basically linked lists of free chunks. Some of them are singly linked lists and others are doubly linked lists. We will cover them in detail in future articles.

When a chunk is free’d, the first quadword of its user data gets repurposed as a forward pointer (fd).

When the chunk gets linked in a bin that behaves like a doubly linked list (e.g. the unsortedbin), the second quadword gets repurposed as a backwards pointer (bk).

The third and fourth quadwords are repurposed as fd_nextsize and bk_nextsize in largebins only.

In bins that support consolidation, the last quadword of a free chunk gets repurposed as the prev_size field. This indicates the size of the free chunk, exactly like the size field does, but without the flags.

malloc considers the prev_size field as part of the succeeding chunk (that’s why I said chunks can change shape), and when it is set, the PREV_INUSE flag of the succeeding chunk is cleared.
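The interplay between prev_size and PREV_INUSE is easier to follow on a simulated heap. This is a toy model, not glibc code: a chunk is just a pointer to `size_t` words (word 0 is prev_size, word 1 is size), and `mark_free` is a made-up helper that only does the bookkeeping on the succeeding chunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PREV_INUSE 0x1
#define SIZE_BITS  0x7

/* Toy view of a chunk: word 0 = prev_size, word 1 = size plus flags. */
static inline size_t chunk_size(const size_t *chunk) {
    return chunk[1] & ~(size_t)SIZE_BITS;
}

/* The succeeding chunk starts chunk_size bytes after this one. */
static inline size_t *next_chunk(size_t *chunk) {
    return (size_t *)((char *)chunk + chunk_size(chunk));
}

/* Toy model of freeing into a consolidating bin: write prev_size into the
   last quadword of our user data (which is word 0 of the next chunk) and
   clear the next chunk's PREV_INUSE flag. */
static inline void mark_free(size_t *chunk) {
    size_t *next = next_chunk(chunk);
    next[0] = chunk_size(chunk);
    next[1] &= ~(size_t)PREV_INUSE;
}

static inline bool prev_is_free(const size_t *chunk) {
    return (chunk[1] & PREV_INUSE) == 0;
}

/* prev_size is only meaningful when the previous chunk is free. */
static inline size_t *prev_chunk(size_t *chunk) {
    assert(prev_is_free(chunk));
    return (size_t *)((char *)chunk - chunk[0]);
}
```

This backwards walk through prev_size is what makes consolidation possible: starting from a chunk whose PREV_INUSE flag is clear, malloc can find the start of the free chunk before it.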

On glibc versions >= 2.29, the second quadword of a free chunk linked in a tcachebin gets repurposed as a key field, used to detect double-frees.
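To get a feeling for how a key field can catch a double-free, here is a toy model of a single tcachebin. Everything in it (`toy_entry`, `toy_bin`, `toy_free`, `toy_malloc`) is hypothetical and heavily simplified: there are no per-size bins, no counts and no pointer mangling, and the real glibc code aborts instead of returning false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy view of a free chunk's user data while it sits in a tcachebin. */
typedef struct toy_entry {
    struct toy_entry *next; /* first quadword: forward pointer */
    uintptr_t key;          /* second quadword: double-free marker */
} toy_entry;

/* Toy bin: a singly linked list plus the key value stamped on entries. */
typedef struct {
    toy_entry *head;
    uintptr_t key;
} toy_bin;

/* Returns false when a double-free is detected, true otherwise. */
static inline bool toy_free(toy_bin *bin, void *mem) {
    assert(bin != NULL && mem != NULL);
    toy_entry *e = mem;
    if (e->key == bin->key) {
        /* The key matches, but user data could contain that value by
           coincidence, so walk the list to confirm before complaining. */
        for (toy_entry *p = bin->head; p != NULL; p = p->next)
            if (p == e)
                return false;
    }
    e->key = bin->key;   /* stamp the chunk as "in the tcache" */
    e->next = bin->head; /* push on the front of the list */
    bin->head = e;
    return true;
}

/* Pops the most recently freed entry, or returns NULL if the bin is empty. */
static inline void *toy_malloc(toy_bin *bin) {
    assert(bin != NULL);
    toy_entry *e = bin->head;
    if (e == NULL)
        return NULL;
    bin->head = e->next;
    e->key = 0; /* clear the marker when handing the chunk back */
    return e;
}
```

The interesting consequence for exploitation is that the check only triggers when the key is still intact: in this model, if the second quadword of the free chunk gets overwritten before the second free, the list walk is never reached.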
