SeMalloc: Semantics-Informed Memory Allocator

Ruizhe Wang; Meng Xu; N. Asokan

TL;DR

SeMalloc tackles the pervasive use-after-free (UAF) problem by introducing SemaType, a semantics-informed, thread-, context-, and flow-sensitive type that restricts heap memory reuse to objects sharing both an allocation site and a call context. The system combines an LLVM instrumentation pass with a backend allocator that partitions memory by SemaType using a BIBOP-style layout, and it encodes the SemaType so that it fits into existing allocation interfaces. Empirical results show that SeMalloc thwarts the real-world UAF exploits tested while keeping average run-time near the baseline allocator and average memory overhead between 41% and 84%, a better balance of security and efficiency than several prior approaches. The work presents a practical defense-in-depth approach: by carefully calibrating type sensitivity, SeMalloc achieves meaningful security gains without prohibitive overhead, and the same idea may apply to other performance-driven allocators and fault-isolation schemes.
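To make the idea of fitting a SemaType into an existing allocation interface concrete, here is a minimal sketch of packing a type identifier and a flag into the unused upper bits of a 64-bit size argument. The bit widths, the single loop flag, and the names `encode` and `decode` are assumptions chosen for illustration; they are not taken from SeMalloc's actual encoding rule.

```python
# Hypothetical layout (an assumption, not SeMalloc's real encoding):
#   bit 63      : loop flag
#   bits 32..62 : SemaType identifier
#   bits 0..31  : requested allocation size
SIZE_BITS = 32
TYPE_BITS = 31
FLAG_SHIFT = SIZE_BITS + TYPE_BITS

SIZE_MASK = (1 << SIZE_BITS) - 1
TYPE_MASK = (1 << TYPE_BITS) - 1


def encode(size, sema_type_id, in_loop=False):
    """Pack size, type id, and loop flag into one 64-bit integer."""
    if not 0 <= size <= SIZE_MASK:
        raise ValueError("size does not fit in the size field")
    if not 0 <= sema_type_id <= TYPE_MASK:
        raise ValueError("type id does not fit in the type field")
    return (int(in_loop) << FLAG_SHIFT) | (sema_type_id << SIZE_BITS) | size


def decode(param):
    """Recover (size, sema_type_id, in_loop) from an encoded parameter."""
    size = param & SIZE_MASK
    sema_type_id = (param >> SIZE_BITS) & TYPE_MASK
    in_loop = bool((param >> FLAG_SHIFT) & 1)
    return size, sema_type_id, in_loop
```

In this sketch a call site that was not instrumented passes a plain size, which decodes to type identifier 0 with no flag set, so the allocator interface itself does not need to change.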

Abstract

Use-after-free (UAF) is a critical and prevalent problem in memory-unsafe languages. While many solutions have been proposed, balancing security, run-time cost, and memory overhead (an impossible trinity) is hard. In this paper, we show one way to balance the trinity by passing more semantics about the heap object to the allocator for it to make informed allocation decisions. More specifically, we propose a new notion of thread-, context-, and flow-sensitive "type", SemaType, to capture the semantics, and we prototype a SemaType-based allocator that aims for the best trade-off amongst the impossible trinity. In SeMalloc, only heap objects allocated from the same call site and via the same function call stack can possibly share a virtual memory address, which effectively stops type-confusion attacks and makes UAF vulnerabilities harder to exploit. Through extensive empirical evaluation, we show that SeMalloc is realistic: (a) SeMalloc is effective in thwarting all real-world vulnerabilities we tested; (b) benchmark programs run even slightly faster with SeMalloc than with the default heap allocator, with an average memory overhead of 41% to 84%; and (c) SeMalloc balances security and overhead strictly better than other closely related works.
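The reuse rule stated in the abstract can be illustrated with a toy model in which freed addresses are recycled only for allocations carrying the same key. The class `SemaTypePool`, the fixed slot size, and the key shape `(thread, call stack, call site)` are assumptions made for this sketch; it models the policy only, not SeMalloc's implementation.

```python
from collections import defaultdict

SLOT_SIZE = 64  # assumed fixed slot size; real allocators use size classes


class SemaTypePool:
    """Toy model: a freed slot is reused only under the same key."""

    def __init__(self):
        self._free = defaultdict(list)  # key -> freed addresses
        self._owner = {}                # live address -> key
        self._next_addr = 0x1000        # bump pointer for fresh slots

    def malloc(self, key):
        free_list = self._free[key]
        if free_list:
            addr = free_list.pop()      # recycle within the same key
        else:
            addr = self._next_addr      # otherwise hand out a fresh slot
            self._next_addr += SLOT_SIZE
        self._owner[addr] = key
        return addr

    def free(self, addr):
        key = self._owner.pop(addr)     # raises KeyError on a double free
        self._free[key].append(addr)


# Same thread and call site, but different call stacks (hypothetical names).
key_a = ("thread-0", ("main", "handle_request", "alloc_buf"), "alloc_buf:3")
key_b = ("thread-0", ("main", "render_page", "alloc_buf"), "alloc_buf:3")

pool = SemaTypePool()
first = pool.malloc(key_a)
pool.free(first)
other = pool.malloc(key_b)   # different call stack: gets a fresh address
again = pool.malloc(key_a)   # same key: may reuse the freed address
```

Here a dangling pointer to `first` can only ever alias a later object allocated under `key_a`, which is the property that limits type confusion in this model.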

Paper Structure (39 sections, 11 figures, 10 tables, 2 algorithms)

Introduction
A Mini SoK on UAF
Exploiting UAF Vulnerabilities
Mitigating UAF Vulnerabilities
A Reflection on Semantics And Type
Capture Semantics with SemaType
Defining SemaType
Cyclic Control-flow Structures
SemaType Representation
Alternative: Path-sensitivity
SemaType-based Heap Allocation
Overview
Call Graph Construction
Edge Weight Assignment
SCC Stack Pointers Aggregation
...and 24 more sections

Figures (11)

Figure 1: A hypothetical example to illustrate UAF exploits. Exploit-B: line 16–5–7–17 $\rightarrow$ arbitrary code execution; Exploit-C: line 16–5–6–8–20 $\rightarrow$ information leak; Exploit-D: line 16–5–19–9 $\rightarrow$ arbitrary code execution; Exploit-E: line 16–5–21 $\rightarrow$ $p$ is de-allocated and dangling.
Figure 2: A hypothetical example to illustrate UAF exploits against objects of the same type.
Figure 3: Call graph (left) of a crafted program (given in the appendix) illustrating how SemaType (right) can be deduced. In this call graph, each node is a function; solid edges represent function calls that are not inside a loop in the corresponding function's CFG, while dashed edges represent function calls inside a loop.
Figure 4: Design overview of SeMalloc (legend: flags, SemaType, allocation size). Without SeMalloc, the allocation parameter carries only the size; after the pass is applied, SeMalloc also encodes the trace information into that parameter.
Figure 5: Parameter encoding rule for regular objects (L: loop identifier; H: huge block identifier).
...and 6 more figures
