GWP-ASan: Sampling-Based Detection of Memory-Safety Bugs in Production

Kostya Serebryany, Chris Kennelly, Mitch Phillips, Matt Denton, Marco Elver, Alexander Potapenko, Matt Morehouse, Vlad Tsyrklevich, Christian Holler, Julian Lettner, David Kilzer, Lander Brandt

TL;DR

GWP-ASan targets heap-use-after-free and heap-buffer-overflow bugs with a low-overhead, sampling-based memory-safety detector that runs in production without requiring binary changes. It places a small, sparsely sampled fraction of heap allocations in page-protected guarded slots, which keeps overhead near zero while producing rich error reports with allocation and deallocation stack traces. The paper presents a basic algorithm and several platform-specific implementations (TCMalloc, Chrome, Android/LLVM, Firefox PHC, Apple PGM, Linux KFENCE) and reports deployment outcomes across Google, Android, ChromeOS, Firefox, Apple, and Meta, where the tools found a substantial number of bugs and produced highly actionable reports. Together, these deployments show that production-time bug detection can scale to modern software ecosystems and improve reliability and security at near-zero performance cost. GWP-ASan remains complementary to pre-production tools and serves as a practical, near-term way to find memory-safety issues in vast legacy C/C++ codebases.

Abstract

Despite the recent advances in pre-production bug detection, heap-use-after-free and heap-buffer-overflow bugs remain the primary problem for security, reliability, and developer productivity for applications written in C or C++, across all major software ecosystems. Memory-safe languages solve this problem when they are used, but the existing code bases consisting of billions of lines of C and C++ continue to grow, and we need additional bug detection mechanisms. This paper describes a family of tools that detect these two classes of memory-safety bugs, while running in production, at near-zero overhead. These tools combine page-granular guarded allocation and low-rate sampling. In other words, we added an "if" statement to a 36-year-old idea and made it work at scale. We describe the basic algorithm, several of its variants and implementations, and the results of multi-year deployments across mobile, desktop, and server applications.

Paper Structure (25 sections, 3 figures)

Figures (3)

  • Figure 1: GWP-ASan pool initial memory state. Guard pages, viz. "red zones", are shown with a red hatch pattern and always remain inaccessible. Allocation slots are shown filled in black when inaccessible.
  • Figure 2: GWP-ASan pool memory state after an allocation. The allocation slot shown with green dots is allocated and accessible.
  • Figure 3: Bug occurrences across Google server-side applications, Android, and Chrome.
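The pool layout in Figures 1 and 2 can be expressed as simple address arithmetic. The C sketch below assumes that guard pages and allocation slots strictly alternate, that the pool starts and ends with a guard page, and that each slot is exactly one page; the captions do not confirm these details, and the names `classify_offset` and `slot_offset` are hypothetical.

```c
#include <stddef.h>

/* Hypothetical classification of a byte offset measured from the pool start. */
enum page_kind { OUTSIDE_POOL, GUARD_PAGE, ALLOCATION_SLOT };

/* Assumed layout: guard, slot, guard, ..., slot, guard (one page per slot),
   so even page indices are guard pages and odd ones are slots. */
static enum page_kind classify_offset(size_t offset, size_t page_size,
                                      size_t num_slots) {
    size_t total_pages = 2 * num_slots + 1;
    size_t page_index = offset / page_size;
    if (page_index >= total_pages) return OUTSIDE_POOL;
    return (page_index % 2 == 0) ? GUARD_PAGE : ALLOCATION_SLOT;
}

/* Byte offset at which slot number i (counting from 0) begins. */
static size_t slot_offset(size_t i, size_t page_size) {
    return (2 * i + 1) * page_size;
}
```

Under these assumptions, a pool with 4 slots and 4096-byte pages spans 9 pages: offset 0 falls in a guard page, offset 4096 in the first slot, and any offset from 9 × 4096 onward lies outside the pool.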