GWP-ASan: Sampling-Based Detection of Memory-Safety Bugs in Production
Kostya Serebryany, Chris Kennelly, Mitch Phillips, Matt Denton, Marco Elver, Alexander Potapenko, Matt Morehouse, Vlad Tsyrklevich, Christian Holler, Julian Lettner, David Kilzer, Lander Brandt
TL;DR
GWP-ASan detects heap-use-after-free and heap-buffer-overflow bugs with a low-overhead, sampling-based memory-safety detector that runs in production without requiring binary changes. It places a small, sampled fraction of heap allocations next to protected guard pages, which keeps overhead near zero while still producing rich error reports with allocation and deallocation stack traces. The paper presents a basic version of the algorithm and several platform-specific implementations (TCMalloc, Chrome, Android/LLVM, Firefox PHC, Apple PGM, Linux KFENCE), and reports deployment outcomes across Google, Android, ChromeOS, Firefox, Apple, and Meta, where the tools found a substantial number of bugs and produced highly actionable reports. These results show that production-time bug detection can scale to modern software ecosystems and improve reliability and security without compromising performance. GWP-ASan complements pre-production tools and serves as a practical, near-term mechanism for finding memory-safety issues in vast, legacy C/C++ codebases.
Abstract
Despite the recent advances in pre-production bug detection, heap-use-after-free and heap-buffer-overflow bugs remain the primary problem for security, reliability, and developer productivity for applications written in C or C++, across all major software ecosystems. Memory-safe languages solve this problem when they are used, but the existing code bases consisting of billions of lines of C and C++ continue to grow, and we need additional bug detection mechanisms. This paper describes a family of tools that detect these two classes of memory-safety bugs, while running in production, at near-zero overhead. These tools combine page-granular guarded allocation and low-rate sampling. In other words, we added an "if" statement to a 36-year-old idea and made it work at scale. We describe the basic algorithm, several of its variants and implementations, and the results of multi-year deployments across mobile, desktop, and server applications.
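To make the combination of page-granular guarded allocation and low-rate sampling concrete, here is a minimal sketch in C for POSIX systems. It is not the code of any implementation described in the paper. The names sampled_malloc, sampled_free, is_guarded, SAMPLE_RATE, and NUM_SLOTS are hypothetical, and the sketch makes simplifying assumptions: a deterministic counter stands in for randomized sampling, guarded slots are never reused, the code is not thread-safe, and no stack traces are recorded.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define SAMPLE_RATE 1000 /* hypothetical: guard 1 in 1000 allocations */
#define NUM_SLOTS 16     /* hypothetical: size of the guarded pool */

static uint8_t *pool_base;  /* layout: guard, slot, guard, slot, ..., guard */
static size_t pool_bytes;
static size_t next_slot;
static unsigned long sample_counter = SAMPLE_RATE;

static size_t page_size(void) { return (size_t)sysconf(_SC_PAGESIZE); }

/* Reserve the whole pool as inaccessible memory on first use. */
static int pool_init(void) {
    if (pool_base != NULL) return 1;
    pool_bytes = (2 * NUM_SLOTS + 1) * page_size();
    void *p = mmap(NULL, pool_bytes, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return 0;
    pool_base = p;
    return 1;
}

/* A range check tells guarded pointers apart from ordinary ones. */
int is_guarded(const void *p) {
    uintptr_t q = (uintptr_t)p, lo = (uintptr_t)pool_base;
    return pool_base != NULL && q >= lo && q < lo + pool_bytes;
}

/* Place the object at the end of its own page so that an overflow runs
 * into the inaccessible guard page that follows. Rounding to 16 bytes keeps
 * malloc-style alignment, at the cost of missing overflows that stay
 * within the rounding slack. */
static void *guarded_alloc(size_t size) {
    size_t ps = page_size();
    if (size == 0 || size > ps) return NULL;
    size_t rounded = (size + 15) & ~(size_t)15;
    if (!pool_init() || next_slot == NUM_SLOTS) return NULL;
    uint8_t *page = pool_base + (2 * next_slot + 1) * ps;
    if (mprotect(page, ps, PROT_READ | PROT_WRITE) != 0) return NULL;
    next_slot++;
    return page + ps - rounded;
}

void *sampled_malloc(size_t size) {
    if (--sample_counter == 0) { /* the added "if" statement */
        sample_counter = SAMPLE_RATE;
        void *p = guarded_alloc(size);
        if (p != NULL) return p;
        /* Pool exhausted or object too large: fall through to malloc. */
    }
    return malloc(size);
}

void sampled_free(void *p) {
    if (is_guarded(p)) {
        /* Revoke access so a later use-after-free faults immediately. */
        size_t ps = page_size();
        uint8_t *page = (uint8_t *)((uintptr_t)p & ~(uintptr_t)(ps - 1));
        int rc = mprotect(page, ps, PROT_NONE);
        assert(rc == 0);
        (void)rc;
        return;
    }
    free(p);
}
```

In this sketch, 999 of every 1000 calls pay only for a counter decrement and a branch before reaching the regular allocator, which is where the near-zero overhead comes from. An out-of-bounds access past a sampled object, or any access after sampled_free, touches an inaccessible page and raises SIGSEGV. A production tool would catch that signal and report the faulting access together with the recorded allocation and deallocation stack traces.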
