Compiling Away the Overhead of Race Detection
Alexey Paznikov, Andrey Kogutenko, Yaroslav Osipov, Michael Schwarz, Umang Mathur
TL;DR
Dynamic data race detectors are essential for catching concurrency errors but suffer from high runtime overhead due to instrumentation. The authors present a compiler-integrated static framework within LLVM that uses five interprocedural analyses to prune instrumentation, removing checks on provably race-free memory accesses and other redundant checks while preserving the detector's completeness properties. They introduce an equivalence-class concept for races and a dominance-based elimination analysis that uses it to prune further checks, and they implement these analyses in ThreadSanitizer's LLVM instrumentation pass. Empirical results on real-world applications show a geomean speedup of 1.34x, peaking at 2.5x under high thread contention, with a negligible increase in compilation time. The approach is fully automatic, and the optimizations are being upstreamed.
Abstract
Dynamic data race detectors are indispensable for flagging concurrency errors in software, but their high runtime overhead limits their adoption. This overhead stems primarily from pervasive instrumentation of memory accesses, a significant fraction of which is redundant. We address this inefficiency through a static, compiler-integrated approach that identifies and eliminates redundant instrumentation, drastically reducing the runtime cost of dynamic data race detectors. We introduce a suite of interprocedural static analyses that reason about memory access patterns, synchronization, and thread creation to eliminate instrumentation for provably race-free accesses, and we show that the completeness properties of the data race detector are preserved. We further observe that many inserted checks flag a race if and only if a preceding check has already flagged an equivalent race for the same memory location, albeit potentially at a different access. We characterize this notion of equivalence and show that, when reporting is limited to at least one representative of each equivalence class, a further class of redundant checks can be eliminated. We identify such accesses using a novel dominance-based elimination analysis. Based on these two insights, we have implemented five static analyses within LLVM, integrated with the instrumentation pass of the race detector ThreadSanitizer. Our experimental evaluation on a diverse suite of real-world applications demonstrates that our approach significantly reduces race detection overhead, achieving a geomean speedup of 1.34x, with peak speedups reaching 2.5x under high thread contention. This performance is achieved with a negligible increase in compilation time and, being fully automatic, places no additional burden on developers. Our optimizations have been accepted by the ThreadSanitizer maintainers and are in the process of being upstreamed.
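
To make the idea of a check that is redundant given a preceding check more concrete, the following is a minimal sketch, not the authors' analysis. It assumes a toy model with a single straight-line basic block, so every earlier instruction trivially dominates every later one. The names Instr, Kind, and needsCheck are hypothetical and exist only for this illustration. It further assumes that an earlier checked write covers a later read or write of the same location, that an earlier checked read covers only a later read, and that any synchronization operation invalidates all earlier coverage.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Toy instruction kinds for one straight-line basic block (hypothetical model).
enum class Kind { Read, Write, Sync };

struct Instr {
    Kind kind;
    std::string loc;  // name of the accessed location; ignored for Sync
};

// Returns, for each instruction, whether it still needs a race check.
// Assumptions of this sketch:
//  - every earlier instruction in the block dominates every later one;
//  - Sync stands for anything that may create a happens-before edge
//    (lock, unlock, thread create/join, unknown call) and resets coverage.
std::vector<bool> needsCheck(const std::vector<Instr>& block) {
    std::vector<bool> keep;
    keep.reserve(block.size());
    // Strongest access kind already checked per location since the last Sync.
    std::unordered_map<std::string, Kind> checked;
    for (const Instr& in : block) {
        if (in.kind == Kind::Sync) {
            checked.clear();
            keep.push_back(false);  // not a memory access, nothing to check
            continue;
        }
        auto it = checked.find(in.loc);
        // A checked write covers any later access to the same location;
        // a checked read covers only later reads.
        bool covered = it != checked.end() &&
                       (it->second == Kind::Write || in.kind == Kind::Read);
        keep.push_back(!covered);
        if (!covered) checked[in.loc] = in.kind;
    }
    assert(keep.size() == block.size());
    return keep;
}
```

For example, on the block "write x, read x, read y, write y, sync, read x", this sketch keeps the checks on the first write to x, on both accesses to y (the earlier read does not cover the later write), and on the read of x after the synchronization, and it drops only the check on the second access to x. A real implementation would operate on the LLVM control-flow graph with dominance and alias information rather than on location names.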
