HiRace: Accurate and Fast Source-Level Race Checking of GPU Programs
John Jacobson, Martin Burtscher, Ganesh Gopalakrishnan
TL;DR
HiRace detects GPU data races by instrumenting CUDA source code and checking memory accesses dynamically at run time. It keeps a fixed-size shadow state per address, encoded by a finite-state machine with $25$ states and $1200$ transitions, requiring only $5$ bits per address for the state and about $8$ bytes in total per shadow entry. The approach is evaluated on the Indigo benchmark suite and verified with the Murphi model checker. Empirical results show that HiRace detects more races than prior tools, with speedups of up to $30$-$50$x and roughly half the memory overhead. The result is a practical, scalable GPU race detector that works at the source level, does not depend on compiler or hardware specifics, and will be open-sourced.
Abstract
Data races are egregious parallel programming bugs on CPUs. They are even worse on GPUs due to the hierarchical thread and memory structure, which makes it possible to write code that is correctly synchronized within a thread group while not being correct across groups. Thus far, all major data-race checkers for GPUs suffer from at least one of the following problems: they do not check races in global memory, do not work on recent GPUs, scale poorly, have not been extensively tested, miss simple data races, or are not dependable without detailed knowledge of the compiler. Our new data-race detection tool, HiRace, overcomes these limitations. Its key novelty is an innovative parallel finite-state machine that condenses an arbitrarily long access history into a constant-length state, thus allowing it to handle large and long-running programs. HiRace is a dynamic tool that checks for thread-group shared memory and global device memory races. It utilizes source-code instrumentation, thus avoiding driver, compiler, and hardware dependencies. We evaluate it on a modern calibrated data-race benchmark suite. On the 580 tested CUDA kernels, 346 of which contain data races, HiRace finds races missed by other tools without false alarms and is more than 10 times faster on average than the current state of the art, while incurring only half the memory overhead.
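
To make the idea of a constant-length access summary concrete, the following is a minimal host-side C++ sketch. It is not HiRace's actual 25-state machine. The names `Level`, `Shadow`, `step`, `access`, and `block_barrier` are hypothetical, and the sketch ignores atomics, fences, and memory scopes. It keeps one small summary per hierarchy level (thread and block), so the shadow entry stays the same size no matter how many accesses are folded into it. It also shows how a block-level barrier can order accesses within a thread group without ordering them across groups.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative only: NOT HiRace's 25-state machine. Each hierarchy level
// summarizes all prior unordered accesses in one small state plus one id.
enum class Level : std::uint8_t { None, ReadOne, ReadMany, WriteOne };
enum class Op { Read, Write };

// Folds one access by `id` into a level summary; returns true on a conflict.
inline bool step(Level& lv, std::uint32_t& owner, std::uint32_t id, Op op) {
    switch (lv) {
    case Level::None:                      // first access: remember who made it
        lv = (op == Op::Read) ? Level::ReadOne : Level::WriteOne;
        owner = id;
        return false;
    case Level::ReadOne:
        if (id == owner) {                 // the sole reader may become the writer
            if (op == Op::Write) lv = Level::WriteOne;
            return false;
        }
        if (op == Op::Write) return true;  // write after someone else's read
        lv = Level::ReadMany;              // concurrent readers are fine
        return false;
    case Level::ReadMany:
        return op == Op::Write;            // any write conflicts with some reader
    case Level::WriteOne:
        return id != owner;                // someone else touches written data
    }
    return false;
}

// Fixed-size shadow entry for one address (layout is an assumption).
struct Shadow {
    std::uint32_t block = 0, thread = 0;
    Level block_lv = Level::None, thread_lv = Level::None;
    bool race = false;
};

// Records one access by (block, thread) and flags a race if it conflicts.
inline void access(Shadow& s, std::uint32_t block, std::uint32_t thread, Op op) {
    if (step(s.block_lv, s.block, block, op)) { s.race = true; return; }
    if (s.block_lv == Level::ReadMany) return;  // readers only, several blocks
    if (step(s.thread_lv, s.thread, thread, op)) s.race = true;
}

// Block-level barrier: orders the threads of `block` only, so just the
// thread-level summary is cleared; the block-level summary is kept.
inline void block_barrier(Shadow& s, std::uint32_t block) {
    if (s.block_lv != Level::None && s.block == block) s.thread_lv = Level::None;
}

inline void example() {
    Shadow s;
    access(s, 0, 0, Op::Write);  // block 0, thread 0 writes
    block_barrier(s, 0);         // synchronizes block 0 only
    access(s, 0, 1, Op::Read);   // same block, after the barrier: ordered
    assert(!s.race);
    access(s, 1, 0, Op::Read);   // other block: the barrier did not order this
    assert(s.race);
}
```

In this sketch, `example()` writes from one thread, synchronizes that thread's block, and then reads from a second thread of the same block without a report. A read from a different block is then flagged, because the block-level summary still records an unordered write. A real detector would also need to handle the cases this sketch omits, which is one reason a larger state machine is required.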
