Weak Memory Demands Model-based Compiler Testing
Luke Geeson
TL;DR
The paper addresses the risk of compiler bugs that arise when the concurrent behaviour observed on relaxed hardware diverges from what the source language model permits. It advocates model-based testing with tooling parameterized over source and architecture models. This is exemplified by the Téléchat tool and its application to an LLVM bug, reported by the author, that was revealed through updates to the herd memory model. The case study traces the bug to LLVM's code generation combined with a dead-register optimization, which lets an Arm SWP-based code path exhibit outcomes the C11 model forbids. This highlights a new class of Heisenbugs. The work argues that testing practices and test generators must adapt to hardware relaxations, and that automated industry-scale testing can uncover such issues. Together these points motivate broader adoption of model-aware testing.
Abstract
A compiler bug arises if the behaviour of a compiled concurrent program, as allowed by its architecture memory model, is not a behaviour permitted by the source program under its source model. One might reasonably think that most compiler bugs have been found in the decade since the introduction of the C/C++ memory model. We observe that processor implementations are increasingly exploiting the behaviour of relaxed architecture models. As such, compiled programs may exhibit bugs not seen on older hardware. To account for this we require model-based compiler testing. While this observation is not surprising, its implications are broad. Compilers and their testing tools will need to be updated to follow hardware relaxations, concurrent test generators will need to be improved, and assumptions of prior work will need revisiting. We explore these ideas using a compiler toolchain bug we reported in LLVM.
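The definition of a compiler bug above is a set-inclusion check. Every outcome the compiled program may exhibit under the architecture model must also be an outcome the source program may exhibit under the source model. Any leftover outcome witnesses a bug.

The following is a minimal Python sketch of that check. It makes several assumptions:

- The outcome sets are already computed, for example by a simulator such as herd. This sketch does not call any tool.
- Compiled registers and locations have already been mapped back to source-level names.
- The names `Outcome`, `outcome`, and `compiler_bugs` are hypothetical.
- The message-passing outcomes are illustrative only and are not taken from the paper.

```python
# Hypothetical sketch of the model-based bug check; not Téléchat's implementation.
from typing import FrozenSet, Set, Tuple

# An outcome is a final assignment of values to observed names,
# e.g. frozenset({("r0", 1), ("r1", 0)}).
Outcome = FrozenSet[Tuple[str, int]]


def outcome(**values: int) -> Outcome:
    """Build an outcome from keyword arguments such as r0=1, r1=0."""
    return frozenset(values.items())


def compiler_bugs(source: Set[Outcome], compiled: Set[Outcome]) -> Set[Outcome]:
    """Return outcomes allowed for the compiled program under the architecture
    model but forbidden for the source program under the source model.
    A non-empty result indicates a compiler bug."""
    return compiled - source


# Illustrative message-passing test. Thread 0 stores data then a flag
# (release); thread 1 loads the flag (acquire) into r0, then data into r1.
# Under the source model, r0=1, r1=0 is forbidden.
source_outcomes = {outcome(r0=0, r1=0), outcome(r0=0, r1=1), outcome(r0=1, r1=1)}

# Assume, for illustration, that the compiled code additionally
# allows r0=1, r1=0 under the architecture model.
compiled_outcomes = source_outcomes | {outcome(r0=1, r1=0)}

for bad in compiler_bugs(source_outcomes, compiled_outcomes):
    print(sorted(bad))  # prints [('r0', 1), ('r1', 0)]
```

In this framing, the same compiled program can pass under an older architecture model and fail under an updated one. Only the compiled outcome set changes, so hardware relaxations can surface bugs that earlier testing did not see.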
