Compiler Testing With Relaxed Memory Models

Luke Geeson; Lee Smith

Compiler Testing With Relaxed Memory Models

Luke Geeson, Lee Smith

TL;DR

This work targets compiler correctness under relaxed memory models for concurrent C/C++ programs by introducing Téléchat, an automated, model-based testing tool. Téléchat uses the herd simulator to compare outcomes of source programs under source memory models with compiled programs under architecture models, detecting bugs when $outcomes(exec(comp(S),M_{C})) \not\subseteq outcomes(exec(S),M_{S})$. The approach is parameterised over multiple memory models, scalable through compilation-time optimisations, and designed for automated regression testing in industry—demonstrated by deployment within Arm and large-scale differential testing across LLVM and GCC. The evaluation shows Téléchat finds behaviours missed by prior work, reports new concurrency bugs and model issues, and highlights practical industry impact while also outlining limitations and future directions such as model completeness and state-explosion challenges.

Abstract

Finding bugs is key to the correctness of compilers in wide use today. If the behaviour of a compiled program, as allowed by its architecture memory model, is not a behaviour of the source program under its source model, then there is a bug. This holds for all programs, but we focus on concurrency bugs that occur only with two or more threads of execution. We focus on testing techniques that detect such bugs in C/C++ compilers. We seek a testing technique that automatically covers concurrency bugs up to fixed bounds on program sizes and that scales to find bugs in compiled programs with many lines of code. Otherwise, a testing technique can miss bugs. Unfortunately, the state-of-the-art techniques are yet to satisfy all of these properties. We present the Téléchat compiler testing tool for concurrent programs. Téléchat compiles a concurrent C/C++ program and compares source and compiled program behaviours using source and architecture memory models. We make three claims: Téléchat improves the state-of-the-art at finding bugs in code generation for multi-threaded execution, it is the first public description of a compiler testing tool for concurrency that is deployed in industry, and it is the first tool that takes a significant step towards the desired properties. We provide experimental evidence suggesting Téléchat finds bugs missed by other state-of-the-art techniques, case studies indicating that Téléchat satisfies the properties, and reports of our experience deploying Téléchat in industry regression testing.

Compiler Testing With Relaxed Memory Models

TL;DR

. The approach is parameterised over multiple memory models, scalable through compilation-time optimisations, and designed for automated regression testing in industry—demonstrated by deployment within Arm and large-scale differential testing across LLVM and GCC. The evaluation shows Téléchat finds behaviours missed by prior work, reports new concurrency bugs and model issues, and highlights practical industry impact while also outlining limitations and future directions such as model completeness and state-explosion challenges.

Abstract

Paper Structure (36 sections, 3 equations, 11 figures, 4 tables)

This paper contains 36 sections, 3 equations, 11 figures, 4 tables.

Introduction
Our Contributions
Téléchat Benefits
Preliminaries
Litmus Tests and Memory Models
Compiler Testing
State-of-the-art Techniques
Prose and Expertise
Semi-automatic Tools
Hardware-based Tools
Our Solution - Testing with Models
Design and Implementation of Téléchat
Technique Design
Tool Implementation
Adoption, Usage, and Documentation
...and 21 more sections

Figures (11)

Figure 1: Litmus tests have a fixed initial state, a concurrent program and a predicate over the final state. Outcomes that satisfy the exists clause are forbidden by the C/C++ model C11. When compiled the outcome is allowed by the Armv8 AArch64 aarch64 model. We found this bug swp using Téléchat.
Figure 2: Executions of Fig. \ref{['lb_rc11']}, a:R(Rlx)[x]=0 means event $a$ reads the value 0 from location $x$ with relaxed memory ordering. Top left is acbd or cabd, right abcd, bottom left cdab, right dabc (dabc is forbidden by RC11 simonrc11).
Figure 3: The outcomes of executions in Fig. \ref{['lb_exe']}. The acyclic constraint of the RC11 model simonrc11 forbids dabc and its outcome {P1:r0=0; y=2}.
Figure 4: Differential and Metamorphic Testing - §\ref{['testing']}.
Figure 5: Test environment $exec_{tv}$ including the Téléchat tool and tools we improved marked *. \ref{['eq:tv']} checks the outcomes of simulating $comp(S)$ under its architecture model $M_C$ against the source $S$ under source model $M_S$.
...and 6 more figures

Theorems & Definitions (3)

Definition 2.1
Definition 2.2
Definition 2.3

Compiler Testing With Relaxed Memory Models

TL;DR

Abstract

Compiler Testing With Relaxed Memory Models

Authors

TL;DR

Abstract

Table of Contents

Figures (11)

Theorems & Definitions (3)