Table of Contents
Fetching ...

WaveCert: Translation Validation for Asynchronous Dataflow Programs via Dynamic Fractional Permissions

Zhengyao Lin, Joshua Gancher, Bryan Parno

TL;DR

This work tackles correctness of translating imperative programs to asynchronous dataflow graphs on CGRAs, where schedule nondeterminism can alter behavior. It introduces WaveCert, a two-phase translation validation framework that first proves equivalence on a canonical dataflow schedule via cut-simulation, then establishes confluence using dynamic linear permission tokens to ensure all schedules converge to the canonical one. The approach is implemented for the RipTide compiler, uncovering 8 compiler bugs (including a data-race issue) and verifying the RipTide benchmark suite. The method yields strong correctness and liveness guarantees, while remaining practical by relying on unverified toolchains and SMT-based reasoning, and it points to future enhancements for optimized dataflow graphs and hardware-specific verification.

Abstract

Coarse-grained reconfigurable arrays (CGRAs) have gained attention in recent years due to their promising power efficiency compared to traditional von Neumann architectures. To program these architectures using ordinary languages such as C, a dataflow compiler must transform the original sequential, imperative program into an equivalent dataflow graph, composed of dataflow operators running in parallel. This transformation is challenging since the asynchronous nature of dataflow graphs allows out-of-order execution of operators, leading to behaviors not present in the original imperative programs. We address this challenge by developing a translation validation technique for dataflow compilers to ensure that the dataflow program has the same behavior as the original imperative program on all possible inputs and schedules of execution. We apply this method to a state-of-the-art dataflow compiler targeting the RipTide CGRA architecture. Our tool uncovers 8 compiler bugs where the compiler outputs incorrect dataflow graphs, including a data race that is otherwise hard to discover via testing. After repairing these bugs, our tool verifies the correct compilation of all programs in the RipTide benchmark suite.

WaveCert: Translation Validation for Asynchronous Dataflow Programs via Dynamic Fractional Permissions

TL;DR

This work tackles correctness of translating imperative programs to asynchronous dataflow graphs on CGRAs, where schedule nondeterminism can alter behavior. It introduces WaveCert, a two-phase translation validation framework that first proves equivalence on a canonical dataflow schedule via cut-simulation, then establishes confluence using dynamic linear permission tokens to ensure all schedules converge to the canonical one. The approach is implemented for the RipTide compiler, uncovering 8 compiler bugs (including a data-race issue) and verifying the RipTide benchmark suite. The method yields strong correctness and liveness guarantees, while remaining practical by relying on unverified toolchains and SMT-based reasoning, and it points to future enhancements for optimized dataflow graphs and hardware-specific verification.

Abstract

Coarse-grained reconfigurable arrays (CGRAs) have gained attention in recent years due to their promising power efficiency compared to traditional von Neumann architectures. To program these architectures using ordinary languages such as C, a dataflow compiler must transform the original sequential, imperative program into an equivalent dataflow graph, composed of dataflow operators running in parallel. This transformation is challenging since the asynchronous nature of dataflow graphs allows out-of-order execution of operators, leading to behaviors not present in the original imperative programs. We address this challenge by developing a translation validation technique for dataflow compilers to ensure that the dataflow program has the same behavior as the original imperative program on all possible inputs and schedules of execution. We apply this method to a state-of-the-art dataflow compiler targeting the RipTide CGRA architecture. Our tool uncovers 8 compiler bugs where the compiler outputs incorrect dataflow graphs, including a data race that is otherwise hard to discover via testing. After repairing these bugs, our tool verifies the correct compilation of all programs in the RipTide benchmark suite.
Paper Structure (28 sections, 4 theorems, 8 equations, 7 figures, 1 algorithm)

This paper contains 28 sections, 4 theorems, 8 equations, 7 figures, 1 algorithm.

Key Result

Proposition 1

If $(\mathcal{I}_\mathrm{df}, \mathcal{C}_\mathrm{df}, \to_\mathrm{df})$ and $(\mathcal{I}_\mathrm{llvm}, \mathcal{C}_\mathrm{llvm}, \to_\mathrm{llvm})$ are equivalent on a canonical schedule (def:canonical-equivalence) and $(\mathcal{I}_\mathrm{df}, \mathcal{C}_\mathrm{df}, \to_\mathrm{df})$ is con

Figures (7)

  • Figure 1: Examples of operators. Carry and Steer manage control flow, Store and Load manipulate the main memory, and op$\in \{+, *, <, \dots\}$ executes pure arithmetic operations.
  • Figure 2: An example of input/output programs from the RipTide dataflow compiler. Most dataflow operators correspond to LLVM instructions: the comparison operator corresponds to line \ref{['line:example-icmp']}, the add operators correspond to lines \ref{['line:example-add-a-i']} and \ref{['line:example-add-i']}, while the load and store operators correspond to lines \ref{['line:example-load']} and \ref{['line:example-store']}, respectively.
  • Figure 3: This plot shows a trace of execution from dataflow cut point 2 (upper left corner). The operators fired in this trace are 1, 2, 4, 3, 5, 7, 8, 9, 6 (which are inferred from the LLVM trace from cut point 2 to cut point 2). At the end of the trace, the configuration matches cut point 2 again. In each configuration, we mark the fired operators with orange lightning symbols. The emptiness of each channel after firing the operator(s) is represented by dashed vs solid lines. Before the firing of an operator, the input channels of the operator should be non-empty and marked with a solid line. After firing, the input channels are emptied, and the modified output channels will become non-empty and marked with a solid, orange line. The symbolic values in each channel are omitted, but their attached permission variables (starting with p) are marked. At the top of each configuration, we also indicate the permission constraints added at each step.
  • Figure 4: Cut points for the example in \ref{['fig:example']}. For the second LLVM cut point, WaveCert places it at the program point at the back edge from %body to %header. The correspondence equality $\mathsf{Mem}_\mathrm{llvm} = \mathsf{Mem}_\mathrm{df}$ means that the memory state should be the same at each cut point. The equality $\texttt{\%i} = \texttt{i}$ means that the LLVM variable $\texttt{\%i}$ at cut point 2 should be equal to the dataflow variable i referring to the value in the channel from operator 6 to 1 in \ref{['fig:example-dataflow-cut-point-2-to-cut-point-2']}. We implicitly assume that all function parameters are equal (i.e. $\texttt{\%A} = \texttt{A} \wedge \texttt{\%B} = \texttt{B} \wedge \texttt{\%len} = \texttt{len}$).
  • Figure 5: Evaluation results on 21 test cases sorted by the number of dataflow operators. From left to right, the columns are the name of the test case (Name), lines of code in the original C program (LOC), number of dataflow operators (#OP), number of permission constraints (#PC), time spent for simulation check in seconds (Sim.), and time spent for confluence check in seconds (Conf.). All test cases are originally written in C and then compiled to LLVM via Clang clang. Compiler bugs cause the simulation checks to fail in SpSlice, sort, and sha256. In all other cases, the simulation and confluence checks succeed.
  • ...and 2 more figures

Theorems & Definitions (17)

  • Definition 1: Memory-synchronizing traces
  • Definition 2: Program equivalence
  • Definition 3: Canonical schedule
  • Definition 4: Program equivalence on a canonical schedule
  • Definition 5: Cut-simulation
  • Definition 6
  • Definition 7
  • Proposition 1
  • Definition 8: Permission Algebra
  • Definition 9
  • ...and 7 more