Revet: A Language and Compiler for Dataflow Threads
Alexander Rucker, Shiv Sundram, Coleman Smith, Matthew Vilim, Raghu Prabhakar, Fredrik Kjolstad, Kunle Olukotun
TL;DR
Revet tackles the challenge of running threaded, control-flow-rich applications on vectorized reconfigurable dataflow accelerators (vRDAs) by introducing a full stack: a programming language with explicit threaded parallelism, an MLIR-based compiler, and a streaming dataflow execution model. Central to Revet is the Structured-Link Tensor Format (SLTF) and a set of streaming primitives that encode control decisions as data, enabling hierarchical barriers and correct composition of nested loops. The work provides a concrete vRDA abstract machine, a compiler pipeline with front-end lowering, memory- and control-flow optimizations, and a CFG-to-dataflow lowering stage, validated on a variety of data-analytic and traversal workloads. Results show Revet consistently outperforms a state-of-the-art GPU (V100) by a geomean of $3.8\times$ on a $4.3\times$ smaller vRDA, with area-adjusted speedups around $16\times$, demonstrating the practicality and potential of thread-based programming for dataflow accelerators.
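As a rough check on the area-adjusted figure, assuming it is simply the geomean speedup scaled by the area ratio (an assumption; the summary does not state how the adjustment is computed):

$$3.8 \times 4.3 = 16.34 \approx 16$$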
Abstract
Spatial dataflow architectures such as reconfigurable dataflow accelerators (RDAs) can provide much higher performance and efficiency than CPUs and GPUs. In particular, vectorized reconfigurable dataflow accelerators (vRDAs) in recent literature represent a design point that enhances the efficiency of dataflow architectures with vectorization. Today, vRDAs can be exploited using either hardcoded kernels or MapReduce languages like Spatial, which cannot vectorize data-dependent control flow. In contrast, CPUs and GPUs can be programmed using general-purpose threaded abstractions. The ideal combination would be the generality of a threaded programming model coupled with the efficient execution model of a vRDA. We introduce Revet: a programming model, compiler, and execution model that lets threaded applications run efficiently on vRDAs. The Revet programming language uses threads to support a broader range of applications than Spatial's parallel patterns, and our MLIR-based compiler lowers this language to a generic dataflow backend that operates on streaming tensors. Finally, we show that mapping threads to dataflow outperforms GPUs, the current state of the art for threaded accelerators, by $3.8\times$.
