SPADA: A Spatial Dataflow Architecture Programming Language
Lukas Gianinazzi, Tal Ben-Nun, Torsten Hoefler
TL;DR
SPADA is a programming language for spatial dataflow architectures that exposes explicit data placement, dataflow patterns, and asynchronous execution, backed by a formal dataflow semantics and an automatic routing assignment mechanism. It delivers an end-to-end GT4Py-to-Cerebras CSL compilation pipeline, including a Stencil IR and multi-stage lowering, and expresses parallel patterns in 6 to 8 times less code than CSL while achieving strong performance on WSE-2. The approach is validated with hand-written SPADA kernels and GT4Py stencil workloads, which reach over 150 TFlop/s and show near-ideal weak scaling across three orders of magnitude. By unifying programming under a single, principled model with automated routing, SPADA makes development for spatial architectures productive while keeping HPC performance competitive.
Abstract
Spatial dataflow architectures like the Cerebras Wafer-Scale Engine achieve exceptional performance in AI and scientific applications by leveraging distributed memory across processing elements (PEs) and localized computation. However, programming these architectures remains challenging due to the need for explicit orchestration of data movement through reconfigurable networks-on-chip and asynchronous computation triggered by data arrival. Existing FPGA and CGRA programming models emphasize loop scheduling but overlook the unique capabilities of spatial dataflow architectures, particularly efficient dataflow over regular grids and intricate routing management. We present SPADA, a programming language that provides precise control over data placement, dataflow patterns, and asynchronous operations while abstracting architecture-specific low-level details. We introduce a rigorous dataflow semantics framework for SPADA that defines routing correctness, data races, and deadlocks. Additionally, we design and implement a compiler targeting Cerebras CSL with multi-level lowering. SPADA serves as both a high-level programming interface and an intermediate representation for domain-specific languages (DSLs), which we demonstrate with the GT4Py stencil DSL. SPADA enables developers to express complex parallel patterns, including pipelined reductions and multi-dimensional stencils, in 6 to 8 times less code than CSL with near-ideal weak scaling across three orders of magnitude. By unifying programming for spatial dataflow architectures under a single model, SPADA advances both the theoretical foundations and practical usability of these emerging high-performance computing platforms.
