SPADA: A Spatial Dataflow Architecture Programming Language

Lukas Gianinazzi, Tal Ben-Nun, Torsten Hoefler

TL;DR

SPADA tackles the challenge of programming spatial dataflow architectures with a language that offers explicit data placement, dataflow patterns, and asynchronous execution, underpinned by a formal dataflow semantics and an automatic routing assignment mechanism. It delivers an end-to-end GT4Py-to-Cerebras CSL compilation pipeline, including a Stencil IR and multi-stage lowering, and lets developers express parallel patterns in 6--8x less code than CSL. The approach is validated on WSE-2 with hand-written SPADA kernels and GT4Py stencil workloads, showing over 150 TFlop/s and near-ideal weak scaling across three orders of magnitude. By unifying programming under a single, principled model and automating routing, SPADA enables productive development for spatial architectures with competitive HPC performance.

Abstract

Spatial dataflow architectures like the Cerebras Wafer-Scale Engine achieve exceptional performance in AI and scientific applications by leveraging distributed memory across processing elements (PEs) and localized computation. However, programming these architectures remains challenging due to the need for explicit orchestration of data movement through reconfigurable networks-on-chip and asynchronous computation triggered by data arrival. Existing FPGA and CGRA programming models emphasize loop scheduling but overlook the unique capabilities of spatial dataflow architectures, particularly efficient dataflow over regular grids and intricate routing management. We present SPADA, a programming language that provides precise control over data placement, dataflow patterns, and asynchronous operations while abstracting architecture-specific low-level details. We introduce a rigorous dataflow semantics framework for SPADA that defines routing correctness, data races, and deadlocks. Additionally, we design and implement a compiler targeting Cerebras CSL with multi-level lowering. SPADA serves as both a high-level programming interface and an intermediate representation for domain-specific languages (DSLs), which we demonstrate with the GT4Py stencil DSL. SPADA enables developers to express complex parallel patterns -- including pipelined reductions and multi-dimensional stencils -- in 6--8x less code than CSL with near-ideal weak scaling across three orders of magnitude. By unifying programming for spatial dataflow architectures under a single model, SPADA advances both the theoretical foundations and practical usability of these emerging high-performance computing platforms.

Paper Structure

This paper contains 44 sections, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Checkerboard Decomposition Pass (One Active Dimension)
  • Figure 2: SPADA-to-CSL Task Assignment Pipeline
  • Figure 3: Speedup of Pipelined over Vectorized 1D Reduction
  • Figure 4: Single-Precision Copy Throughput per PE (in GiB/s)
  • Figure 5: Stencil Total Flop/s Performance Scaling (K=80)
  • ...and 2 more figures