Capstone: Power-Capped Pipelining for Coarse-Grained Reconfigurable Array Compilers

Sabrina Yarzada, Christopher Torng

TL;DR

Capstone is a power-aware extension of the Cascade CGRA compiler that integrates a fast, compiler-resident power model with a user-tunable controller that guides bitstream selection toward optimization targets. The results indicate that cap-aware compilation is both necessary and practical.

Abstract

Coarse-grained reconfigurable arrays (CGRAs) have attracted growing interest because they exhibit performance and energy efficiency competitive with ASICs while maintaining flexibility similar to FPGAs. These properties make CGRAs attractive in accelerator and other power-constrained system contexts. However, modern CGRA compilers aggressively pipeline for frequency and performance improvements, often violating hard power budgets. We empirically show that, in state-of-the-art CGRA compilers such as Cascade, post-place-and-route (post-PnR) pipelining increases power monotonically and ultimately exceeds fixed power caps across diverse workloads. In response, we introduce Capstone, a power-aware extension of Cascade that integrates a fast, compiler-resident power model with a user-tunable controller that guides the bitstream selection process towards optimization targets. Capstone predicts per-iteration power directly inside the post-PnR compilation loop and selects one or a small set of PnR configurations such that at least one meets a user-specified power cap. Thus, we shift the objective from indiscriminately maximizing frequency to maximizing safe frequency under a discrete power cap. On a suite of kernels spanning fundamental dense and sparse applications, Capstone meets a power cap and minimizes remaining power headroom while preserving feasible performance. Our results indicate that cap-aware compilation is both necessary and practical, as the compiler can proactively land on cap-compliant points and expose predictable performance under power constraints.
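
The shift from "maximize frequency" to "maximize safe frequency under a cap" can be illustrated with a minimal sketch. This is not Capstone's controller: the `Candidate` record, its field names and units, and the tie-breaking rule are all assumptions made for illustration. The sketch assumes each post-PnR pipelining iteration yields one candidate with an achieved frequency and a model-predicted power, and it picks the fastest candidate whose predicted power does not exceed the cap.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """One post-PnR pipelining iteration (hypothetical record, not Capstone's type)."""
    iteration: int              # pipelining iteration index
    frequency_mhz: float        # achieved clock frequency (assumed unit)
    predicted_power_mw: float   # compile-time power estimate (assumed unit)


def select_under_cap(candidates: Sequence[Candidate],
                     cap_mw: float) -> Optional[Candidate]:
    """Return the fastest candidate whose predicted power is within the cap.

    Ties on frequency are broken in favor of the candidate that leaves the
    least power headroom. Returns None when no candidate meets the cap.
    """
    feasible = [c for c in candidates if c.predicted_power_mw <= cap_mw]
    if not feasible:
        return None
    return max(feasible, key=lambda c: (c.frequency_mhz, c.predicted_power_mw))
```

Because the abstract reports that power rises monotonically with pipelining iterations, a real controller could stop at the first iteration that exceeds the cap. The sketch scans all candidates instead, so it does not depend on that property.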

Paper Structure (10 sections, 12 equations, 17 figures, 3 tables, 2 algorithms)

Figures (17)

  • Figure 1: As state-of-the-art CGRA compilers pipeline for higher clock frequencies, they can unknowingly exceed a power cap (horizontal line). For the vector elementwise addition kernel, projections of future incremental performance optimizations (e.g., 10%) for the Cascade compiler all similarly exceed the power cap.
  • Figure 2: CGRA architecture including PE tiles and interconnect components (SBs, CBs). The pink route highlights a representative routed critical path that post-PnR pipelining breaks by inserting green registers along the interconnect. Input and output tracks along the north and west perimeters are connected to memory banks.
  • Figure 3: Normalized power vs. post-PnR pipelining iteration for the vector elementwise addition kernel used by Cascade (melchert-cascade-2024).
  • Figure 4: PTPX vs Capstone model power estimates comparison. Red calibrates on only vec_elemadd; blue calibrates on all kernels.
  • Figure 5: Hierarchical energy model learning. We learn a nonnegative mapping $W$ and per-event coefficients $\alpha$ that align compiler events to gate-level rows, discover energy events from per-row power, then perform compile-time prediction via a sum-of-products over compiler-visible features.
  • ...and 12 more figures
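
The Figure 5 caption describes compile-time prediction as a sum-of-products over compiler-visible features. The sketch below shows only that final step, under stated assumptions: the event names, the coefficient values a caller would pass in, and the function name `predict_power` are hypothetical, and the learned mapping $W$ from the caption is left out entirely.

```python
from typing import Mapping


def predict_power(event_counts: Mapping[str, float],
                  alpha: Mapping[str, float]) -> float:
    """Sum-of-products estimate: sum over events of alpha[event] * count[event].

    `event_counts` holds compiler-visible feature values for one configuration;
    `alpha` holds per-event coefficients, assumed here to be already learned.
    Both mappings and their keys are illustrative, not taken from the paper.
    """
    missing = set(event_counts) - set(alpha)
    if missing:
        raise KeyError(f"no coefficient for events: {sorted(missing)}")
    return sum(alpha[event] * count for event, count in event_counts.items())
```

A model of this shape is cheap to evaluate, which is consistent with the abstract's description of a fast, compiler-resident power model queried inside the post-PnR loop.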