Monomorphism-based CGRA Mapping via Space and Time Decoupling

Cristian Tirelli, Rodrigo Otoni, Laura Pozzi

TL;DR

The paper tackles the scalability challenge of CGRA mapping by decoupling the time and space dimensions: an SMT-based scheduling formulation first produces a feasible time solution, and a monomorphism-based search then derives a valid spatial mapping. It proves that a time solution satisfying the paper's constraints guarantees a corresponding space solution, which allows the two phases to be explored separately and dramatically reduces compilation time. Empirical results show substantial speedups, especially for large CGRAs (approximately $10^5\times$ on average for a $20\times 20$ CGRA), with mapping quality matching that of state-of-the-art techniques. This decoupled, provably sound methodology offers a scalable path for compiling complex loops onto large CGRAs in practice.

Abstract

Coarse-Grain Reconfigurable Arrays (CGRAs) provide flexibility and energy efficiency in accelerating compute-intensive loops. Existing compilation techniques often struggle with scalability and are unable to map code onto large CGRAs. To address this, we propose a novel approach to the mapping problem where the time and space dimensions are decoupled and explored separately. We leverage an SMT formulation to traverse the time dimension first, and then perform a monomorphism-based search to find a valid spatial solution. Experimental results show that our approach achieves the same mapping quality as state-of-the-art techniques while significantly reducing compilation time, with this reduction being particularly tangible when compiling for large CGRAs. We achieve approximately $10^5\times$ average compilation speedup for the benchmarks evaluated on a $20\times 20$ CGRA.

Paper Structure

This paper contains 22 sections, 5 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: $3\times 3$ CGRA with internal view of a PE.
  • Figure 2: Running example. a) A DFG, where black edges are data dependencies and red edges are loop-carried dependencies. b) Mapping of the DFG onto a $2\times 2$ CGRA, on the bottom, with the division between prologue, kernel, and epilogue highlighted. c) Valid time and space solutions on the left, invalid solutions on the right; the erroneous allocations are shown in red.
  • Figure 3: MRRG for a $2 \times 2$ CGRA and $II = 4$. Black edges represent CGRA adjacencies, while green, red, and yellow edges represent time adjacencies from PE0 at $T = 0$. Time adjacencies of the other PEs, as well as the self-loops inherent to every PE, are omitted for clarity.
  • Figure 4: Monomorphism between the DFG shown in Fig. 2 and the MRRG shown in Fig. 3; data dependencies are in blue and loop-carried dependencies routed in the MRRG are in red.
  • Figure 5: Compilation time (y-axis), in seconds, in relation to CGRA sizes (x-axis) for the aes benchmark.
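
The two-phase idea summarized above (fix a time schedule first, then search for a monomorphism into the MRRG, as in Figures 3 and 4) can be illustrated with a toy sketch. This is not the authors' implementation: the function names `mesh_neighbours` and `find_monomorphism`, the example DFG, and the example schedule are all hypothetical. The sketch also assumes a simplified connectivity rule, namely that a value can travel from a PE to itself or to a mesh neighbour, and it takes the time slots as given rather than computing them with an SMT solver.

```python
from itertools import product


def mesh_neighbours(rows, cols):
    """Adjacency of a rows x cols mesh CGRA; every PE also reaches itself."""
    adj = {}
    for r, c in product(range(rows), range(cols)):
        pe = r * cols + c
        adj[pe] = {pe}
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                adj[pe].add(nr * cols + nc)
    return adj


def find_monomorphism(dfg_edges, slot, adj):
    """Place each DFG node on a PE, given its time slot (modulo II).

    The placement is injective on (PE, slot) pairs, and every DFG edge must
    connect the same PE or two neighbouring PEs (simplifying assumption).
    Returns a dict node -> PE, or None if no placement exists.
    """
    nodes = sorted(slot)
    placement = {}  # node -> PE
    used = set()    # occupied (PE, slot) pairs

    def consistent(node, pe):
        # Check only edges whose other endpoint is already placed.
        for src, dst in dfg_edges:
            if src == node and dst in placement and placement[dst] not in adj[pe]:
                return False
            if dst == node and src in placement and pe not in adj[placement[src]]:
                return False
        return True

    def backtrack(i):
        if i == len(nodes):
            return True
        node = nodes[i]
        for pe in adj:
            key = (pe, slot[node])
            if key in used or not consistent(node, pe):
                continue
            placement[node] = pe
            used.add(key)
            if backtrack(i + 1):
                return True
            del placement[node]
            used.discard(key)
        return False

    return dict(placement) if backtrack(0) else None


# Hypothetical 4-node DFG; the edge (3, 0) plays the role of a
# loop-carried dependency. The slots stand in for a time solution.
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 0)]
slots = {0: 0, 1: 1, 2: 1, 3: 2}
mapping = find_monomorphism(edges, slots, mesh_neighbours(2, 2))
print(mapping)
```

Because the time slots are fixed before the spatial search starts, the backtracking only explores PE choices. Under the assumptions stated here, that reduction of the search space is the intuition behind decoupling the two dimensions; the sketch does not check that consumers are scheduled after producers, since that would belong to the time phase.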