SAT-based Exact Modulo Scheduling Mapping for Resource-Constrained CGRAs

Cristian Tirelli, Juan Sapriza, Rubén Rodríguez Álvarez, Lorenzo Ferretti, Benoît Denkinger, Giovanni Ansaloni, José Miranda Calero, David Atienza, Laura Pozzi

TL;DR

SAT-MapIt introduces a SAT-based exact modulo scheduling approach for mapping compute-intensive loops onto resource-constrained CGRAs, using a Kernel Mobility Schedule to encode data dependencies and architecture into a Boolean formula. The method demonstrates competitive or superior mapping quality versus state-of-the-art heuristics on a range of benchmarks and validates results with a cycle-accurate OpenEdgeCGRA hardware framework, linking compiler metrics to run-time energy and latency. The work also shows that compiler-level pruning can effectively narrow the hardware design space while retaining Pareto-optimal configurations, highlighting the value of holistic software-to-hardware flows. Limitations include the lack of routing in SAT-MapIt and potential scalability challenges for large CGRAs, pointing to routing integration and non-exact strategies as avenues for future research.

Abstract

Coarse-Grain Reconfigurable Arrays (CGRAs) represent emerging low-power architectures designed to accelerate Compute-Intensive Loops (CILs). The effectiveness of CGRAs in providing acceleration relies on the quality of mapping: how efficiently the CIL is compiled onto the platform. State of the Art (SoA) compilation techniques utilize modulo scheduling to minimize the Initiation Interval (II) and use graph algorithms like Max-Clique Enumeration to address mapping challenges. Our work approaches the mapping problem through a satisfiability (SAT) formulation. We introduce the Kernel Mobility Schedule (KMS), an ad-hoc schedule used with the Data Flow Graph and CGRA architectural information to generate Boolean statements that, when satisfied, yield a valid mapping. Experimental results demonstrate SAT-MapIt outperforming SoA alternatives in almost 50% of explored benchmarks. Additionally, we evaluated the mapping results in a synthesizable CGRA design and emphasized the trends in run-time metrics, i.e., energy efficiency and latency, across different CILs and CGRA sizes. We show that a hardware-agnostic analysis performed on compiler-level metrics can optimally prune the architectural design space, while still retaining Pareto-optimal configurations. Moreover, by exploring how implementation details impact cost and performance on real hardware, we highlight the importance of holistic software-to-hardware mapping flows, such as the one presented herein.

Paper Structure (27 sections, 22 equations, 11 figures, 8 tables)

Figures (11)

  • Figure 1: Overview of the CGRA scheduling and mapping workflow. a): A CIL is identified in the source C code. b): Its associated DFG is derived by compiler analysis. c): An abstract view of available hardware resources (i.e., number of processing elements and their connectivity) is constructed. d): The nodes of the DFG are mapped on the processing elements while complying with architectural constraints. e): Nodes are modulo scheduled across different CGRA instructions, partially overlapping the execution of multiple iterations. In the example, a new iteration is started at each CGRA instruction, hence the scheduling has an Initiation Interval (II) equal to 1.
  • Figure 2: a): C code of our running example. The identified CIL is highlighted. b): LLVM IR of the identified CIL. c): Associated DFG. Red edges are loop-carried dependencies, black edges are data dependencies.
  • Figure 3: a): Modulo scheduling of the DFG of the running example, highlighting the division between prologue, kernel, and epilogue. b): Mapped kernel of the DFG in the running example on a $2\times2$ CGRA.
  • Figure 4: SAT-MapIt searches for mappings for a given II, iteratively increasing II in case the solver returns UNSAT, or register allocation fails to color the model returned by the solver.
  • Figure 5: Top: two examples of dependencies -- $S_1$, $D_1$ and $S_2$, $D_2$ -- with a distance equal to 1. Bottom: an example of dependence with a distance greater than 1.
  • ...and 6 more figures
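
The iterative search described in the Figure 4 caption (try an II, move to a larger one if the solver reports UNSAT or register allocation fails) can be sketched roughly as below. This is only an illustration under strong assumptions, not the authors' implementation: `toy_solve` is a hypothetical brute-force stand-in for the SAT solver, it checks only two simplified constraints (no two nodes share a PE in the same kernel cycle, and each consumer sits on the producer's PE or a neighbouring one), and `allocate_registers` is a placeholder callback. The Kernel Mobility Schedule encoding itself is not reproduced here.

```python
from itertools import product

# Hypothetical 2x2 mesh: PE index -> set of neighbouring PE indices.
MESH_2X2 = {0: {1, 2}, 1: {0, 3}, 2: {0, 3}, 3: {1, 2}}


def toy_solve(nodes, edges, neighbors, ii):
    """Exhaustive stand-in for a SAT solver (assumption, not the paper's encoding).

    Returns {node: (pe, kernel_cycle)} or None, where None plays the role of UNSAT.
    """
    slots = list(product(neighbors, range(ii)))  # every (PE, kernel cycle) pair
    for choice in product(slots, repeat=len(nodes)):
        if len(set(choice)) != len(nodes):
            continue  # two nodes would occupy one PE in the same kernel cycle
        model = dict(zip(nodes, choice))
        # Simplified connectivity check: consumer on the same or an adjacent PE.
        if all(model[dst][0] == model[src][0]
               or model[dst][0] in neighbors[model[src][0]]
               for src, dst in edges):
            return model
    return None


def search_mapping(nodes, edges, neighbors, allocate_registers,
                   start_ii=1, max_ii=8):
    """Increase II until the solver finds a model that register allocation accepts."""
    for ii in range(start_ii, max_ii + 1):
        model = toy_solve(nodes, edges, neighbors, ii)
        if model is not None and allocate_registers(model):
            return ii, model
    return None  # no mapping found up to max_ii
```

For instance, a chain of five DFG nodes cannot fit on four PEs at II = 1, so calling `search_mapping(list(range(5)), [(i, i + 1) for i in range(4)], MESH_2X2, lambda model: True)` falls through to II = 2 before returning a model. A real flow would replace the exhaustive loop with a SAT solver call and the lambda with an actual register-coloring step.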