SAT-based Exact Modulo Scheduling Mapping for Resource-Constrained CGRAs
Cristian Tirelli, Juan Sapriza, Rubén Rodríguez Álvarez, Lorenzo Ferretti, Benoît Denkinger, Giovanni Ansaloni, José Miranda Calero, David Atienza, Laura Pozzi
TL;DR
SAT-MapIt introduces a SAT-based exact modulo scheduling approach for mapping compute-intensive loops onto resource-constrained CGRAs, using a Kernel Mobility Schedule to encode data dependencies and architecture into a Boolean formula. The method demonstrates competitive or superior mapping quality versus state-of-the-art heuristics on a range of benchmarks and validates results with a cycle-accurate OpenEdgeCGRA hardware framework, linking compiler metrics to run-time energy and latency. The work also shows that compiler-level pruning can effectively narrow the hardware design space while retaining Pareto-optimal configurations, highlighting the value of holistic software-to-hardware flows. Limitations include the lack of routing in SAT-MapIt and potential scalability challenges for large CGRAs, pointing to routing integration and non-exact strategies as avenues for future research.
Abstract
Coarse-Grain Reconfigurable Arrays (CGRAs) are emerging low-power architectures designed to accelerate Compute-Intensive Loops (CILs). The acceleration a CGRA can deliver depends on the quality of the mapping, i.e., how efficiently the CIL is compiled onto the platform. State of the Art (SoA) compilation techniques use modulo scheduling to minimize the Iteration Interval (II), and graph algorithms such as Max-Clique Enumeration to address the mapping problem. Our work instead approaches mapping through a satisfiability (SAT) formulation. We introduce the Kernel Mobility Schedule (KMS), an ad-hoc schedule that is combined with the Data Flow Graph and CGRA architectural information to generate Boolean statements that, when satisfied, yield a valid mapping. Experimental results show our tool, SAT-MapIt, outperforming SoA alternatives in almost 50% of the explored benchmarks. Additionally, we evaluate the mapping results on a synthesizable CGRA design and highlight the trends of run-time metrics, i.e., energy efficiency and latency, across different CILs and CGRA sizes. We show that a hardware-agnostic analysis performed on compiler-level metrics can optimally prune the architectural design space while still retaining Pareto-optimal configurations. Moreover, by exploring how implementation details impact cost and performance on real hardware, we highlight the importance of holistic software-to-hardware mapping flows such as the one presented herein.
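To make the idea of "a schedule turned into Boolean statements" more concrete, here is a minimal, illustrative Python sketch. It is not SAT-MapIt's actual encoding. It assumes unit-latency operations, a toy Data Flow Graph, and a simplified reading of the KMS in which each node's [ASAP, ALAP] mobility window is folded modulo II into (kernel cycle, iteration offset) slots. It encodes only "exactly one slot per node", data dependencies, and a per-cycle PE-count limit. It does not place nodes on individual PEs or check interconnect adjacency, both of which a real CGRA mapping requires. All names (`DFG`, `kms`, `encode`, `brute_force_sat`, `map_min_ii`, `num_pes`) are hypothetical, and the brute-force search stands in for a real SAT solver.

```python
from graphlib import TopologicalSorter
from itertools import combinations, product

# Hypothetical toy DFG (node -> successors); edges are intra-iteration
# dependencies and every operation is assumed to take one cycle.
DFG = {"a": ["c"], "b": ["c"], "c": ["d"], "d": [], "e": []}


def asap_alap(dfg):
    """Unit-latency ASAP/ALAP times; schedule length = critical path."""
    preds = {n: [p for p in dfg if n in dfg[p]] for n in dfg}
    order = list(TopologicalSorter(preds).static_order())
    asap = {}
    for n in order:
        asap[n] = max((asap[p] + 1 for p in preds[n]), default=0)
    last = max(asap.values())
    alap = {}
    for n in reversed(order):
        alap[n] = min((alap[s] - 1 for s in dfg[n]), default=last)
    return asap, alap


def kms(dfg, ii):
    """Fold each node's mobility window onto II kernel cycles.

    A slot (cycle, offset) corresponds to flat time t = offset * II + cycle.
    """
    asap, alap = asap_alap(dfg)
    return {n: [(t % ii, t // ii) for t in range(asap[n], alap[n] + 1)]
            for n in dfg}


def encode(dfg, ii, num_pes):
    """Build CNF clauses (DIMACS-style ints) whose models are valid schedules."""
    slots = kms(dfg, ii)
    var = {}  # (node, cycle, offset) -> positive variable id
    for n, ss in slots.items():
        for s in ss:
            var[(n, *s)] = len(var) + 1
    clauses = []
    for n, ss in slots.items():
        ids = [var[(n, *s)] for s in ss]
        clauses.append(ids)                                     # at least one slot
        clauses += [[-x, -y] for x, y in combinations(ids, 2)]  # at most one slot
    for n, succs in dfg.items():  # a consumer must start after its producer
        for s in succs:
            for c1, o1 in slots[n]:
                for c2, o2 in slots[s]:
                    if o2 * ii + c2 <= o1 * ii + c1:
                        clauses.append([-var[(n, c1, o1)], -var[(s, c2, o2)]])
    for cycle in range(ii):  # at most num_pes operations per kernel cycle
        ids = [v for (n, c, o), v in var.items() if c == cycle]
        clauses += [[-x for x in group]
                    for group in combinations(ids, num_pes + 1)]
    return var, clauses


def brute_force_sat(num_vars, clauses):
    """Tiny stand-in for a real SAT solver: try every assignment."""
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[lit - 1] if lit > 0 else not bits[-lit - 1]
                   for lit in cl) for cl in clauses):
            return bits
    return None


def map_min_ii(dfg, num_pes, max_ii=8):
    """Increase II until the formula becomes satisfiable."""
    for ii in range(1, max_ii + 1):
        var, clauses = encode(dfg, ii, num_pes)
        model = brute_force_sat(len(var), clauses)
        if model is not None:
            return ii, {n: (c, o) for (n, c, o), v in var.items() if model[v - 1]}
    return None, {}


ii, schedule = map_min_ii(DFG, num_pes=2)
print(ii, schedule)
```

With five single-cycle operations and two PEs, II = 1 and II = 2 yield unsatisfiable formulas, because too many fixed-mobility nodes fold onto the same kernel cycle. The search therefore stops at II = 3, which matches the resource bound of ceil(5 / 2). Unlike this sketch, a real flow would call an off-the-shelf SAT solver and would also need to handle the case where no mapping exists within the critical-path schedule length.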
