HeLEx: A Heterogeneous Layout Explorer for Spatial Elastic Coarse-Grained Reconfigurable Arrays
Alan Jia Bao Du, Tarek S. Abdelrahman
TL;DR
HeLEx addresses the challenge of efficiently mapping diverse DFGs onto heterogeneous, elastic CGRAs. It starts from a full, homogeneous layout and uses a branch-and-bound (BB) search to prune the compute resources of each PE. It introduces operation grouping, a cost-based BB framework, and a heatmap-derived initial layout to produce heterogeneous CGRAs that substantially reduce area and power while keeping every input DFG mappable. On average, the number of operations falls by about 69%, area by almost 70%, and power by over 51%, and the resulting CGRAs come within 6.2% of the theoretical minimum. HeLEx removes more excess compute resources than two state-of-the-art frameworks and has a low impact on latency, which suggests practical potential for domain-specific CGRA design. The work also validates its cost model against DC synthesis and examines the effects of memory and interconnect resources, positioning HeLEx as a precursor to broader CGRA design-space exploration and optimization tools.
Abstract
We present HeLEx, a framework for determining the functional layout of heterogeneous spatially-configured elastic Coarse-Grained Reconfigurable Arrays (CGRAs). Given a collection of input data flow graphs (DFGs) and a target CGRA, the framework starts with a full layout in which every processing element (PE) supports every operation in the DFGs. It then employs a branch-and-bound (BB) search to remove operations from PEs while ensuring that the input DFGs still map successfully onto the resulting CGRA, and eventually returns an optimized heterogeneous CGRA. Experimental evaluation with 12 DFGs and 9 target CGRA sizes shows that the framework reduces the number of operations by 68.7% on average, which reduces CGRA area by almost 70% and power by over 51%, all compared to the initial full layout. On average, the CGRAs that HeLEx generates are within 6.2% of the theoretically minimum CGRAs, which support exactly the number of operations needed by the input DFGs. A comparison with functional layouts produced by two state-of-the-art frameworks indicates that HeLEx achieves a better reduction in the number of operations, by up to 2.6×.
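
To make the search described above concrete, the following is a minimal sketch of pruning a full layout with branch-and-bound. It is not the HeLEx implementation. It rests on strong simplifying assumptions: a DFG is reduced to a multiset of operations, a mapping is feasible whenever each DFG node can be placed on a distinct PE that supports its operation (interconnect and routing are ignored), and the cost is simply the number of (PE, operation) pairs. The names `can_map`, `min_ops`, and `prune_layout` are hypothetical, and the sketch omits operation grouping, the heatmap-derived initial layout, and the area and power cost model.

```python
from collections import Counter

def can_map(dfg, layout):
    """True if every DFG node fits on a distinct PE supporting its op.

    Assumption: routing is ignored, so this is plain bipartite matching.
    """
    nodes = list(dfg.elements())
    owner = {}  # PE index -> index of the node currently placed on it

    def place(n, seen):
        for pe, ops in enumerate(layout):
            if nodes[n] in ops and pe not in seen:
                seen.add(pe)
                # Take a free PE, or try to relocate the node occupying it.
                if pe not in owner or place(owner[pe], seen):
                    owner[pe] = n
                    return True
        return False

    return all(place(n, set()) for n in range(len(nodes)))

def min_ops(dfgs):
    """Fewest op instances any layout needs: per op, the largest demand of any DFG."""
    all_ops = set().union(*dfgs)
    return sum(max(d[op] for d in dfgs) for op in all_ops)

def prune_layout(dfgs, num_pes):
    """Branch-and-bound over (PE, op) slots, starting from the full layout."""
    all_ops = sorted(set().union(*dfgs))
    full = [set(all_ops) for _ in range(num_pes)]
    if not all(can_map(d, full) for d in dfgs):
        raise ValueError("DFGs do not fit on the full layout")
    slots = [(pe, op) for pe in range(num_pes) for op in all_ops]
    best = {"cost": len(slots), "layout": [set(s) for s in full]}

    def search(i, layout, cost):
        # Bound: removing every undecided slot still cannot beat the best so far.
        if cost - (len(slots) - i) >= best["cost"]:
            return
        if i == len(slots):
            best["cost"], best["layout"] = cost, [set(s) for s in layout]
            return
        pe, op = slots[i]
        layout[pe].remove(op)  # branch 1: drop this op from this PE
        # Removing ops never helps feasibility, so an infeasible branch is cut.
        if all(can_map(d, layout) for d in dfgs):
            search(i + 1, layout, cost - 1)
        layout[pe].add(op)     # branch 2: keep it
        search(i + 1, layout, cost)

    search(0, full, len(slots))
    return best["layout"], best["cost"]

# Two toy DFGs on a 4-PE array (hypothetical data, not from the paper).
dfgs = [Counter(add=2, mul=1), Counter(add=1, mul=1, sub=1)]
layout, cost = prune_layout(dfgs, num_pes=4)
full_cost = 4 * 3
print(layout, cost, f"reduction = {1 - cost / full_cost:.1%}")
```

In this toy model the search reaches the lower bound computed by `min_ops`, because nothing but operation support constrains placement. Once routing constraints are included in the feasibility check, the result would not generally reach that bound, which is consistent with the gap to the theoretical minimum reported above.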
