Mapping and Execution of Nested Loops on Processor Arrays: CGRAs vs. TCPAs
Dominik Walter, Marita Halm, Daniel Seidel, Indrayudh Ghosh, Christian Heidorn, Frank Hannig, Jürgen Teich
TL;DR
The paper compares two processor-array paradigms for accelerating multidimensional nested loops: operation-centric CGRAs and iteration-centric TCPAs. It analyzes their architectures, mapping approaches, and toolchains (CGRA-Flow, Morpher, Pillars, and CGRA-ME for CGRAs; TURTLE for TCPAs), and evaluates them qualitatively and quantitatively across PPA (power, performance, area) metrics. The findings show that TCPAs generally deliver substantial latency reductions and higher data locality through tile-based iteration mapping, albeit at the cost of higher hardware complexity and specific data-massage requirements. CGRAs offer simpler, more intuitive programming models but face limits in scalability and mapping complexity. The work highlights the trade-offs between programming ease, hardware cost, and performance, and suggests that future designs may blend iteration- and operation-centric ideas to capitalize on both data locality and flexible mapping.
Abstract
Increasing demands for computing power also propel the need for energy-efficient SoC accelerator architectures. One class of such accelerators are so-called processor arrays, which typically integrate a two-dimensional mesh of interconnected processing elements (PEs). Such arrays are specifically designed to accelerate the execution of multidimensional nested loops by exploiting the intrinsic parallelism of loops. For mapping a given loop nest application, two opposing methods have emerged: operation-centric and iteration-centric. They differ in the granularity of the mapping. The operation-centric approach maps individual operations to the PEs of the array, while the iteration-centric approach maps entire tiles of iterations to each PE. The operation-centric approach is applied predominantly to processor arrays often referred to as Coarse-Grained Reconfigurable Arrays (CGRAs), while processor arrays supporting an iteration-centric approach are referred to as Tightly-Coupled Processor Arrays (TCPAs) in the following. This work provides a comprehensive comparison of both approaches and related architectures by evaluating their respective benefits and trade-offs. ...
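To make the difference in mapping granularity concrete, the following is a minimal, hypothetical Python sketch; it is not the method of the paper or of any of the named toolchains. It assumes a toy 4x4 loop nest whose body is `y[i][j] = a[i][j] * b[i][j] + c[i][j]`, a 2x2 processor array, and illustrative function names (`iteration_centric`, `operation_centric`). The iteration-centric variant partitions the iteration space into rectangular tiles, so each PE runs the whole loop body for every iteration in its tile. The operation-centric variant naively places each operation of the loop body on its own PE, ignoring interconnect routing and scheduling, so that all iterations stream through the same placement.

```python
from itertools import product

# Hypothetical toy setup (assumed, not from the paper):
#   for i in range(N):
#       for j in range(M):
#           y[i][j] = a[i][j] * b[i][j] + c[i][j]
N, M = 4, 4        # iteration-space bounds
ROWS, COLS = 2, 2  # hypothetical 2x2 processor array


def iteration_centric(n, m, rows, cols):
    """Partition the n x m iteration space into rows*cols rectangular tiles.
    Each PE executes the full loop body for all iterations of its tile."""
    tile_i = -(-n // rows)  # ceiling division: tile height
    tile_j = -(-m // cols)  # ceiling division: tile width
    mapping = {}
    for i, j in product(range(n), range(m)):
        pe = (i // tile_i, j // tile_j)
        mapping.setdefault(pe, []).append((i, j))
    return mapping


def operation_centric(ops, rows, cols):
    """Naively place each loop-body operation on its own PE (row-major).
    Routing and scheduling are ignored; every iteration then flows
    through the same placement in a pipelined fashion."""
    pes = list(product(range(rows), range(cols)))
    if len(ops) > len(pes):
        raise ValueError("more operations than PEs; time-multiplexing needed")
    return dict(zip(ops, pes))


if __name__ == "__main__":
    tiles = iteration_centric(N, M, ROWS, COLS)
    print(tiles[(0, 0)])  # [(0, 0), (0, 1), (1, 0), (1, 1)]
    placement = operation_centric(["load", "mul", "add", "store"], ROWS, COLS)
    print(placement)  # {'load': (0, 0), 'mul': (0, 1), 'add': (1, 0), 'store': (1, 1)}
```

In this sketch, the iteration-centric mapping keeps each PE's inputs confined to its own tile, which hints at the data-locality argument, whereas the operation-centric mapping keeps per-PE work trivial but needs every intermediate value to travel between PEs.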
