SparseMap: Loop Mapping for Sparse CNNs on Streaming Coarse-grained Reconfigurable Array
Xiaobing Ni, Mengke Ge, Jiaheng Ruan, Song Chen, Yi Kang
TL;DR
This work addresses the throughput degradation of streaming CGRAs when accelerating sparse CNNs, where irregular input data demands cause excessive caching operations (COPs) and multi-cycle internal dependencies (MCIDs). It introduces SparseMap, a mapping algorithm that combines efficient I/O data management with scheduling and binding, and employs three key techniques: association-oriented input bus allocation, crossbar-based multicasting, and reconstruction of internal adder dependencies. Through MIS-based binding on a conflict graph and pre-allocation of routing, SparseMap achieves substantial reductions in COPs and MCIDs while maintaining or improving the initiation interval $II$, with reported COP reductions of up to 92.5% and MCID reductions of 46%, and speedups of 1.5–2.67× over baselines. The approach enables higher-throughput sparse CNN acceleration on streaming CGRAs despite the irregular data patterns inherent to sparse networks.
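To make the idea of binding via an independent set on a conflict graph concrete, the following is a minimal sketch, not the SparseMap algorithm itself. It assumes each candidate binding is a hypothetical `(op, pe, slot)` tuple and that two candidates conflict when they place the same operation twice or occupy the same PE in the same slot; the real conflict rules (bus, crossbar, and routing constraints) are not specified in this summary. It also substitutes a min-degree greedy heuristic, which yields a maximal rather than a guaranteed maximum independent set.

```python
from itertools import combinations


def build_conflict_graph(candidates):
    """Build an adjacency map over candidate bindings.

    Each candidate is an (op, pe, slot) tuple (an assumed encoding).
    Two candidates conflict if they bind the same operation, or if
    they claim the same PE in the same time slot.
    """
    adj = {c: set() for c in candidates}
    for a, b in combinations(candidates, 2):
        same_op = a[0] == b[0]
        same_resource = a[1] == b[1] and a[2] == b[2]
        if same_op or same_resource:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def greedy_independent_set(adj):
    """Min-degree greedy heuristic for an independent set.

    Repeatedly picks the remaining vertex with the fewest remaining
    neighbours, then discards it and its neighbours. The result is
    maximal but not necessarily maximum.
    """
    remaining = {v: set(n) for v, n in adj.items()}
    chosen = []
    while remaining:
        v = min(remaining, key=lambda x: (len(remaining[x]), x))
        chosen.append(v)
        removed = remaining[v] | {v}
        for u in removed:
            remaining.pop(u, None)
        for u in remaining:
            remaining[u] -= removed
    return chosen


if __name__ == "__main__":
    # Two operations, two PEs, one time slot (II = 1).
    candidates = [(op, pe, 0) for op in ("a", "b") for pe in (0, 1)]
    binding = greedy_independent_set(build_conflict_graph(candidates))
    print(binding)
```

In this toy setting a binding is valid when the selected set contains exactly one candidate per operation; if it does not, a mapper would typically retry with a larger initiation interval, which adds more slots and therefore more candidates.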
Abstract
The streaming coarse-grained reconfigurable array (CGRA) is a promising architecture for data- and computing-intensive applications because of its flexibility, high throughput, and efficient memory system. However, when accelerating sparse CNNs, the irregular input data demands inside sparse CNNs cause excessive caching operations (COPs) and multi-cycle internal dependencies (MCIDs) between operations, which reduces the throughput of the streaming CGRA. We propose SparseMap, a mapping method for sparse CNNs onto streaming CGRAs that incorporates efficient I/O data management along with operation scheduling and binding to reduce COPs and MCIDs, thereby ensuring the optimal throughput of the streaming CGRA. The experimental results show that SparseMap reduces COPs by 92.5% and MCIDs by 46.0% while achieving the same or even smaller initiation interval (II) compared to previous works.
