CIMinus: Empowering Sparse DNN Workloads Modeling and Exploration on SRAM-based CIM Architectures

Yingjie Qi; Jianlei Yang; Rubing Yang; Cenlin Duan; Xiaolin He; Ziyan He; Weitao Pan; Weisheng Zhao

TL;DR

CIMinus addresses the lack of a unified modeling approach for sparse DNN workloads on SRAM-based CIM architectures. It introduces FlexBlock sparsity and a pruning workflow that generates hardware-friendly sparse weights, together with a cost-modeling framework that estimates latency and energy for a modular CIM design. The framework is validated against recent sparse CIM designs, agreeing with them to within $5.27\%$, and it enables rapid exploration of sparsity patterns and mapping strategies on multi-macro CIMs. By bringing model pruning, hardware description, and mapping together in a single interface, CIMinus supports practical co-design decisions and speeds up the design of efficient CIM systems for sparse DNN workloads.

Abstract

Compute-in-memory (CIM) has emerged as a pivotal direction for accelerating machine learning workloads such as Deep Neural Networks (DNNs). However, effectively exploiting sparsity in CIM systems is challenging because of the inherent limitations of their rigid array structures. Designing sparse DNN dataflows and developing efficient mapping strategies also become more complex when accounting for diverse sparsity patterns and the flexibility of a multi-macro CIM structure. Despite these complexities, there is still no unified, systematic view or modeling approach for diverse sparse DNN workloads in CIM systems. In this paper, we propose CIMinus, a framework dedicated to cost modeling for sparse DNN workloads on CIM architectures. It provides an in-depth energy consumption analysis at the level of individual components and an assessment of the overall workload latency. We validate CIMinus against contemporary CIM architectures and demonstrate its applicability in two use cases. These cases provide valuable insights into both the impact of sparsity patterns and the effectiveness of mapping strategies, bridging the gap between theoretical design and practical implementation.
