CIMNAS: A Joint Framework for Compute-In-Memory-Aware Neural Architecture Search

Olga Krestinskaya; Mohammed E. Fouda; Ahmed Eltawil; Khaled N. Salama

TL;DR

CIMNAS co-designs neural networks and Compute-In-Memory (CIM) hardware by jointly optimizing the model architecture, the quantization policy, and hardware parameters at the device, circuit, and architecture levels. It uses an evolutionary algorithm guided by a fast accuracy predictor and a CiMLoop-based hardware estimator to search a space of $9.9\times 10^{85}$ configurations, producing designs with a low energy-delay-area product (EDAP) without sacrificing accuracy. On MobileNetV2/ImageNet with RRAM-based CIM, the framework reduces EDAP by $90.1\times$ to $104.5\times$ and improves energy efficiency (TOPS/W) by $4.68\times$ to $4.82\times$ and area efficiency (TOPS/mm$^2$) by $11.3\times$ to $12.78\times$ relative to various baselines. Extended to SRAM-based ResNet50 on a 7 nm node, it reduces EDAP by up to $819.5\times$. CIMNAS thus provides a scalable tool for automated CIM software-hardware co-design, with the potential to generalize to broader workloads and emerging memory technologies.
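To make the search loop described above concrete, here is a minimal sketch of an evolutionary search that minimizes EDAP under an accuracy constraint. It is an illustration only, not the CIMNAS implementation: the four-knob `SEARCH_SPACE`, the toy formulas in `predict_accuracy` and `estimate_edap`, and all function names are hypothetical stand-ins for the framework's accuracy predictor and CiMLoop-based hardware estimator.

```python
import random

# Hypothetical toy search space (the real one has ~9.9e85 configurations).
SEARCH_SPACE = {
    "kernel_size": [3, 5, 7],      # model-level knob
    "weight_bits": [4, 6, 8],      # quantization-level knob
    "adc_bits": [4, 6, 8],         # circuit-level knob
    "array_size": [64, 128, 256],  # architecture-level knob
}


def predict_accuracy(cfg):
    """Stand-in for a learned accuracy predictor (toy formula, not real data)."""
    return (0.60 + 0.01 * cfg["weight_bits"]
            + 0.005 * cfg["adc_bits"] + 0.005 * cfg["kernel_size"])


def estimate_edap(cfg):
    """Stand-in for a hardware estimator: energy * delay * area in toy units."""
    energy = cfg["weight_bits"] * cfg["adc_bits"] * cfg["kernel_size"] ** 2
    delay = cfg["kernel_size"] ** 2 * 256 / cfg["array_size"]
    area = cfg["array_size"] * 2 ** cfg["adc_bits"]
    return energy * delay * area


def fitness(cfg, min_accuracy):
    """EDAP to minimize; configurations below the accuracy floor are rejected."""
    if predict_accuracy(cfg) < min_accuracy:
        return float("inf")
    return estimate_edap(cfg)


def sample(rng):
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}


def crossover(a, b, rng):
    # Each knob is inherited from one of the two parents at random.
    return {k: rng.choice((a[k], b[k])) for k in SEARCH_SPACE}


def mutate(cfg, rng, rate=0.3):
    # Each knob is resampled with probability `rate`.
    return {k: (rng.choice(SEARCH_SPACE[k]) if rng.random() < rate else v)
            for k, v in cfg.items()}


def evolve(min_accuracy=0.70, population=20, generations=15, seed=0):
    rng = random.Random(seed)
    pop = [sample(rng) for _ in range(population)]
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(c, min_accuracy))
        parents = pop[: population // 4]  # keep the best quarter (elitism)
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(population - len(parents))
        ]
        pop = parents + children
    return min(pop, key=lambda c: fitness(c, min_accuracy))


if __name__ == "__main__":
    best = evolve()
    print(best, predict_accuracy(best), estimate_edap(best))
```

Treating the accuracy floor as a hard constraint is one simple way to express "EDAP-focused optimization without accuracy loss". A weighted multi-objective fitness would be an equally plausible design, and the source does not say which formulation CIMNAS uses.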

Abstract

To maximize both hardware efficiency and accuracy in Compute-In-Memory (CIM)-based neural network accelerators for Artificial Intelligence (AI) applications, software and hardware design parameters must be co-optimized. Manual tuning is impractical because of the vast number of parameters and their complex interdependencies. Hardware-aware neural architecture search (HW-NAS) techniques can automate the design and optimization of such accelerators. This work introduces CIMNAS, a joint model-quantization-hardware optimization framework for CIM architectures. CIMNAS simultaneously searches across software parameters, quantization policies, and a broad range of hardware parameters, incorporating device-, circuit-, and architecture-level co-optimizations. CIMNAS experiments were conducted over a search space of $9.9\times 10^{85}$ potential parameter combinations, with MobileNet as the baseline model and an RRAM-based CIM architecture. Evaluated on the ImageNet dataset, CIMNAS reduced the energy-delay-area product (EDAP) by $90.1\times$ to $104.5\times$, improved TOPS/W by $4.68\times$ to $4.82\times$, and improved TOPS/mm$^2$ by $11.3\times$ to $12.78\times$ relative to various baselines, all while maintaining an accuracy of 73.81%. The adaptability and robustness of CIMNAS are demonstrated by extending the framework to support the SRAM-based ResNet50 architecture, where it achieves up to an $819.5\times$ reduction in EDAP. Unlike other state-of-the-art methods, CIMNAS achieves EDAP-focused optimization without any accuracy loss, generating diverse software-hardware parameter combinations for high-performance CIM-based neural network designs. The source code of CIMNAS is available at https://github.com/OlgaKrestinskaya/CIMNAS.
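Because EDAP is the product of energy, delay, and area, moderate gains in each factor multiply into a much larger combined reduction. The short sketch below illustrates this arithmetic. The `HardwareMetrics` class, the `edap_reduction` helper, and every numeric value are hypothetical and chosen only for easy mental arithmetic; none are results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HardwareMetrics:
    """Hypothetical container for one design point's hardware estimates."""
    energy_j: float   # energy per inference
    delay_s: float    # latency per inference
    area_mm2: float   # chip area

    @property
    def edap(self):
        # Energy-delay-area product: lower is better.
        return self.energy_j * self.delay_s * self.area_mm2


def edap_reduction(baseline, candidate):
    """How many times smaller the candidate's EDAP is than the baseline's."""
    return baseline.edap / candidate.edap


# Illustrative values only: 2x less energy, 2x less delay, 4x less area.
baseline = HardwareMetrics(energy_j=4.0, delay_s=2.0, area_mm2=8.0)
candidate = HardwareMetrics(energy_j=2.0, delay_s=1.0, area_mm2=2.0)

if __name__ == "__main__":
    print(edap_reduction(baseline, candidate))  # 64.0 / 4.0 = 16.0
```

This multiplicative structure is why an EDAP reduction can be far larger than the accompanying TOPS/W or TOPS/mm$^2$ improvement taken alone.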
