Joint Hardware-Workload Co-Optimization for In-Memory Computing Accelerators

Olga Krestinskaya, Mohammed E. Fouda, Ahmed Eltawil, Khaled N. Salama

TL;DR

This work presents a joint hardware-workload co-optimization framework based on an optimized evolutionary algorithm for designing generalized IMC accelerator architectures that significantly reduces the performance gap between workload-specific and generalized IMC designs.

Abstract

Software-hardware co-design is essential for optimizing in-memory computing (IMC) hardware accelerators for neural networks. However, most existing optimization frameworks target a single workload, leading to highly specialized hardware designs that do not generalize well across models and applications. In contrast, practical deployment scenarios require a single IMC platform that can efficiently support multiple neural network workloads. This work presents a joint hardware-workload co-optimization framework based on an optimized evolutionary algorithm for designing generalized IMC accelerator architectures. By explicitly capturing cross-workload trade-offs rather than optimizing for a single model, the proposed approach significantly reduces the performance gap between workload-specific and generalized IMC designs. The framework is evaluated on both RRAM- and SRAM-based IMC architectures, demonstrating strong robustness and adaptability across diverse design scenarios. Compared to baseline methods, the optimized designs achieve energy-delay-area product (EDAP) reductions of up to 76.2% and 95.5% when optimizing across a small set (4 workloads) and a large set (9 workloads), respectively. The source code of the framework is available at https://github.com/OlgaKrestinskaya/JointHardwareWorkloadOptimizationIMC.
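To make the reported metric concrete, the following is a minimal sketch of how an energy-delay-area product and a relative reduction could be computed. The `Metrics` container, the units, and the `joint_score` aggregate (a plain sum over workloads) are illustrative assumptions, not the framework's actual code or objective; the paper evaluates several objective functions that are not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Metrics:
    """Hypothetical per-workload hardware metrics for one candidate design."""
    energy: float  # assumed unit: joules per inference
    delay: float   # assumed unit: seconds per inference
    area: float    # assumed unit: mm^2 of the accelerator


def edap(m: Metrics) -> float:
    """Energy-delay-area product; lower is better."""
    return m.energy * m.delay * m.area


def joint_score(per_workload: Dict[str, Metrics]) -> float:
    """Assumed aggregate for one shared design: sum of per-workload EDAP.

    This is only one possible way to combine workloads, chosen for illustration.
    """
    return sum(edap(m) for m in per_workload.values())


def reduction_percent(baseline: float, optimized: float) -> float:
    """Relative reduction of `optimized` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - optimized) / baseline


if __name__ == "__main__":
    # Made-up numbers, used only to exercise the functions.
    design = {
        "workload_a": Metrics(energy=2.0, delay=3.0, area=4.0),
        "workload_b": Metrics(energy=1.0, delay=2.0, area=4.0),
    }
    print(joint_score(design))
    print(reduction_percent(baseline=100.0, optimized=25.0))
```

Read this way, an EDAP reduction of 76.2% means the optimized design's EDAP is 23.8% of the baseline value, and a reduction of 95.5% means it is 4.5% of the baseline.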

Paper Structure

This paper contains 35 sections, 4 equations, 10 figures, 7 tables, 1 algorithm.

Figures (10)

  • Figure 1: Research gap and representative state-of-the-art frameworks (including [negi2022nax, sun2023gibbon, moitra2023xpert, yang2021multi, yuan2021nas4rram, han2024comn, benmeziane2023analognas, krestinskaya2020towards, li2021flash, guan2022hardware, jiang2020device, krestinskaya2020automating, zhou2021pim, park2025compass, wang2024fast, risso2023precision, behnam2024harmonica, lammie2025lionheart, krestinskaya2025cimnas], which are discussed in detail in Section \ref{Sback}).
  • Figure 2: Proposed joint hardware-workload co-optimization framework for in-memory computing hardware.
  • Figure 3: EDAP comparison of optimized designs for RRAM-based and SRAM-based IMC hardware, obtained using separate optimization for the largest workload and joint optimization across multiple workloads.
  • Figure 4: Improved convergence and EDAP scores of optimized designs achieved using the proposed four-phase genetic algorithm (GA) with enhanced sampling, compared to the traditional, unmodified GA (six independent experiments with RRAM-based hardware for each case).
  • Figure 5: Comparison of optimization strategies for generating generalized architectures in RRAM-based (a-d) and SRAM-based (e-h) experiments with different objective functions. The strategies are separate search, maximum-workload-based optimization, joint search with an unmodified genetic algorithm (GA) [krestinskaya2024towards], joint search with enhanced sampling, and joint search using the proposed four-phase GA with optimized sampling. The top-5 designs are shown for each experiment, with the top-1 design marked by a star. The goal is performance as close as possible to that of the individually workload-optimized (separate search) designs, which indicates minimal loss in hardware efficiency when moving to generalized hardware. The proposed algorithm is the most effective approach in most cases, so it yields generalized designs with the least compromise in hardware performance.
  • ...and 5 more figures