Joint Hardware-Workload Co-Optimization for In-Memory Computing Accelerators

Olga Krestinskaya; Mohammed E. Fouda; Ahmed Eltawil; Khaled N. Salama

TL;DR

This work presents a joint hardware-workload co-optimization framework based on an optimized evolutionary algorithm for designing generalized IMC accelerator architectures that significantly reduces the performance gap between workload-specific and generalized IMC designs.

Abstract

Software-hardware co-design is essential for optimizing in-memory computing (IMC) hardware accelerators for neural networks. However, most existing optimization frameworks target a single workload, leading to highly specialized hardware designs that do not generalize well across models and applications. In contrast, practical deployment scenarios require a single IMC platform that can efficiently support multiple neural network workloads. This work presents a joint hardware-workload co-optimization framework based on an optimized evolutionary algorithm for designing generalized IMC accelerator architectures. By explicitly capturing cross-workload trade-offs rather than optimizing for a single model, the proposed approach significantly reduces the performance gap between workload-specific and generalized IMC designs. The framework is evaluated on both RRAM- and SRAM-based IMC architectures, demonstrating strong robustness and adaptability across diverse design scenarios. Compared to baseline methods, the optimized designs achieve energy-delay-area product (EDAP) reductions of up to 76.2% and 95.5% when optimizing across a small set (4 workloads) and a large set (9 workloads), respectively. The source code of the framework is available at https://github.com/OlgaKrestinskaya/JointHardwareWorkloadOptimizationIMC.

Paper Structure (35 sections, 4 equations, 10 figures, 7 tables, 1 algorithm)

Introduction
Background
Software-hardware co-design for AI applications
Automated co-design frameworks for IMC
Multiple workloads support
Motivation and state-of-the-art comparison
Proposed joint hardware-workload co-optimization framework
Framework overview
Search space and hardware configuration setups
Optimization algorithm
Why evolutionary genetic search?
Proposed optimization algorithm
Results
Comparison of joint hardware-workload co-optimization with optimization for the largest workload
Performance of the proposed algorithm: convergence improvements and the impact of optimized sampling
...and 20 more sections

Figures (10)

Figure 1: Research gap and representative state-of-the-art frameworks (including [negi2022nax, sun2023gibbon, moitra2023xpert, yang2021multi, yuan2021nas4rram, han2024comn, benmeziane2023analognas, krestinskaya2020towards, li2021flash, guan2022hardware, jiang2020device, krestinskaya2020automating, zhou2021pim, park2025compass, wang2024fast, risso2023precision, behnam2024harmonica, lammie2025lionheart, krestinskaya2025cimnas], which are discussed in detail in the Background section).
Figure 2: Proposed joint hardware-workload co-optimization framework for in-memory computing hardware.
Figure 3: EDAP comparison of optimized designs for RRAM-based and SRAM-based IMC hardware, obtained using separate optimization for the largest workload and joint optimization across multiple workloads.
Figure 4: Improved convergence and EDAP scores of optimized designs achieved using the proposed 4-phase genetic algorithm (GA) with enhanced sampling, compared to the traditional, non-modified GA approach (six independent experiments with RRAM-based hardware for each case).
Figure 5: Comparison of optimization strategies for generating generalized architectures across RRAM-based (a-d) and SRAM-based (e-h) experiments using different objective functions. The strategies are separate search, maximum-workload-based optimization, joint search with a non-modified genetic algorithm (GA) [krestinskaya2024towards], joint search with enhanced sampling, and joint search using the proposed four-phase GA with optimized sampling. The top-5 designs are shown in each experiment, with the top-1 design marked with a star. The goal is to get as close as possible to the performance of the individually workload-optimized (separate search) designs, which indicates minimal loss in hardware efficiency when moving to generalized hardware. The proposed algorithm is the most effective of all approaches in most cases, so it supports selecting optimized designs for which the move to a generalized architecture costs the least hardware performance.
...and 5 more figures
