Towards Efficient IMC Accelerator Design Through Joint Hardware-Workload Co-optimization
Olga Krestinskaya, Mohammed E. Fouda, Ahmed Eltawil, Khaled N. Salama
TL;DR
The paper addresses the problem of designing IMC accelerators that efficiently support multiple workloads without manual per-task optimization. It introduces a joint hardware-workload optimization framework that uses an evolutionary algorithm to search a large design space (approximately $1.9 \times 10^7$ configurations) and optimize a joint score across workloads, formalized as $f = \max_w(E_w) \times \max_w(L_w) \times A$ subject to the constraint $A \le A_{constr}$. Compared with optimizing for only the largest workload, joint optimization improves energy-latency-area scores by 36% for VGG16, 36% for ResNet18, 20% for AlexNet, and 69% for MobileNetV3. The generalized designs still incur quantifiable trade-offs, with losses of 17%–86% relative to workload-specific designs. The work demonstrates a practical path to generalized IMC hardware that balances energy, latency, and area through software-hardware co-design.
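As a minimal sketch of how the joint score above could be evaluated for one candidate design, the snippet below takes the worst-case energy and latency over all workloads and multiplies them by the chip area. The names `WorkloadMetrics` and `joint_score`, the numeric values, and the choice to return infinity for designs that violate the area constraint are illustrative assumptions, not the authors' implementation. It also assumes that a lower score is better, since energy, latency, and area are all costs.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadMetrics:
    """Hypothetical per-workload results for one hardware configuration."""
    energy: float   # E_w, arbitrary units
    latency: float  # L_w, arbitrary units


def joint_score(per_workload, area, area_constraint):
    """Return f = max_w(E_w) * max_w(L_w) * A for one candidate design.

    Assumption: designs with A > A_constr are marked infeasible by
    returning infinity, so a search that minimizes f never selects them.
    """
    if area > area_constraint:
        return math.inf
    worst_energy = max(m.energy for m in per_workload.values())
    worst_latency = max(m.latency for m in per_workload.values())
    return worst_energy * worst_latency * area


# Made-up numbers for illustration only; not results from the paper.
candidate = {
    "VGG16": WorkloadMetrics(energy=4.0, latency=2.0),
    "ResNet18": WorkloadMetrics(energy=3.0, latency=5.0),
}
score = joint_score(candidate, area=10.0, area_constraint=50.0)
rejected = joint_score(candidate, area=60.0, area_constraint=50.0)
```

Because the maxima are taken independently, the worst energy and the worst latency may come from different workloads, as in this toy example. An evolutionary search would call such a function once per candidate configuration and keep the best-scoring feasible designs.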
Abstract
Designing generalized in-memory computing (IMC) hardware that efficiently supports a variety of workloads requires extensive design space exploration, which is infeasible to perform manually. Optimizing hardware individually for each workload, or solely for the largest workload, often fails to yield the most efficient generalized solution. To address this, we propose a joint hardware-workload optimization framework that identifies optimized IMC chip architecture parameters, enabling more efficient, workload-flexible hardware. We show that joint optimization achieves 36%, 36%, 20%, and 69% better energy-latency-area scores for VGG16, ResNet18, AlexNet, and MobileNetV3, respectively, compared to a separate architecture parameter search that optimizes for only the largest workload. Additionally, we quantify the performance trade-offs and losses of the resulting generalized IMC hardware compared to workload-specific IMC designs.
