LionHeart: A Layer-based Mapping Framework for Heterogeneous Systems with Analog In-Memory Computing Tiles
Corey Lammie, Yuxuan Wang, Flavio Ponzina, Joshua Klein, Hadjer Benmeziane, Marina Zapater, Irem Boybat, Abu Sebastian, Giovanni Ansaloni, David Atienza
TL;DR
LionHeart addresses the challenge of deploying DL inference on analog in-memory computing (AIMC) tiles with a layer-wise, accuracy-driven mapping framework that greedily assigns layers with large multiply-accumulate (MAC) counts to analog compute while keeping the accuracy loss within a user-defined threshold. It couples MAC-based layer ranking, hardware-aware training with AIHWKIT, and full-system validation with ALPINE to produce practical, hardware-agnostic mappings that maximize analog utilization. Across CNNs on CIFAR-10/100 and a transformer (MobileBERT) on SQuAD, full-system simulations show run-time reductions and energy efficiency gains exceeding $6\times$ at the target accuracy, and the framework is open source. The work demonstrates a viable path to using AIMC for edge DL workloads while accounting for the drift of analog non-idealities over time.
Abstract
When arranged in a crossbar configuration, resistive memory devices can be used to execute Matrix-Vector Multiplications (MVMs), the dominant operation of many Machine Learning (ML) algorithms, in constant time complexity. Nonetheless, performing computations in the analog domain introduces novel challenges in terms of arithmetic precision and stochasticity, due to non-ideal circuit and device behaviour. Moreover, these non-idealities have a temporal dimension, causing application accuracy to degrade over time. To face these challenges, we propose a novel framework, named LionHeart, to obtain hybrid analog-digital mappings for executing Deep Learning (DL) inference workloads on heterogeneous accelerators. The accuracy-constrained mappings derived by LionHeart showcase, across different Convolutional Neural Networks (CNNs) and one transformer-based network, high accuracy and potential for speedup. The results of the full system simulations highlight run-time reductions and energy efficiency gains that exceed $6\times$, under a user-defined accuracy threshold relative to a fully digital floating-point implementation. LionHeart is open-sourced here: https://github.com/IBM/lionheart.
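The greedy, MAC-ranked mapping summarized in the TL;DR can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `greedy_mapping`, the `evaluate` callback, the layer names, MAC counts, and accuracy penalties are all hypothetical, and the choice to skip a rejected layer and continue with the next one is an assumption. In the actual framework, the accuracy of a candidate mapping would come from hardware-aware training and drift-aware evaluation rather than from a toy function.

```python
from typing import Callable, Dict, List


def greedy_mapping(
    layer_macs: Dict[str, int],
    evaluate: Callable[[List[str]], float],
    baseline_accuracy: float,
    max_accuracy_drop: float,
) -> List[str]:
    """Return the layers assigned to analog tiles; all others stay digital.

    Hypothetical sketch: `evaluate` stands in for whatever procedure
    measures the accuracy of a hybrid mapping.
    """
    analog: List[str] = []
    # Visit layers from the largest MAC count to the smallest.
    for name in sorted(layer_macs, key=lambda n: layer_macs[n], reverse=True):
        candidate = analog + [name]
        # Keep the layer on analog only if the accuracy drop with respect
        # to the digital baseline stays within the user-defined threshold.
        if baseline_accuracy - evaluate(candidate) <= max_accuracy_drop:
            analog = candidate
    return analog


# Made-up MAC counts and per-layer accuracy penalties (percentage points).
layer_macs = {"conv1": 1_800_000, "conv2": 37_700_000,
              "conv3": 18_900_000, "fc": 5_120}
penalty = {"conv1": 0.9, "conv2": 0.3, "conv3": 0.4, "fc": 0.2}


def toy_evaluate(analog_layers: List[str]) -> float:
    # Toy model: each analog layer subtracts a fixed penalty from 93.0%.
    return 93.0 - sum(penalty[n] for n in analog_layers)


mapping = greedy_mapping(layer_macs, toy_evaluate,
                         baseline_accuracy=93.0, max_accuracy_drop=1.0)
print(mapping)
```

With these made-up numbers, `conv1` is left on the digital side because adding it would push the accumulated drop past the 1.0-point threshold, while the smaller `fc` layer still fits within the remaining budget.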
