LionHeart: A Layer-based Mapping Framework for Heterogeneous Systems with Analog In-Memory Computing Tiles

Corey Lammie, Yuxuan Wang, Flavio Ponzina, Joshua Klein, Hadjer Benmeziane, Marina Zapater, Irem Boybat, Abu Sebastian, Giovanni Ansaloni, David Atienza

TL;DR

LionHeart addresses the challenge of deploying DL inference on analog in-memory computing (AIMC) tiles with a layer-wise, accuracy-driven mapping framework: it greedily assigns the layers with the most multiply-accumulate (MAC) operations to analog compute while keeping the accuracy loss within a user-defined threshold. It couples MAC-based layer ranking, hardware-aware training with AIHWKIT, and full-system validation with ALPINE to produce practical, hardware-agnostic mappings that maximize analog utilization. Across CNNs on CIFAR-10/100 and a transformer (MobileBERT) on SQuAD, LionHeart delivers run-time reductions and energy-efficiency gains exceeding $6\times$ while maintaining the target accuracy, and it is open-source. The work demonstrates a viable path to harnessing AIMC for edge DL workloads, accounting for the temporal drift of analog non-idealities by evaluating accuracy at a specified evaluation time ($t_{eval}$).
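The greedy, MAC-ranked assignment summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the LionHeart implementation: the function name `greedy_mapping`, the `evaluate` callback (standing in for hardware-aware retraining plus accuracy evaluation), and all toy numbers are hypothetical.

```python
from typing import Callable, Dict, List, Set


def greedy_mapping(
    layer_macs: Dict[str, int],
    evaluate: Callable[[Set[str]], float],
    baseline_accuracy: float,
    max_drop: float,
) -> List[str]:
    """Return the layers mapped to analog, in the order they were accepted.

    Assumption: layers are visited from most to fewest MACs, and a layer
    stays on analog only if the resulting accuracy drop relative to the
    digital baseline is within `max_drop`.
    """
    analog: Set[str] = set()
    accepted: List[str] = []
    for name in sorted(layer_macs, key=layer_macs.get, reverse=True):
        candidate = analog | {name}
        # `evaluate` is a hypothetical stand-in for retraining and
        # measuring accuracy with `candidate` executed on AIMC tiles.
        if baseline_accuracy - evaluate(candidate) <= max_drop:
            analog = candidate
            accepted.append(name)
    return accepted


# Toy example: each layer costs a fixed accuracy penalty when made analog.
toy_macs = {"conv1": 100, "conv2": 400, "fc": 50}
toy_penalty = {"conv1": 3.0, "conv2": 0.5, "fc": 0.25}


def toy_evaluate(analog_layers: Set[str]) -> float:
    return 90.0 - sum(toy_penalty[n] for n in analog_layers)


print(greedy_mapping(toy_macs, toy_evaluate, baseline_accuracy=90.0, max_drop=1.0))
# prints ['conv2', 'fc']
```

In the toy run, `conv1` is rejected because moving it to analog would push the accuracy drop past the threshold, so it stays digital while the other two layers are mapped to analog.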

Abstract

When arranged in a crossbar configuration, resistive memory devices can be used to execute Matrix-Vector Multiplications (MVMs), the most dominant operation of many Machine Learning (ML) algorithms, in constant time complexity. Nonetheless, when performing computations in the analog domain, novel challenges are introduced in terms of arithmetic precision and stochasticity, due to non-ideal circuit and device behaviour. Moreover, these non-idealities have a temporal dimension, so application accuracy degrades over time. Facing these challenges, we propose a novel framework, named LionHeart, to obtain hybrid analog-digital mappings to execute Deep Learning (DL) inference workloads using heterogeneous accelerators. The accuracy-constrained mappings derived by LionHeart showcase, across different Convolutional Neural Networks (CNNs) and one transformer-based network, high accuracy and potential for speedup. The results of the full-system simulations highlight run-time reductions and energy efficiency gains that exceed $6\times$, under a user-defined accuracy threshold relative to a fully digital floating-point implementation. LionHeart is open-sourced here: https://github.com/IBM/lionheart.

Paper Structure

This paper contains 22 sections, 1 equation, 10 figures, and 5 tables.

Figures (10)

  • Figure 1: LionHeart heterogeneously maps layers of ML networks to digital or analog resources, maximizing performance by exploiting AIMC acceleration while abiding by accuracy constraints.
  • Figure 2: (a) Depiction of an AIMC tile and its underlying compute mechanism. Each memristor, as depicted, is representative of a unit cell.
  • Figure 3: Mapping of (a) FC and (b) CONV layers to device conductances. Weights are linearly scaled and mapped between $G_{min}$ and $G_{max}$. (c) Mapping/execution flow of a self-attention block when the maximum analog MAC ratio is achieved.
  • Figure 4: (a) High-level overview of our proposed framework. (b) The validation accuracy at the desired evaluation time ($t_{eval}=$1d) and training loss of a candidate AIMC layer during analog HWA retraining. Training is considered converged when, for a user-defined convergence window, the training loss does not decrease. (c) High-level overview of analog HWA retraining.
  • Figure 5: For (a,b) ResNet8, (c,d) ResNet20, (e,f) AlexNet, (g,h) VGG16, (i,j) MobileNetV2, and (k,l) MobileBERT, the MAC ratio (as a percentage) and corresponding test set accuracies (F1 scores for SQuAD) at $t=$1d. The MAC ratio quantity represents the ratio between the number of MAC performed using AIMC and the total number of MAC
  • ...and 5 more figures
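
Three quantities named in the captions above lend themselves to a short sketch: the linear weight-to-conductance mapping (Figure 3), the convergence-window criterion (Figure 4), and the MAC ratio (Figure 5). The code below is only one plausible reading of those captions. The function names, the min-max form of the linear scaling, and the exact convergence test are assumptions made for illustration and are not taken from the paper or its repository.

```python
from typing import Dict, Iterable, List, Sequence


def weights_to_conductances(
    weights: Sequence[float], g_min: float, g_max: float
) -> List[float]:
    """Linearly scale weights into [g_min, g_max].

    Assumption: a plain min-max mapping; real tiles may instead use
    differential conductance pairs or per-column scaling.
    """
    w_min, w_max = min(weights), max(weights)
    if w_max == w_min:
        # Degenerate case: all weights equal, so map them to g_min.
        return [g_min for _ in weights]
    scale = (g_max - g_min) / (w_max - w_min)
    return [g_min + (w - w_min) * scale for w in weights]


def has_converged(losses: Sequence[float], window: int) -> bool:
    """True if the loss did not decrease during the last `window` epochs.

    Assumption: "did not decrease" means the best loss inside the window
    is no lower than the best loss seen before the window.
    """
    if window <= 0 or len(losses) <= window:
        return False
    return min(losses[-window:]) >= min(losses[:-window])


def mac_ratio(layer_macs: Dict[str, int], analog_layers: Iterable[str]) -> float:
    """Percentage of all MACs that are executed on AIMC tiles."""
    total = sum(layer_macs.values())
    analog = sum(layer_macs[name] for name in analog_layers)
    return 100.0 * analog / total


print(weights_to_conductances([-1.0, 0.0, 1.0], g_min=0.0, g_max=25.0))
print(has_converged([1.0, 0.8, 0.7, 0.7, 0.75], window=2))
print(mac_ratio({"conv1": 100, "conv2": 400, "fc": 50}, ["conv2", "fc"]))
```

The layer names and values in the usage lines are toy inputs; units for the conductance bounds are left unspecified because the captions do not state them.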