Using the IBM Analog In-Memory Hardware Acceleration Kit for Neural Network Training and Inference

Manuel Le Gallo; Corey Lammie; Julian Buechel; Fabio Carta; Omobayode Fagbohungbe; Charles Mackin; Hsinyu Tsai; Vijay Narayanan; Abu Sebastian; Kaoutar El Maghraoui; Malte J. Rasch

TL;DR

The paper surveys analog in-memory computing (AIMC) as a route to energy-efficient deep neural networks and introduces IBM AIHWKIT, a PyTorch-based simulator that models AIMC nonidealities during training and inference. It details the software architecture, including per-tile RPUConfig hardware modeling, multiple nonidealities, and hardware-aware training methods (e.g., in-memory SGD, TT, cTTv2, and mixed-precision), and demonstrates how to calibrate device measurements to the simulator. The Analog AI Cloud Composer (AAICC) is presented as a cloud-based, no-code platform that provides templates, hardware access, and remote execution for AIMC experiments, linking software research with real hardware. The work emphasizes extensibility, offering workflow examples, device-parameter variation studies, and step-by-step instructions for customizing noise models and MVM implementations, thereby enabling end-to-end exploration from device materials to DNN performance with accessible notebooks and templates.

Abstract

Analog In-Memory Computing (AIMC) is a promising approach to reduce the latency and energy consumption of Deep Neural Network (DNN) inference and training. However, the noisy and non-linear device characteristics, and the non-ideal peripheral circuitry in AIMC chips, require adapting DNNs to be deployed on such hardware to achieve equivalent accuracy to digital computing. In this tutorial, we provide a deep dive into how such adaptations can be achieved and evaluated using the recently released IBM Analog Hardware Acceleration Kit (AIHWKit), freely available at https://github.com/IBM/aihwkit. The AIHWKit is a Python library that simulates inference and training of DNNs using AIMC. We present an in-depth description of the AIHWKit design, functionality, and best practices to properly perform inference and training. We also present an overview of the Analog AI Cloud Composer, a platform that provides the benefits of using the AIHWKit simulation in a fully managed cloud setting along with physical AIMC hardware access, freely available at https://aihw-composer.draco.res.ibm.com. Finally, we show examples of how users can expand and customize AIHWKit for their own needs. This tutorial is accompanied by comprehensive Jupyter Notebook code examples that can be run using AIHWKit, which can be downloaded from https://github.com/IBM/aihwkit/tree/master/notebooks/tutorial.

Paper Structure (64 sections, 5 equations, 21 figures, 11 tables)

Introduction
AIMC Concepts
Detailed Introduction to AIMC
How to Perform DNN Training and Inference with AIMC
AIHWKIT design
Simulator Code-design Overview
Model Conversion and Analog Optimizers
Tile-level RPUConfig Specifies All Analog Hardware settings
Configurable MVM Nonidealities
AIMC Network Weight Encoding
Analog Tile Size and Bias
Initial weight mapping
Output Noise
Short-term Weight Noise
Input and Output Quantization
...and 49 more sections

Figures (21)

Figure 1: (a) Illustration of a potential AIMC chip. (b) AIMC devices implemented in AIHWKIT and their properties.
Figure 2: (a) Mapping of a neural network to an AIMC chip. (b) Implementation of in-memory SGD weight update. (c) Implementation of TTv2 weight update. (d) Implementation of mixed-precision weight update.
Figure 3: Design of the AIHWKIT. A DNN is defined with typical PyTorch commands, except for layers that are to be performed in AIMC. We provide analog layers to implement convolution layers, linear layers, etc. (see Tab. \ref{tab:analog-layers}). Each of these analog layer modules contains (at least) one analog tile module that encapsulates the analog computations as well as the concatenation of logical tile arrays. Each analog tile module consists of one or multiple analog tiles. These analog tiles encapsulate the NVM crossbar operations together with immediate peripheral compute (such as ADC and DAC, affine output scaling and bias). Each analog tile can be configured broadly using an RPUConfig. The RPUConfig determines, in a highly customizable way, how the nonideal AIMC forward, backward, and update behaviour is implemented and what peripheral aspects and device materials are used in the AIMC hardware under investigation.
Figure 4: Non-ideal MVM from a $512\times512$ analog tile simulated using the AIHWKIT with commonly used settings, as listed in Table \ref{tab:mvm-nonidealities}, when programming noise is not applied. Inputs are sampled from a sparse uniform distribution, with a sparsity of 50%, and weights are sampled from a clipped Gaussian distribution with a standard deviation of 0.246. Output values are normalized using out_bound, so clipping happens at different normalized output values.
Figure 5: (a) Experimentally (hardware) obtained temporal evolution of PCM conductance \cite{Joshi2020} compared to that simulated by the AIHWKIT PCMLikeNoise statistical noise model. Note that the simulation assumes all weights are programmed at the same time, whereas in the experiment, devices converged at different iterations of programming. (b) Non-ideal MVM from a $512\times512$ analog tile simulated using the AIHWKIT with commonly used settings, as listed in Tab. \ref{tab:mvm-nonidealities}, and the PCMLikeNoise statistical noise model. Inputs are sampled from a sparse uniform distribution, with a sparsity of 50%, and weights are sampled from a clipped Gaussian distribution with a standard deviation of 0.246. For $t=1$s, the reported $L_2$ error of the MVM is 13%.
...and 16 more figures
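
The kind of experiment described in the Figure 4 caption can be illustrated with a minimal, standard-library-only sketch. It is not the AIHWKIT implementation and does not use the AIHWKIT API. It only mimics the setup named in the caption (a $512\times512$ tile, inputs with 50% sparsity, clipped-Gaussian weights with standard deviation 0.246) and applies three simplified nonidealities: input quantization, additive output noise, and output quantization with clipping. The function names, the weight clip at 1.0, and the values chosen for `out_noise`, `out_bound`, `in_levels` and `out_levels` are assumptions made for illustration, not the settings from the paper's table of MVM nonidealities.

```python
import math
import random


def quantize(value, bound, levels):
    """Clip to [-bound, bound], then round to a uniform grid of `levels` steps."""
    step = 2.0 * bound / levels
    clipped = max(-bound, min(bound, value))
    return round(clipped / step) * step


def make_tile(rng, size=512, std=0.246, clip=1.0):
    """Weights from a clipped Gaussian (the clip value is an assumption)."""
    return [[max(-clip, min(clip, rng.gauss(0.0, std))) for _ in range(size)]
            for _ in range(size)]


def make_input(rng, size=512, sparsity=0.5):
    """Sparse uniform inputs: each entry is zero with probability `sparsity`."""
    return [0.0 if rng.random() < sparsity else rng.uniform(-1.0, 1.0)
            for _ in range(size)]


def ideal_mvm(weights, x):
    """Exact floating-point matrix-vector multiplication."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def noisy_mvm(weights, x, rng, out_noise=0.04, out_bound=10.0,
              in_levels=254, out_levels=254):
    """Simplified non-ideal MVM; all default values are hypothetical."""
    x_q = [quantize(v, 1.0, in_levels) for v in x]      # DAC-like input quantization
    y = []
    for row in weights:
        acc = sum(w * v for w, v in zip(row, x_q))      # analog accumulation
        acc += rng.gauss(0.0, out_noise)                # additive output noise
        y.append(quantize(acc, out_bound, out_levels))  # ADC-like clip and quantize
    return y


def relative_l2_error(y_ideal, y_noisy):
    """||y_noisy - y_ideal||_2 / ||y_ideal||_2."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(y_ideal, y_noisy)))
    den = math.sqrt(sum(a * a for a in y_ideal))
    return num / den


def run_demo(seed=0, **kwargs):
    """Build one tile and one input vector, return the relative L2 error."""
    rng = random.Random(seed)
    weights = make_tile(rng)
    x = make_input(rng)
    return relative_l2_error(ideal_mvm(weights, x),
                             noisy_mvm(weights, x, rng, **kwargs))


if __name__ == "__main__":
    print(f"relative L2 error: {run_demo():.4f}")
```

Raising `out_noise` or lowering `out_levels` in `run_demo` gives a rough feel for how each nonideality contributes to the MVM error. Matching the paper's reported numbers would require the actual AIHWKIT noise models and settings.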
