In-memory Training on Analog Devices with Limited Conductance States via Multi-tile Residual Learning
Jindan Li, Zhaoxian Wu, Gaowen Liu, Tayfun Gokmen, Tianyi Chen
TL;DR
This paper addresses the challenge of training deep networks with analog in-memory computing (AIMC) hardware that offers limited conductance states per device. It introduces a multi-tile residual learning framework where high-precision weights are formed as a geometric sum $\overline{W}=\sum_{n=0}^{N} \gamma^n W^{(n)}$, and where each tile learns on its own timescale to iteratively approximate the residual left by preceding tiles, yielding exponential suppression of the training residual as the number of tiles grows. The authors provide a stochastic-approximation analysis showing linear convergence of the residual-learning Lyapunov function to an asymptotic error bound that scales with quantization noise and gradient variance, and they demonstrate experimentally that the approach surpasses the TT-v1/TT-v2 baselines and rivals mixed-precision training on CIFAR/MNIST-style tasks with low-state devices. The work also details an analog-circuit implementation with forward, backward, and transfer operations, analyzes hardware costs, and discusses the practical implications for scalable, energy-efficient on-chip training on ReRAM-like devices. Overall, the proposed multi-timescale residual learning enables more accurate and efficient AIMC training in the presence of limited conductance states, reducing digital storage and latency while maintaining convergence guarantees.
Abstract
Analog in-memory computing (AIMC) accelerators enable efficient deep neural network computation directly within memory using resistive crossbar arrays, where model parameters are represented by the conductance states of memristive devices. However, effective in-memory training typically requires at least 8-bit conductance states to match digital baselines. Realizing such fine-grained states is costly and often requires complex noise mitigation techniques that increase circuit complexity and energy consumption. In practice, many promising memristive devices such as ReRAM offer only about 4-bit resolution due to fabrication constraints, and this limited update precision substantially degrades training accuracy. To enable on-chip training with these limited-state devices, this paper proposes a residual learning framework that sequentially learns on multiple crossbar tiles to compensate for the residual errors from low-precision weight updates. Our theoretical analysis shows that the optimality gap shrinks with the number of tiles and achieves a linear convergence rate. Experiments on standard image classification benchmarks demonstrate that our method consistently outperforms state-of-the-art in-memory analog training strategies under limited-state settings, while incurring only moderate hardware overhead as confirmed by our cost analysis.
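To make the geometric-sum representation $\overline{W}=\sum_{n} \gamma^n W^{(n)}$ from the TL;DR concrete, the following is a minimal NumPy sketch of how a few coarse tiles can jointly represent a weight more finely than any single tile. It is a static, noise-free illustration only, not the paper's training algorithm: each tile greedily quantizes the residual left by the previous tiles. The function names, the uniform quantizer, the 4-state setting, the range `tau = 1`, and `gamma = 0.5` are all assumptions chosen for illustration.

```python
import numpy as np

def quantize(x, num_states=4, tau=1.0):
    """Round x to the nearest of num_states evenly spaced levels in [-tau, tau].

    Hypothetical stand-in for a device with a limited number of conductance states.
    """
    step = 2.0 * tau / (num_states - 1)
    idx = np.round((np.clip(x, -tau, tau) + tau) / step)
    return idx * step - tau

def residual_tiles(w_target, num_tiles, gamma=0.5, num_states=4, tau=1.0):
    """Greedily fit tiles so that sum_n gamma**n * tiles[n] approximates w_target."""
    tiles = []
    residual = w_target.copy()
    for n in range(num_tiles):
        # Tile n stores the current residual, rescaled to the device range.
        tile = quantize(residual / gamma**n, num_states, tau)
        tiles.append(tile)
        residual = residual - gamma**n * tile
    return tiles

def compose(tiles, gamma=0.5):
    """Form the composite weight as the geometric sum over tiles."""
    return sum(gamma**n * t for n, t in enumerate(tiles))

rng = np.random.default_rng(0)
w_target = rng.uniform(-1.0, 1.0, size=(4, 4))  # assumed target weights in [-tau, tau]
for num_tiles in (1, 2, 4, 6):
    tiles = residual_tiles(w_target, num_tiles)
    err = np.max(np.abs(w_target - compose(tiles)))
    print(f"tiles={num_tiles}  max |residual|={err:.5f}")
```

Under these assumptions, with quantization step $\Delta = 2\tau/(S-1)$ for $S$ states and $\gamma \ge 1/(S-1)$, the residual after $k$ tiles is bounded by $\gamma^{k-1}\Delta/2$, so it shrinks geometrically with the number of tiles. Here `num_tiles` corresponds to $N+1$ in the sum above, which runs from $n=0$ to $N$.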
