Towards training digitally-tied analog blocks via hybrid gradient computation
Timothy Nest, Maxence Ernoult
TL;DR
The paper tackles the rising energy cost of AI training by proposing Feedforward-tied Energy-based Models (ff-EBMs), a hybrid digital-analog framework that combines feedforward blocks with energy-based analog blocks. It derives an end-to-end gradient computation method that backpropagates through the feedforward components and "eq-propagates", i.e. applies Equilibrium Propagation (EP), through the energy-based components, enabling training of realistic heterogeneous architectures. Empirically, ff-EBMs with Deep Hopfield Network (DHN) blocks yield gradient estimates that match end-to-end automatic differentiation, tolerate various block splits without performance loss, and reach a new state of the art in the EP literature on ImageNet32 (approximately 46% top-1 accuracy). The work offers a principled, scalable roadmap for gradually integrating self-trainable analog primitives into existing digital accelerators, potentially reducing training energy while remaining hardware-aware and deployment-friendly.
Abstract
Power efficiency is plateauing in the standard digital electronics realm, such that novel hardware, models, and algorithms are needed to reduce the costs of AI training. The combination of energy-based analog circuits and the Equilibrium Propagation (EP) algorithm constitutes one compelling alternative compute paradigm for gradient-based optimization of neural nets. Existing analog hardware accelerators, however, typically incorporate digital circuitry to sustain auxiliary non-weight-stationary operations, mitigate analog device imperfections, and leverage existing digital accelerators. This heterogeneous hardware approach calls for a new theoretical model building block. In this work, we introduce Feedforward-tied Energy-based Models (ff-EBMs), a hybrid model comprising feedforward and energy-based blocks accounting for digital and analog circuits, respectively. We derive a novel algorithm to compute gradients end-to-end in ff-EBMs by backpropagating and "eq-propagating" through feedforward and energy-based parts respectively, enabling EP to be applied to much more flexible and realistic architectures. We experimentally demonstrate the effectiveness of the proposed approach on ff-EBMs where Deep Hopfield Networks (DHNs) are used as energy-based blocks. We first show that a standard DHN can be split into blocks of any uniform size while maintaining performance. We then train ff-EBMs on ImageNet32, where we establish new SOTA performance in the EP literature (46% top-1 accuracy). Our approach offers a principled, scalable, and incremental roadmap to gradually integrate self-trainable analog computational primitives into existing digital accelerators.
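
To make the idea of hybrid gradient computation concrete, the following is a minimal scalar sketch, not the authors' implementation. It assumes a toy model that does not appear in the paper: a feedforward block u = tanh(w1 * x) feeding a single energy-based block with the quadratic energy E(s, u, w2) = 0.5 * s^2 - s * w2 * u and a squared-error loss. All function names, the relaxation by plain gradient descent, and the parameter values are illustrative assumptions. The energy-based block is differentiated with a centered EP estimate (two equilibria nudged by +beta and -beta), and the resulting input gradient is then backpropagated through the feedforward block.

```python
import math

def energy_grad_s(s, u, w2):
    # dE/ds for the assumed toy energy E(s, u, w2) = 0.5 * s**2 - s * w2 * u
    return s - w2 * u

def relax(u, w2, y, beta, steps=200, lr=0.5):
    # Find the minimum of E + beta * loss by gradient descent on the state s,
    # with loss = 0.5 * (s - y)**2 (beta = 0 gives the free equilibrium).
    s = 0.0
    for _ in range(steps):
        s -= lr * (energy_grad_s(s, u, w2) + beta * (s - y))
    return s

def hybrid_gradients(x, y, w1, w2, beta=1e-3):
    # Feedforward ("digital") block: ordinary forward pass.
    u = math.tanh(w1 * x)
    # Energy-based ("analog") block: two nudged equilibria.
    s_pos = relax(u, w2, y, +beta)
    s_neg = relax(u, w2, y, -beta)
    # EP estimates: centered difference of dE/dw2 = -s * u and dE/du = -s * w2.
    g_w2 = (-(s_pos * u) + (s_neg * u)) / (2 * beta)
    g_u = (-(s_pos * w2) + (s_neg * w2)) / (2 * beta)
    # Backpropagate the EP input gradient through the tanh feedforward block.
    g_w1 = g_u * (1 - u ** 2) * x
    return g_w1, g_w2

def exact_gradients(x, y, w1, w2):
    # Reference: the toy energy has the closed-form equilibrium s* = w2 * u,
    # so the loss can be differentiated by hand with the chain rule.
    u = math.tanh(w1 * x)
    err = w2 * u - y
    return err * w2 * (1 - u ** 2) * x, err * u

if __name__ == "__main__":
    print(hybrid_gradients(0.7, 0.3, 0.5, -1.2))
    print(exact_gradients(0.7, 0.3, 0.5, -1.2))
```

In this toy setting the two printed gradient pairs should agree up to an error of order beta^2, which mirrors, on a much smaller scale, the claim that the hybrid estimate matches end-to-end differentiation.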
