Towards training digitally-tied analog blocks via hybrid gradient computation

Timothy Nest; Maxence Ernoult

TL;DR

The paper tackles the rising energy cost of AI training by proposing Feedforward-tied Energy-based Models (ff-EBMs), a hybrid digital-analog framework that chains feedforward blocks (digital) with energy-based blocks (analog). It derives an end-to-end gradient computation method that backpropagates through the feedforward blocks and applies Equilibrium Propagation (EP), or "eq-propagates", through the energy-based blocks, which extends EP to heterogeneous and more realistic architectures. Empirically, ff-EBMs with Deep Hopfield Network (DHN) blocks yield EP gradient estimates that closely match reference gradients, can be split into blocks of any uniform size without performance loss, and reach a new state of the art in the EP literature on ImageNet32 (approximately 46% top-1 accuracy). The work offers a principled, scalable, and incremental roadmap for integrating self-trainable analog primitives into existing digital accelerators, with the aim of reducing training energy.

Abstract

Power efficiency is plateauing in the standard digital electronics realm such that novel hardware, models, and algorithms are needed to reduce the costs of AI training. The combination of energy-based analog circuits and the Equilibrium Propagation (EP) algorithm constitutes one compelling alternative compute paradigm for gradient-based optimization of neural nets. Existing analog hardware accelerators, however, typically incorporate digital circuitry to sustain auxiliary non-weight-stationary operations, mitigate analog device imperfections, and leverage existing digital accelerators. This heterogeneous hardware approach calls for a new theoretical model building block. In this work, we introduce Feedforward-tied Energy-based Models (ff-EBMs), a hybrid model comprising feedforward and energy-based blocks accounting for digital and analog circuits, respectively. We derive a novel algorithm to compute gradients end-to-end in ff-EBMs by backpropagating and "eq-propagating" through feedforward and energy-based parts respectively, enabling EP to be applied to much more flexible and realistic architectures. We experimentally demonstrate the effectiveness of the proposed approach on ff-EBMs where Deep Hopfield Networks (DHNs) are used as energy-based blocks. We first show that a standard DHN can be split into blocks of any uniform size while maintaining performance. We then train ff-EBMs on ImageNet32, where we establish new SOTA performance in the EP literature (46% top-1 accuracy). Our approach offers a principled, scalable, and incremental roadmap to gradually integrate self-trainable analog computational primitives into existing digital accelerators.
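The hybrid gradient computation described in the abstract can be illustrated on a toy two-block chain. The following is a minimal sketch under stated assumptions, not the paper's implementation: the scalar feedforward block `tanh(w * x)`, the quadratic energy, the squared-error loss, the centered nudging with `+beta` and `-beta`, and all function names (`relax`, `hybrid_gradients`, `reference_gradients`) are hypothetical choices made for illustration. The energy-based block obtains its gradients from two nudged relaxations, and the resulting error signal at the block input is passed to the feedforward block by the ordinary chain rule.

```python
import math


def relax(theta, x1, y, beta, steps=200, lr=0.5):
    """Settle the energy-based block by gradient descent on its state s.

    Toy energy (assumption): E(s, theta, x1) = 0.5 * s**2 - theta * x1 * s
    Toy loss   (assumption): l(s, y) = 0.5 * (s - y)**2
    beta = 0 gives the free equilibrium, beta != 0 a nudged equilibrium.
    """
    s = 0.0
    for _ in range(steps):
        dE_ds = s - theta * x1
        dl_ds = s - y
        s -= lr * (dE_ds + beta * dl_ds)
    return s


def hybrid_gradients(w, theta, x, y, beta=0.01):
    """Estimate loss gradients for a feedforward block followed by an energy-based block."""
    # Forward pass: feedforward block, then free relaxation of the energy-based block.
    x1 = math.tanh(w * x)
    s_free = relax(theta, x1, y, beta=0.0)

    # Backward pass through the energy-based block: two nudged relaxations.
    s_pos = relax(theta, x1, y, +beta)
    s_neg = relax(theta, x1, y, -beta)

    # For the toy energy: dE/dtheta = -x1 * s and dE/dx1 = -theta * s.
    g_theta = ((-x1 * s_pos) - (-x1 * s_neg)) / (2 * beta)
    delta_x1 = ((-theta * s_pos) - (-theta * s_neg)) / (2 * beta)

    # Backward pass through the feedforward block: chain rule through tanh.
    g_w = delta_x1 * (1.0 - x1 ** 2) * x
    return {"w": g_w, "theta": g_theta, "prediction": s_free}


def reference_gradients(w, theta, x, y):
    """Exact gradients, using the closed-form minimiser s* = theta * x1 of the toy energy."""
    x1 = math.tanh(w * x)
    err = theta * x1 - y
    return {"w": err * theta * (1.0 - x1 ** 2) * x, "theta": err * x1}


if __name__ == "__main__":
    print(hybrid_gradients(w=0.5, theta=1.2, x=0.8, y=0.3))
    print(reference_gradients(w=0.5, theta=1.2, x=0.8, y=0.3))
```

In this toy setting the two estimates agree up to a relative error of order `beta**2`, so shrinking `beta` tightens the match until floating-point cancellation in the finite difference starts to dominate.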

Paper Structure (54 sections, 6 theorems, 74 equations, 4 figures, 6 tables, 12 algorithms)

Introduction
Background
Notations.
Energy-based models (EBMs)
Standard bilevel optimization
Equilibrium Propagation (EP)
Tying energy-based models with feedforward blocks
Feedforward-tied Energy-based Models (ff-EBMs)
Inference procedure.
Form of the energy functions.
Recovering a feedforward net.
Multi-level optimization of ff-EBMs
A BP--EP gradient chaining algorithm
Main result: explicit BP-EP chaining.
Proposed algorithm: implicit BP-EP chaining.
...and 39 more sections

Key Result

Theorem 3.1

Assuming a model of the form Eq. (\ref{def:deeply-nested-model}), we denote $s^1_\star, x^1_\star, \cdots, s^{N-1}_\star, \hat{o}_\star$ the states computed during the forward pass as depicted in Alg. \ref{alg:inference-ff-ebm}. We define the nudged state of block $k$, denoted as $s^k_\beta$, implicitly through Denoting $\delta s^k$ and $\Delta x^k$ the error signals computed at the input of the feedforward b

Figures (4)

Figure 1: Illustrating BP-EP backward gradient chaining through feedforward (red) and energy-based (yellow) blocks, accounting for digital and analog circuits respectively.
Figure 2: Depiction of the forward (left) and backward (right) pathways through an ff-EBM, with yellow and pink blocks denoting EB and feedforward transformations, respectively.
Figure 3: EP and ID partially computed gradients ($(\widehat{g}_{w}^{\rm EP}(t))_{t \geq 0}$ in black dotted curves and $(\widehat{g}_{w}^{\rm ID}(t))_{t \geq 0}$ in plain colored curves) going backward through equilibrium for ID and forward through the nudging phase for EP \cite{ernoult2019updates} for a random sample $x$ and associated label $y$. The ff-EBM comprises 6 blocks and 15 layers in total, with block sizes of either 2 or 3 layers. Each subpanel represents a layer (labelled on the y-axis) with each curve corresponding to a randomly selected weight. "Backward" time is indexed from $t=0$ to $T=120$, starting from block 6 backward to block 1, with 20 fixed-point iteration dynamics (Eq. (\ref{eq:fixed-point-iteration})) being used for both EP and ID within each EB block.
Figure 4: Cosine similarity between EP and ID weight gradients on a randomly selected sample $x$ and associated label $y$ in the same setting as Fig. \ref{fig:gdd} using the same color code to label the layers. We observe near-perfect alignment between EP and ID gradients.
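The alignment metric named in the Figure 4 caption can be sketched as follows. This is a minimal illustration, assuming each layer's weight gradient has been flattened into a plain list of floats; the function name `cosine_similarity` and the example values are hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(g_a, g_b):
    """Cosine of the angle between two flattened gradient vectors of equal length."""
    if len(g_a) != len(g_b):
        raise ValueError("gradient vectors must have the same length")
    dot = sum(a * b for a, b in zip(g_a, g_b))
    norm_a = math.sqrt(sum(a * a for a in g_a))
    norm_b = math.sqrt(sum(b * b for b in g_b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical gradient values, for illustration only.
    g_first = [0.10, -0.20, 0.05]
    g_second = [0.11, -0.19, 0.05]
    print(cosine_similarity(g_first, g_second))
```

A value close to 1 means the two gradients point in nearly the same direction regardless of their magnitudes, which is the sense in which the caption speaks of alignment.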

Theorems & Definitions (12)

Theorem 3.1: Informal
Definition A.1: ff-EBMs
Lemma A.2
Proof of Lemma \ref{lma:ff}
Lemma A.3: Lagrangian-based approach
Proof of Lemma \ref{lma:lagrangian}
Lemma A.4: Computing Lagrangian multipliers by EP
Proof of Lemma \ref{lma:lagrangian-ep}
Theorem A.5: Formal
Proof of Theorem \ref{theorem:main-result-formal}
...and 2 more
