Unbiased Approximate Vector-Jacobian Products for Efficient Backpropagation

Killian Bakong; Laurent Massoulié; Edouard Oyallon; Kevin Scaman

TL;DR

This work addresses the computational and memory bottlenecks of backpropagation by replacing exact vector-Jacobian products (VJPs) with unbiased randomized estimators. It develops a sketching framework for backpropagation, proves optimality results for low-rank and diagonal sketches, and analyzes how variance propagates through the computational DAG during the reverse pass. A suite of practical methods (uniform masks, rank-constrained sketches, and data-dependent diagonal sketches) is proposed and evaluated on MLPs, BagNet, and ViT, showing cost reductions with limited impact on accuracy. The findings point toward more bandwidth- and compute-efficient backpropagation in pipeline-parallel and large-model settings, with future work targeting coordinated variance control and adaptive hyperparameters.

Abstract

In this work, we introduce methods to reduce the computational and memory costs of training deep neural networks. Our approach consists of replacing exact vector-Jacobian products with randomized, unbiased approximations during backpropagation. We provide a theoretical analysis of the trade-off between the number of epochs needed to achieve a target precision and the cost reduction for each epoch. We then identify specific unbiased estimates of vector-Jacobian products for which we establish desirable optimality properties of minimal variance under sparsity constraints. Finally, we provide in-depth experiments on multi-layer perceptrons, BagNets, and Vision Transformer architectures. These validate our theoretical results and confirm the potential of our proposed unbiased randomized backpropagation approach for reducing the cost of deep learning.
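To make the idea of an unbiased randomized VJP concrete, here is a minimal NumPy sketch for a single linear node. It assumes a uniform Bernoulli mask over the coordinates of the incoming gradient, rescaled by the inverse keep probability. The function name `masked_vjp` and the parameter `keep_prob` are hypothetical and chosen for illustration; this is not the authors' implementation, only one simple instance of an unbiased diagonal sketch.

```python
import numpy as np


def masked_vjp(v, J, keep_prob, rng):
    """Unbiased estimate of the vector-Jacobian product v @ J.

    Each coordinate of v is kept independently with probability keep_prob
    and rescaled by 1 / keep_prob, so the expectation equals v @ J.
    (Hypothetical helper, for illustration only.)
    """
    # Bernoulli(keep_prob) mask over the coordinates of v
    mask = rng.random(v.shape[0]) < keep_prob
    # Rescale the surviving coordinates so the estimator is unbiased
    v_kept = v[mask] / keep_prob
    # Only the selected rows of J are touched, which is where cost is saved
    return v_kept @ J[mask]


# Small demonstration: the Monte Carlo average approaches the exact VJP.
rng = np.random.default_rng(0)
v = rng.standard_normal(8)        # incoming gradient (row vector)
J = rng.standard_normal((8, 5))   # Jacobian of a linear node
exact = v @ J
estimate = np.mean(
    [masked_vjp(v, J, keep_prob=0.5, rng=rng) for _ in range(2000)], axis=0
)
max_abs_error = np.max(np.abs(estimate - exact))
```

With `keep_prob=0.5`, each call reads roughly half of the rows of `J`. The price is added variance in the gradient estimate, which is the trade-off between per-epoch cost and number of epochs that the abstract refers to.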

Paper Structure (52 sections, 10 theorems, 86 equations, 4 figures, 6 algorithms)

Introduction
Contributions.
Sketching Reverse-Mode Automatic Differentiation
Variance in Stochastic Gradient Descent
Stochastic Gradient Surrogate
Gradient Computation on a DAG
Standard Gradient Computations through a Computational Graph.
Gradient Estimation on a DAG
Randomized Vector-Jacobian Products in Linear Settings
Randomized Sketching Framework.
Optimal Unbiased Low-Rank Matrix Approximation.
Application to VJPs.
Diagonal Sketches.
Jacobian Approximations
First Strategy: Applying Uniform Masks
...and 37 more sections

Key Result

Proposition 2.2

Assume the seed at the output node is exact (i.e., $\widehat{g}_{\text{out}} = g_{\text{out}}$) and that the assumption labeled as:local_vjp_unbiased holds. Then, for every node $i$ of the DAG,

Figures (4)

Figure 1: Comparison of sampling strategies and scoring methods.
Figure 2: Comparison of types of weighted methods.
Figure 3: Sketching on larger architectures.
Figure 4: Impact of VJP Approximation Location in MLPs.

Theorems & Definitions (16)

Proposition 2.2
Lemma 3.1: Optimal unbiased random sketch under rank constraint
Lemma 3.2: Distortion in Linear Nodes
proof
Proposition 3.3: Minimal Distortion rank $r$ Unbiased Sketch
proof : Sketch of proof
Lemma 3.4: Diagonal mask with expected size at most $r$
Proposition 1.1: Restatement of \ref{prop:variance_backprop}
proof
Lemma 1.2: Restatement of \ref{thm:svd_sketch}
...and 6 more
