Unbiased Approximate Vector-Jacobian Products for Efficient Backpropagation
Killian Bakong, Laurent Massoulié, Edouard Oyallon, Kevin Scaman
TL;DR
This work addresses the computational and memory bottlenecks of backpropagation by replacing exact vector-Jacobian products with unbiased randomized estimators. It develops a sketching framework for backpropagation, proves optimality results for low-rank and diagonal sketches, and analyzes how variance propagates through the computation graph (a DAG) during the reverse pass. A suite of practical methods (uniform masks, rank-constrained sketches, and data-dependent diagonal sketches) is proposed and evaluated on MLPs, BagNet, and ViT, showing meaningful cost reductions with limited impact on accuracy. The findings offer a scalable path toward more bandwidth- and compute-efficient backpropagation in pipeline-parallel and large-model settings, with future work targeting coordinated variance control and adaptive hyperparameters.
Abstract
In this work, we introduce methods to reduce the computational and memory costs of training deep neural networks. Our approach replaces exact vector-Jacobian products with randomized, unbiased approximations during backpropagation. We provide a theoretical analysis of the trade-off between the number of epochs needed to achieve a target precision and the cost reduction for each epoch. We then identify specific unbiased estimates of vector-Jacobian products for which we establish desirable optimality properties of minimal variance under sparsity constraints. Finally, we provide in-depth experiments on multi-layer perceptron, BagNet, and Vision Transformer architectures. These validate our theoretical results and confirm the potential of our proposed unbiased randomized backpropagation approach for reducing the cost of deep learning.
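To make the idea of an unbiased approximate vector-Jacobian product concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a dense Jacobian `jac` and uses a random diagonal mask with independent Bernoulli entries, rescaled by the keep probability, as one simple instance of a sparsity-constrained sketch. The names `masked_vjp` and `keep_prob` are hypothetical and chosen only for illustration.

```python
import numpy as np

def masked_vjp(v, jac, keep_prob, rng):
    """Unbiased estimate of v @ jac from a random diagonal (Bernoulli) mask.

    Illustrative assumption: each coordinate of v is kept independently
    with probability keep_prob and rescaled by 1 / keep_prob, so the
    masked vector has expectation v and the product has expectation v @ jac.
    """
    mask = rng.random(v.shape[0]) < keep_prob
    # Only the kept rows of the Jacobian are touched, which is where
    # the compute and memory savings would come from.
    return (v[mask] / keep_prob) @ jac[mask, :]

rng = np.random.default_rng(0)
jac = rng.standard_normal((64, 32))   # stand-in Jacobian of one layer
v = rng.standard_normal(64)           # incoming gradient (cotangent)

exact = v @ jac

# Averaging many independent estimates should approach the exact product.
estimates = np.stack([masked_vjp(v, jac, 0.25, rng) for _ in range(20000)])
mean_estimate = estimates.mean(axis=0)
rel_err = np.linalg.norm(mean_estimate - exact) / np.linalg.norm(exact)
print(f"relative error of the averaged estimate: {rel_err:.4f}")
```

Each individual estimate reads only about a quarter of the Jacobian rows here, at the price of added variance. That variance is the quantity the trade-off analysis between per-epoch cost and number of epochs has to account for.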
