Randomized Forward Mode of Automatic Differentiation For Optimization Algorithms
Khemraj Shukla, Yeonjong Shin
TL;DR
The paper proposes the randomized forward mode gradient (RFG) as a memory-efficient alternative to backpropagation: it estimates the gradient from the directional derivative along a random vector, computed with forward-mode AD. A second-moment analysis shows that the smallest expected relative squared error is achieved by distributions with minimal kurtosis $\kappa_4$ and that, for the quadratic setting, the optimal variance is $\sigma^2 = 1/(d+\kappa_4-1)$, which makes the gradient estimate biased in general. The authors develop and analyze RFG-based gradient descent and Polyak's heavy ball methods, proving linear convergence on quadratic objectives, with the best rates attained under the Bernoulli distribution ($\kappa_4=1$). Computational experiments on quadratic and non-quadratic problems, including SciML tasks, show that Bernoulli-based RFG often outperforms other distributions and can offer favorable iteration throughput compared with backpropagation, which positions RFG as a practical gradient-estimation approach for large-scale optimization and scientific machine learning.
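The variance formula above can be recovered with a short second-moment calculation. The sketch below is not taken from the paper. It assumes the estimator has the form $g = (\nabla f \cdot v)\,v$, where the components of $v \in \mathbb{R}^d$ are i.i.d. with mean zero, variance $\sigma^2$, and fourth moment $\kappa_4 \sigma^4$; the symbols $g$ and $v$ are introduced here for illustration.

$$
\mathbb{E}[g] = \sigma^2\,\nabla f,
\qquad
\mathbb{E}\,\|g\|^2 = \sigma^4\,(d+\kappa_4-1)\,\|\nabla f\|^2,
$$

$$
\frac{\mathbb{E}\,\|g-\nabla f\|^2}{\|\nabla f\|^2}
= \sigma^4\,(d+\kappa_4-1) - 2\sigma^2 + 1 .
$$

Under these assumptions the right-hand side is a quadratic in $\sigma^2$, minimized at $\sigma^2 = 1/(d+\kappa_4-1)$ with minimum value $1 - 1/(d+\kappa_4-1)$. That value increases with $\kappa_4$, so a smaller kurtosis gives a smaller error, and since $\mathbb{E}[g] = \sigma^2 \nabla f$, any choice with $\sigma^2 \neq 1$ gives a biased estimate.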
Abstract
We present the randomized forward mode gradient (RFG) as an alternative to backpropagation. RFG is a random estimator of the gradient, constructed from the directional derivative along a random vector. Forward-mode automatic differentiation (AD) computes RFG efficiently. The probability distribution of the random vector determines the statistical properties of RFG. Through a second-moment analysis, we find that the distribution with the smallest kurtosis yields the smallest expected relative squared error. Replacing the gradient with RFG yields a class of RFG-based optimization algorithms. Focusing on the gradient descent (GD) and Polyak's heavy ball (PHB) methods, we present a convergence analysis of RFG-based optimization algorithms for quadratic functions. Computational experiments demonstrate the performance of the proposed algorithms and verify the theoretical findings.
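To make the construction concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes the estimator has the form $(\nabla f \cdot v)\,v$ and draws $v$ with independent $\pm 1$ entries (kurtosis 1), optionally scaled. The `Dual` class, `toy_quadratic`, and all function names are hypothetical helpers written for this example; a real project would normally use an AD library's forward-mode routine instead of hand-written dual numbers.

```python
import random


class Dual:
    """Minimal dual number (val + tan*eps) for forward-mode AD; supports +, -, * only."""

    def __init__(self, val, tan=0.0):
        self.val = val
        self.tan = tan

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.val + other.val, self.tan + other.tan)

    __radd__ = __add__

    def __sub__(self, other):
        other = Dual.lift(other)
        return Dual(self.val - other.val, self.tan - other.tan)

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule carries the tangent forward.
        return Dual(self.val * other.val, self.val * other.tan + self.tan * other.val)

    __rmul__ = __mul__


def directional_derivative(f, x, v):
    """Return grad f(x) . v from one forward pass, without forming the gradient."""
    return f([Dual(xi, vi) for xi, vi in zip(x, v)]).tan


def sample_direction(d, rng, scale=1.0):
    """Assumed sampling scheme: independent +/-scale entries (variance scale**2)."""
    return [scale * rng.choice((-1.0, 1.0)) for _ in range(d)]


def rfg(f, x, v):
    """Randomized forward gradient along direction v: (grad f(x) . v) * v."""
    dd = directional_derivative(f, x, v)
    return [dd * vi for vi in v]


def rfg_descent(f, x0, step, iters, seed=0):
    """Gradient descent with the true gradient replaced by an RFG estimate."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iters):
        g = rfg(f, x, sample_direction(len(x), rng))
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def toy_quadratic(x):
    """Hypothetical test objective: 0.5 * sum_i (i+1) * x_i**2, minimized at 0."""
    total = 0.0
    for i, xi in enumerate(x):
        total = total + 0.5 * (i + 1) * xi * xi
    return total


if __name__ == "__main__":
    print(rfg_descent(toy_quadratic, [1.0, 2.0, 3.0], step=0.05, iters=2000))
```

With `scale=1.0` the estimate is unbiased under the stated assumptions. Passing `scale=d ** -0.5` to `sample_direction` would instead give variance $1/d$ for the $\pm 1$ entries, trading bias for a smaller expected squared error.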
