Variance-Reduced Gradient Estimation via Noise-Reuse in Online Evolution Strategies
Oscar Li, James Harrison, Jascha Sohl-Dickstein, Virginia Smith, Luke Metz
TL;DR
This work tackles gradient estimation for unrolled computation graphs (UCGs) where automatic differentiation (AD) can fail due to non-smooth or black-box dynamics. It introduces Noise-Reuse Evolution Strategies (NRES), a specific variant of the general GPES class of unbiased online evolution strategies (ES) methods. NRES reuses a single Gaussian perturbation across an entire episode to minimize gradient estimator variance while remaining online and unbiased. Theoretical variance analysis shows that, under realistic assumptions, the variance of NRES is no higher than that of PES and FullES. Empirical results across dynamical systems, meta-learning optimizers, and reinforcement learning demonstrate faster convergence and better wall-clock efficiency. The work highlights online ES as a practical alternative when AD is ineffective, offering substantial speedups and parallelization advantages in challenging UCG settings.
Abstract
Unrolled computation graphs are prevalent throughout machine learning but present challenges to automatic differentiation (AD) gradient estimation methods when their loss functions exhibit extreme local sensitivity, discontinuity, or black-box characteristics. In such scenarios, online evolution strategies methods are a more capable alternative, while being more parallelizable than vanilla evolution strategies (ES) by interleaving partial unrolls and gradient updates. In this work, we propose a general class of unbiased online evolution strategies methods. We analytically and empirically characterize the variance of this class of gradient estimators and identify the one with the least variance, which we term Noise-Reuse Evolution Strategies (NRES). Experimentally, we show that NRES results in faster convergence than existing AD and ES methods in terms of wall-clock time and number of unroll steps across a variety of applications, including learning dynamical systems, meta-training learned optimizers, and reinforcement learning.
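To make the noise-reuse idea concrete, the following is a minimal single-worker sketch in plain Python, not the authors' implementation. It assumes an antithetic (paired plus/minus) perturbation, a fixed episode length `horizon`, and a truncation window `window` that divides it. The toy dynamics `toy_step`, the class name `NoiseReuseWorker`, and all parameter names are hypothetical and chosen only for illustration. The point it shows is that one Gaussian perturbation is sampled at the start of an episode and kept for every partial unroll until the episode ends.

```python
import random


def toy_step(state, params):
    """Hypothetical toy dynamics: s_{t+1} = 0.9 * s_t + params, loss = ||s_{t+1}||^2."""
    new_state = [0.9 * s + p for s, p in zip(state, params)]
    loss = sum(s * s for s in new_state)
    return new_state, loss


class NoiseReuseWorker:
    """One worker that reuses a single perturbation for a whole episode (illustrative)."""

    def __init__(self, dim, horizon, window, sigma, seed=0):
        self.dim = dim
        self.horizon = horizon  # episode length T (assumed divisible by window)
        self.window = window    # steps per partial unroll W
        self.sigma = sigma      # standard deviation of the Gaussian perturbation
        self.rng = random.Random(seed)
        self._start_episode()

    def _start_episode(self):
        # Sample the perturbation once; it is reused until the episode ends.
        self.tau = 0
        self.eps = [self.rng.gauss(0.0, self.sigma) for _ in range(self.dim)]
        self.s_pos = [0.0] * self.dim  # state unrolled with theta + eps
        self.s_neg = [0.0] * self.dim  # state unrolled with theta - eps

    def estimate(self, theta):
        """Unroll `window` steps for both perturbed copies and return a gradient estimate."""
        if self.tau >= self.horizon:
            self._start_episode()
        theta_pos = [t + e for t, e in zip(theta, self.eps)]
        theta_neg = [t - e for t, e in zip(theta, self.eps)]
        loss_pos = 0.0
        loss_neg = 0.0
        for _ in range(self.window):
            self.s_pos, lp = toy_step(self.s_pos, theta_pos)
            self.s_neg, ln = toy_step(self.s_neg, theta_neg)
            loss_pos += lp
            loss_neg += ln
        self.tau += self.window
        # Antithetic ES estimate, averaged over the window's steps.
        scale = (loss_pos - loss_neg) / (2.0 * self.sigma ** 2 * self.window)
        return [scale * e for e in self.eps]
```

In use, `theta` can be updated after every call to `estimate`, which is what interleaves partial unrolls with gradient updates. Averaging estimates over many such workers, which run independently and so can run in parallel, is left out of the sketch.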
