Multi-level Monte-Carlo Gradient Methods for Stochastic Optimization with Biased Oracles
Yifan Hu, Jie Wang, Xin Chen, Niao He
TL;DR
This work studies stochastic optimization under biased oracles by introducing a unified multi-level Monte Carlo (MLMC) gradient framework. By writing a low-bias gradient as a telescoping sum of differences between successive accuracy levels, the authors construct several MLMC gradient estimators (V-MLMC, RT-MLMC, RU-MLMC, RR-MLMC) and couple them with SGD and variance-reduction techniques to achieve favorable tradeoffs among bias, variance, and cost. They provide nonasymptotic total-cost analyses for strongly convex, convex, and nonconvex objectives, showing that, under suitable conditions, biased-oracle problems can match the complexity of classical unbiased stochastic optimization. They also improve the best-known complexities for conditional stochastic optimization and shortfall risk optimization. The theory is complemented by extensive experiments in distributionally robust optimization, pricing and staffing, and contrastive learning, which demonstrate substantial sample-efficiency gains of MLMC gradient methods in practice.
Abstract
We consider stochastic optimization when one only has access to biased stochastic oracles of the objective and the gradient, and obtaining stochastic gradients with low bias comes at high cost. This setting captures various optimization paradigms, such as conditional stochastic optimization, distributionally robust optimization, and shortfall risk optimization, as well as machine learning paradigms such as contrastive learning. We examine a family of multi-level Monte Carlo (MLMC) gradient methods that exploit a delicate tradeoff among bias, variance, and oracle cost. We systematically study their total sample and computational complexities for strongly convex, convex, and nonconvex objectives and demonstrate their superiority over the widely used biased stochastic gradient method. When combined with variance reduction techniques such as SPIDER, these MLMC gradient methods can further reduce the complexity in the nonconvex regime. Our results imply that a series of stochastic optimization problems with biased oracles, previously considered to be more challenging, is fundamentally no harder than classical stochastic optimization with unbiased oracles. We also delineate the boundary conditions under which these problems become more difficult. Moreover, MLMC gradient methods significantly improve the best-known complexities in the literature for conditional stochastic optimization and shortfall risk optimization. Our extensive numerical experiments on distributionally robust optimization, pricing and staffing scheduling problems, and contrastive learning demonstrate the superior performance of MLMC gradient methods.
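To make the tradeoff among bias, variance, and cost concrete, the following is a minimal sketch of a randomized-truncation style MLMC estimator on a toy biased oracle. It is an illustration under stated assumptions, not the authors' implementation. The toy oracle (estimating (E[Y])^2 by squaring the mean of 2^l samples, which has bias sigma^2 / 2^l at cost 2^l), the level probabilities proportional to 2^-l, and the names level_probs, level_difference, and rt_mlmc are all hypothetical choices made for this example.

```python
import numpy as np


def level_probs(max_level):
    """Illustrative sampling distribution over levels, p_l proportional to 2^-l."""
    weights = 2.0 ** -np.arange(max_level + 1)
    return weights / weights.sum()


def level_difference(rng, level, mu=1.0, sigma=1.0):
    """One sample whose mean is E[A_l] - E[A_{l-1}] (with E[A_{-1}] = 0).

    A_l is the toy level-l oracle: the squared mean of 2^l draws of
    Y ~ N(mu, sigma^2). Its bias for mu^2 is sigma^2 / 2^l and it costs 2^l samples.
    """
    y = rng.normal(mu, sigma, size=2 ** level)
    fine = y.mean() ** 2
    if level == 0:
        return fine
    # Reuse the same draws for the coarse level: average the level-(l-1)
    # oracle over the two halves so the difference has small variance.
    half = y.size // 2
    coarse = 0.5 * (y[:half].mean() ** 2 + y[half:].mean() ** 2)
    return fine - coarse


def rt_mlmc(rng, max_level, mu=1.0, sigma=1.0):
    """Sample one level at random and reweight the difference by 1 / p_l.

    Summing the telescoping differences shows the result has the same mean
    as the level-max_level oracle, but cheap low levels are sampled most often.
    """
    p = level_probs(max_level)
    level = rng.choice(max_level + 1, p=p)
    return level_difference(rng, level, mu, sigma) / p[level]
```

With mu = sigma = 1 and max_level = 6, averaging many draws of rt_mlmc(rng, 6) should approach 1 + 1/64, the mean of the level-6 oracle, even though most draws touch only a handful of samples. In a gradient method, the scalar oracle would be replaced by a stochastic gradient at the corresponding accuracy level.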
