Drago: Primal-Dual Coupled Variance Reduction for Faster Distributionally Robust Optimization
Ronak Mehta, Jelena Diakonikolas, Zaid Harchaoui
TL;DR
Drago is a stochastic primal-dual method for penalized distributionally robust optimization (DRO) that reduces dual variance by combining randomized and cyclic updates with a novel primal regularization. It operates on a finite-sum DRO objective with a convex uncertainty set and achieves linear convergence, with a complexity that depends on the minibatch size, the uncertainty-set size, and the smoothness and strong convexity constants. The theory establishes a tight overall complexity bound, and experiments on regression and text classification show fast convergence and favorable wall-clock performance across varying data sizes and conditioning. The analysis covers common uncertainty sets such as $f$-divergence balls and spectral risk measures, which makes the linear convergence guarantee applicable to large-scale learning under distribution shift.
Abstract
We consider the penalized distributionally robust optimization (DRO) problem with a closed, convex uncertainty set, a setting that encompasses learning using $f$-DRO and spectral/$L$-risk minimization. We present Drago, a stochastic primal-dual algorithm that combines cyclic and randomized components with a carefully regularized primal update to achieve dual variance reduction. Owing to its design, Drago enjoys a state-of-the-art linear convergence rate on strongly convex-strongly concave DRO problems with a fine-grained dependency on primal and dual condition numbers. Theoretical results are supported by numerical benchmarks on regression and classification tasks.
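To make the penalized DRO setting concrete, the following is a minimal sketch of one special case, not the Drago algorithm itself. It assumes the uncertainty set is the full probability simplex and the penalty is a KL divergence to the uniform distribution, so that the inner maximization over weights $q$ has a closed form. The function name `kl_penalized_dro_objective` and the penalty strength `nu` are illustrative choices that do not come from the paper.

```python
import numpy as np

def kl_penalized_dro_objective(losses, nu):
    """Evaluate max_q { sum_i q_i * l_i - nu * KL(q || uniform) } over the simplex.

    Assumption (illustrative only): KL penalty and full-simplex uncertainty set.
    In this case the maximum equals nu * log(mean(exp(l / nu))) and is attained
    at the softmax weights q_i proportional to exp(l_i / nu).
    """
    losses = np.asarray(losses, dtype=float)
    n = losses.size
    z = losses / nu
    z_max = z.max()                      # shift for numerical stability
    unnormalized = np.exp(z - z_max)
    value = nu * (z_max + np.log(unnormalized.sum()) - np.log(n))
    q = unnormalized / unnormalized.sum()  # worst-case reweighting of examples
    return value, q

# Example: per-example losses at some fixed model parameters.
losses = np.array([0.5, 1.0, 3.0])
value, q = kl_penalized_dro_objective(losses, nu=1.0)
```

The returned value always lies between the average loss and the maximum loss: a large `nu` pulls it toward the average (standard empirical risk), while a small `nu` pulls it toward the worst-case example.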
