Distributionally Robust Losses for Latent Covariate Mixtures
John Duchi, Tatsunori Hashimoto, Hongseok Namkoong
TL;DR
The paper tackles the problem of achieving uniformly good performance across latent subpopulations when data come from a mixture of covariate distributions. It introduces marginal distributionally robust optimization (DRO) over latent subpopulations, leveraging a dual CVaR representation of the worst-case risk and a scalable L^p Hölder-type variational bound to obtain tractable, nonparametric guarantees. The authors provide finite-sample bounds, convergence rates tied to the Wasserstein distance, and hardness results that illuminate dimension-driven limits. Empirically, the marginal DRO approach yields improved worst-case performance on tasks including semantic similarity, wine quality prediction, and recidivism prediction, while highlighting the computational and dimensional trade-offs. Overall, the work offers a rigorous, scalable framework for robust subpopulation performance that bridges covariate shift, fairness, and causal-inference perspectives.
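As a compact reference for the dual CVaR representation mentioned above, here is the standard identity written in our own notation (the symbols h, α, a, Q_0, Q_1, and η are labels we introduce and may differ from the paper's). The worst-case average of a function h over any subpopulation Q_0 that makes up at least an α fraction of the covariate distribution P_X equals the conditional value-at-risk of h(X) at level α, which has a one-dimensional convex dual:

```latex
\[
\sup_{\substack{Q_0,\,Q_1,\; a \ge \alpha:\\ P_X = a Q_0 + (1-a) Q_1}} \mathbb{E}_{Q_0}\big[h(X)\big]
\;=\; \inf_{\eta \in \mathbb{R}} \Big\{ \eta + \tfrac{1}{\alpha}\, \mathbb{E}_{P_X}\big[(h(X) - \eta)_+\big] \Big\},
\qquad h(x) = \mathbb{E}\big[\ell(\theta; X, Y) \mid X = x\big].
\]
```

In the marginal setting, h is the conditional risk given the covariates, which is not observed directly; handling that gap is the role of the Hölder-type variational bound described above, and this display does not capture it.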
Abstract
While modern large-scale datasets often consist of heterogeneous subpopulations (for example, multiple demographic groups or multiple text corpora), the standard practice of minimizing average loss fails to guarantee uniformly low losses across all subpopulations. We propose a convex procedure that controls the worst-case performance over all subpopulations of a given size. Our procedure comes with finite-sample (nonparametric) convergence guarantees on the worst-off subpopulation. Empirically, we observe on lexical similarity, wine quality, and recidivism prediction tasks that our worst-case procedure learns models that do well against unseen subpopulations.
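The worst case over "all subpopulations of a given size" can be checked numerically. The sketch below is ours, not the authors' code, and its function names are hypothetical. It assumes the per-example conditional risks h(x_i) = E[loss | X = x_i] are already known (in the paper's setting they are not), treats the empirical distribution as P_X, and compares a direct worst-case reweighting over subpopulations holding at least an α fraction of the mass with the one-dimensional dual CVaR formula.

```python
import numpy as np


def worst_case_subpopulation_risk(h, alpha):
    """Largest average of h over any reweighting of the sample whose weights
    are at most 1/(alpha * n), i.e. any subpopulation holding at least an
    alpha fraction of the probability mass."""
    h = np.sort(np.asarray(h, dtype=float))[::-1]  # largest values first
    cap = 1.0 / (alpha * h.size)  # density ratio dQ0/dP bounded by 1/alpha
    remaining, total = 1.0, 0.0
    for value in h:
        w = min(cap, remaining)  # greedily fill the worst examples up to the cap
        total += w * value
        remaining -= w
        if remaining <= 0.0:
            break
    return total


def cvar_dual(h, alpha):
    """Dual (Rockafellar-Uryasev) form: min over eta of
    eta + mean((h - eta)_+) / alpha. The objective is convex and piecewise
    linear in eta with kinks at the sample values, so a minimizer is one of them."""
    h = np.asarray(h, dtype=float)
    return min(eta + np.maximum(h - eta, 0.0).mean() / alpha for eta in h)


# Hypothetical per-example conditional risks h(x_i); in practice these are unknown.
h = np.arange(1.0, 11.0)
# Both values are approximately 9.0: the mean of the top 30% of h, namely (8, 9, 10).
print(worst_case_subpopulation_risk(h, 0.3), cvar_dual(h, 0.3))
```

Because the dual is a minimization over a single scalar η, an objective of the form min over (θ, η) of η + E[(h_θ(X) − η)_+]/α remains jointly convex whenever the loss is convex in θ. This is one standard route to a convex worst-case procedure; we offer it as general background on CVaR, not as the paper's exact algorithm.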
