Distributionally Robust Losses for Latent Covariate Mixtures

John Duchi; Tatsunori Hashimoto; Hongseok Namkoong

TL;DR

The paper tackles the problem of achieving uniformly good performance across latent subpopulations when data come from a mixture of covariate distributions. It introduces marginal distributionally robust optimization (DRO) over latent subpopulations. The approach uses a dual CVaR (conditional value-at-risk) representation of the worst-case risk and a scalable L^p Hölder variational bound to obtain tractable, nonparametric guarantees. The authors provide finite-sample bounds, convergence rates tied to the Wasserstein distance, and hardness results that expose limits driven by the covariate dimension. Empirically, the marginal DRO approach improves worst-case performance on lexical similarity, wine quality prediction, and recidivism prediction tasks, while highlighting computational and dimension-dependent trade-offs. Overall, the work offers a rigorous, scalable framework for robust subpopulation performance that bridges covariate-shift, fairness, and causal-inference perspectives.
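The CVaR duality mentioned above can be checked numerically on a finite sample. The sketch below is a minimal illustration that assumes plain Python and an empirical distribution with equal weights on each example; the function names are hypothetical. It computes the worst-case average loss over all subpopulations holding at least an alpha fraction of the data in two ways. The primal form averages the largest losses directly. The dual form evaluates inf over eta of { eta + E[(loss - eta)_+] / alpha }. This is the joint, per-example version. In the marginal setting the paper studies, each loss would be replaced by the conditional risk given the covariates, which is not observed per example. The variational bound described above is what makes that case tractable.

```python
def worst_subpop_risk_sorted(losses, alpha):
    """Primal form: average the largest losses that together carry an alpha
    fraction of the empirical mass, splitting the boundary example fractionally."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    n = len(losses)
    budget = alpha * n  # (possibly fractional) number of examples in the subpopulation
    total = 0.0
    for loss in sorted(losses, reverse=True):
        take = min(1.0, budget)
        total += take * loss
        budget -= take
        if budget <= 0:
            break
    return total / (alpha * n)


def worst_subpop_risk_dual(losses, alpha):
    """Dual (CVaR) form: min over eta of eta + mean((loss - eta)_+) / alpha.
    The objective is convex and piecewise linear in eta with kinks at the losses,
    so its minimum is attained at one of the observed losses."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    n = len(losses)

    def objective(eta):
        return eta + sum(max(loss - eta, 0.0) for loss in losses) / (alpha * n)

    return min(objective(eta) for eta in losses)


losses = [1.0, 2.0, 3.0, 4.0]
for alpha in (1.0, 0.5, 0.3):
    # Both columns should agree; alpha = 1.0 recovers the ordinary average.
    print(alpha, worst_subpop_risk_sorted(losses, alpha), worst_subpop_risk_dual(losses, alpha))
```

For alpha = 0.5 both forms give 3.5, the mean of the two largest losses. The dual form matters in practice because it is an ordinary expectation plus one extra scalar, so it can be minimized jointly with model parameters.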

Abstract

While modern large-scale datasets often consist of heterogeneous subpopulations (for example, multiple demographic groups or multiple text corpora), the standard practice of minimizing average loss fails to guarantee uniformly low losses across all subpopulations. We propose a convex procedure that controls the worst-case performance over all subpopulations of a given size. Our procedure comes with finite-sample (nonparametric) convergence guarantees on the worst-off subpopulation. Empirically, on lexical similarity, wine quality, and recidivism prediction tasks, we observe that our worst-case procedure learns models that do well against unseen subpopulations.
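The failure of average-loss minimization described in the abstract already shows up in a one-parameter toy. The sketch below uses plain Python, and its data, names, and grid search are illustrative assumptions, not the paper's procedure. It fits a constant predictor to a 90/10 mixture under squared loss twice. The first fit minimizes the average loss. The second minimizes the worst-case average over all subpopulations holding at least an alpha = 0.1 share of the data. The sketch uses the joint worst-case risk over examples rather than the marginal, covariate-only version the paper develops.

```python
def worst_subpop_risk(losses, alpha):
    """Worst-case average loss over subpopulations holding at least an alpha
    fraction of the empirical mass (largest losses first, boundary split fractionally)."""
    n = len(losses)
    budget = alpha * n
    total = 0.0
    for loss in sorted(losses, reverse=True):
        take = min(1.0, budget)
        total += take * loss
        budget -= take
        if budget <= 0:
            break
    return total / (alpha * n)


# Hypothetical toy data: 9 examples from a majority group (y = 0) and
# 1 example from a minority group (y = 1); no covariates, constant predictor c.
ys = [0.0] * 9 + [1.0]
alpha = 0.1  # size of the smallest subpopulation we want to protect


def squared_losses(c):
    return [(c - y) ** 2 for y in ys]


# Average-loss minimizer (ERM) under squared loss: the sample mean.
c_erm = sum(ys) / len(ys)

# Worst-case minimizer: the objective is convex in c, so a fine grid is enough
# for this one-parameter toy (a real model would use a gradient-based solver).
grid = [i / 100 for i in range(101)]
c_dro = min(grid, key=lambda c: worst_subpop_risk(squared_losses(c), alpha))

for name, c in (("ERM", c_erm), ("worst-case", c_dro)):
    losses = squared_losses(c)
    print(name, c, "average:", sum(losses) / len(losses),
          "worst alpha-subpopulation:", worst_subpop_risk(losses, alpha))
```

With these numbers the average-loss fit is c = 0.1, and its worst 10% subpopulation (the minority example) suffers loss 0.81. The worst-case fit moves to c = 0.5. This raises the average loss from 0.09 to 0.25 but cuts the worst-subpopulation loss to 0.25.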

Figures (11)

Theorems & Definitions (21)