Statistical Analysis of Conditional Group Distributionally Robust Optimization with Cross-Entropy Loss

Zijian Guo, Zhenyu Wang, Yifan Hu, Francis Bach

TL;DR

This work develops Conditional Group Distributionally Robust Optimization (CG-DRO) for multi-source unsupervised domain adaptation with cross-entropy loss, addressing distributional shifts across domains by optimizing over mixtures of source conditional distributions. It introduces a Mirror Prox algorithm augmented with Double Machine Learning to estimate the risk while maintaining high statistical efficiency, and proves fast convergence rates through surrogate minimax problems. Recognizing nonstandard limiting distributions in minimax settings, the authors formulate a perturbation-based inference framework that yields uniformly valid confidence intervals and tests, even when the empirical CG-DRO estimator is nonnormal. Theoretical results are complemented by simulations demonstrating estimation accuracy, nonregular/unstable behavior, and valid uncertainty quantification, with practical implications for robust transfer learning under domain shifts.

Abstract

In multi-source learning with discrete labels, distributional heterogeneity across domains poses a central challenge to developing predictive models that transfer reliably to unseen domains. We study multi-source unsupervised domain adaptation, where labeled data are available from multiple source domains and only unlabeled data are observed from the target domain. To address potential distribution shifts, we propose a novel Conditional Group Distributionally Robust Optimization (CG-DRO) framework that learns a classifier by minimizing the worst-case cross-entropy loss over convex combinations of the conditional outcome distributions from the source domains. We develop an efficient Mirror Prox algorithm for solving the minimax problem and employ a double machine learning procedure to estimate the risk function, ensuring that errors in nuisance estimation contribute only at higher-order rates. We establish fast statistical convergence rates for the empirical CG-DRO estimator by constructing two surrogate minimax optimization problems that serve as theoretical bridges. A distinguishing challenge for CG-DRO is the emergence of nonstandard asymptotics: the empirical CG-DRO estimator may fail to converge to a standard limiting distribution due to boundary effects and system instability. To address this, we introduce a perturbation-based inference procedure that enables uniformly valid inference, including confidence interval construction and hypothesis testing.
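The minimax structure described in the abstract can be illustrated with a small numerical sketch. It is not the authors' implementation: it assumes a multinomial logistic classifier $p_\theta(c \mid x) = \mathrm{softmax}(x^\top\theta)_c$, treats the source conditional probabilities as known (the paper estimates them and corrects the risk via double machine learning, which is omitted here), and uses a textbook Mirror Prox (extragradient) update with a Euclidean step for $\theta$ and an entropic step for the mixture weights $\gamma$ on the simplex. The names `risk`, `grads`, `mirror_prox`, the step size `eta`, and the simulated data are all hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def log_softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def risk(theta, gamma, X, P):
    """Cross-entropy of p_theta against the gamma-mixture of source conditionals.
    P has shape (L, n, K): P[l, i, c] = p_l(c | x_i), assumed known here."""
    q = np.tensordot(gamma, P, axes=1)              # mixture labels, shape (n, K)
    return -np.mean(np.sum(q * log_softmax(X @ theta), axis=1))

def grads(theta, gamma, X, P):
    logp = log_softmax(X @ theta)
    q = np.tensordot(gamma, P, axes=1)
    g_theta = X.T @ (np.exp(logp) - q) / X.shape[0]       # descent direction for theta
    g_gamma = -np.mean(np.sum(P * logp, axis=2), axis=1)  # per-source risk, shape (L,)
    return g_theta, g_gamma

def mirror_prox(X, P, steps=2000, eta=0.2):
    L, n, K = P.shape
    theta = np.zeros((X.shape[1], K))
    gamma = np.full(L, 1.0 / L)
    theta_avg, gamma_avg = np.zeros_like(theta), np.zeros(L)
    for _ in range(steps):
        # Extrapolation step: gradients evaluated at the current point.
        g_t, g_g = grads(theta, gamma, X, P)
        theta_h = theta - eta * g_t
        gamma_h = gamma * np.exp(eta * g_g)
        gamma_h /= gamma_h.sum()
        # Update step: restart from the current point, use gradients at the half point.
        g_t, g_g = grads(theta_h, gamma_h, X, P)
        theta = theta - eta * g_t
        gamma = gamma * np.exp(eta * g_g)
        gamma /= gamma.sum()
        theta_avg += theta_h
        gamma_avg += gamma_h
    return theta_avg / steps, gamma_avg / steps     # averaged half-step iterates

# Simulated example: three sources whose conditionals share a common signal.
rng = np.random.default_rng(0)
n, d, K, L = 500, 3, 3, 3
X = rng.normal(size=(n, d))
base = 2.0 * np.eye(d)
P = np.stack([softmax(X @ (base + 0.3 * rng.normal(size=(d, K)))) for _ in range(L)])

theta_hat, gamma_hat = mirror_prox(X, P)
# The risk is linear in gamma, so the worst case over the simplex is attained at a vertex.
worst = max(risk(theta_hat, np.eye(L)[l], X, P) for l in range(L))
print("mixture weights:", np.round(gamma_hat, 3), "worst-case risk:", round(worst, 4))
```

Because the objective is linear in $\gamma$ and convex in $\theta$, the multiplicative update keeps $\gamma$ on the simplex without an explicit projection, and the averaged iterates are the quantity for which Mirror Prox guarantees are usually stated.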

Paper Structure

This paper contains 95 sections, 38 theorems, 659 equations, 12 figures, 4 algorithms.

Key Result

Proposition 1

Suppose that Conditions (A1) and (A2) hold. Then there exist positive constants $c_0,c_1>0$ such that with probability at least $1-N^{-c_1 d}-\exp(-c_1 t^2)-\delta_n,$ where $t$ is a positive value satisfying $c_0\leq t\lesssim \sqrt{n/d^3}$ and both $c_0$ and the vanishing sequence $\delta_n \to 0$ are specified in Condition (A1).

Figures (12)

  • Figure 1: Illustration of Multi-source Unsupervised Domain Adaptation. The source domains have labeled data, while the target domain only has unlabeled data.
  • Figure 2: Comparison of the worst-case risk defined in \ref{eq: worst-case loss} for CG-DRO, Group DRO, and ERM.
  • Figure 3: Illustration of the proof strategy for establishing the fast convergence rate of $\|\widehat{\theta}-\theta^*\|_2$. Theorem \ref{thm: quad convergence} establishes the approximation errors $\|\theta^*_{\rm ap} - \theta^*\|_2$ and $\|\widehat{\theta}_{\rm ap} - \widehat{\theta}\|_2$, while Theorem \ref{thm: refined rate} establishes the convergence rate of $\|\widehat{\theta}_{\rm ap} - \theta^*_{\rm ap}\|_2$.
  • Figure 4: Empirical distributions of $\widehat{\gamma}_1$ and $\widehat{\theta}_1$ in the nonregular, unstable, and regular settings. The top row corresponds to the first source's estimated weight $\widehat{\gamma}_1$, while the bottom row corresponds to the estimated $\widehat{\theta}_1$. Blue histograms with overlaid kernel density estimates depict the empirical distributions of the estimates across $500$ simulations. Vertical red lines indicate the true parameter values, while dashed green lines show the empirical averages across the simulations. The exact simulation settings are reported in Supplement Section \ref{appendix: setups}.
  • Figure 5: Demonstration of the perturbation idea. We generate a collection of $\widehat{\gamma}^{[m]}$ for $1\leq m\leq M$ and show that at least one member of $\{\widehat{\gamma}^{[m]}\}_{1\leq m\leq M}$ is nearly equal to $\gamma^*_{\rm ap},$ which is itself close to $\gamma^*$.
  • ...and 7 more figures

Theorems & Definitions (40)

  • Proposition 1
  • Remark 1
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Remark 2: Inference for Minimax Problem
  • Proposition 2
  • Theorem 5
  • Theorem 6
  • ...and 30 more