Distributionally Robust Learning for Multi-source Unsupervised Domain Adaptation

Zhenyu Wang; Peter Bühlmann; Zijian Guo

TL;DR

This paper addresses distributional shifts in multi-source unsupervised domain adaptation by introducing Distributionally Robust Learning (DRoL), which uses labeled data from multiple sources and unlabeled target covariates to build robust predictors under a mixture-based uncertainty set over Y|X. The main theoretical result shows the population robust predictor is a weighted average of source conditional means, with aggregation weights obtained by solving a convex quadratic program, and a bias-correction step enhances weight estimation. The authors provide detailed rate results, comparing reward-based robust modeling to squared-error and regret-based alternatives, and demonstrate both computational tractability and privacy-friendly, federated-like properties. Through simulations and a real Beijing PM2.5 dataset, DRoL consistently achieves superior worst-case performance, especially when incorporating informative prior information about the target mixture and applying bias correction. Overall, DRoL offers a principled, scalable, and privacy-conscious approach to robust prediction under covariate shift across multiple sources, with practical impact for domains where target labels are scarce or unavailable.

Abstract

Empirical risk minimization often performs poorly when the distribution of the target domain differs from those of source domains. To address such potential distribution shifts, we develop an unsupervised domain adaptation approach that leverages labeled data from multiple source domains and unlabeled data from the target domain. We introduce a distributionally robust model that optimizes an adversarial reward based on the explained variance across a class of target distributions, ensuring generalization to the target domain. We show that the proposed robust model is a weighted average of conditional outcome models from source domains. This formulation allows us to compute the robust model through the aggregation of source models, which can be estimated using various machine learning algorithms of the users' choice, such as random forests, boosting, and neural networks. Additionally, we introduce a bias-correction step to obtain a more accurate aggregation weight, which is effective for various machine learning algorithms. Our framework can be interpreted as a distributionally robust federated learning approach that satisfies privacy constraints while providing insights into the importance of each source for prediction on the target domain. The performance of our method is evaluated on both simulated and real data.

Paper Structure (53 sections, 13 theorems, 242 equations, 12 figures, 1 table, 2 algorithms)

Introduction
Our results and contribution
Related works
Notations
Distributionally Robust Prediction Models: Definition and Identification
Group Distributionally Robust Prediction Models
Identification for Distributionally Robust Prediction Models
Exploration of Various Loss Functions for Distributionally Robust Models
Extensions of the Robust Models
Algorithms: Distributionally Robust Learning
Bias Correction: Main Idea
No Covariate Shift Setting
Covariate Shift Setting
Algorithm
Theoretical Justification
...and 38 more sections

Key Result

Theorem 2.1

Suppose that the function class ${\mathcal{F}}$ is convex with $f^{(l)} \in {\mathcal{F}}$ for all $l\in[L]$ and that ${\mathcal{H}}$ is a convex subset of $\Delta^L$. Then the robust prediction model $f^*_{\mathcal{H}}$ is identified as
$$f^*_{\mathcal{H}} = \sum_{l=1}^{L} q^*_l \, f^{(l)} \quad \text{with} \quad q^* = \mathop{\arg\min}_{q\in{\mathcal{H}}} q^\intercal \Gamma q,$$
where $\Gamma_{k,l} = \mathbb{E}_{{\mathbf{Q}}_X}[f^{(k)}(X)f^{(l)}(X)]$ for $k,l\in [L]$.
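The TL;DR describes the aggregation weights as the solution of a convex quadratic program. A minimal sketch of that step follows, assuming the objective is $q^\intercal \Gamma q$ minimized over ${\mathcal{H}} = \Delta^L$ (the full simplex) and that $\Gamma$ is replaced by its plug-in average over target covariates. The helper names `project_to_simplex` and `aggregation_weights` are hypothetical, projected gradient descent is a stand-in solver rather than the authors' implementation, and the paper's bias-correction step is omitted.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = np.sort(v)[::-1]                      # sort in decreasing order
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def aggregation_weights(preds, n_iter=5000):
    """Plug-in weights (assumption: H is the full simplex, no bias correction).

    preds: (N, L) array; column l holds the fitted source model f^(l)
           evaluated at N unlabeled target covariates.
    Returns (q, gamma), where q approximately minimizes q' gamma q over the simplex.
    """
    n, n_sources = preds.shape
    gamma = preds.T @ preds / n               # plug-in estimate of Gamma
    # The gradient of q' gamma q is 2 * gamma @ q, so 1 / (2 * lambda_max) is a safe step.
    step = 1.0 / (2.0 * np.linalg.eigvalsh(gamma)[-1] + 1e-12)
    q = np.full(n_sources, 1.0 / n_sources)
    for _ in range(n_iter):
        q = project_to_simplex(q - step * 2.0 * gamma @ q)
    return q, gamma

# Toy design: two source models sharing the x1 component, with opposite-sign x2 components.
X = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
preds = np.column_stack([X[:, 0] + 2.0 * X[:, 1],   # f^(1)(x) = x1 + 2 * x2
                         X[:, 0] - X[:, 1]])        # f^(2)(x) = x1 - x2
q, gamma = aggregation_weights(preds)
robust_pred = preds @ q                              # aggregated prediction on X
print(q)  # approximately [1/3, 2/3]
```

In this toy design the aggregated prediction reduces to $x_1$: the shared component is kept and the sign-heterogeneous $x_2$ component cancels, which mirrors the behavior described in the right panel of Figure 2.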

Figures (12)

Figure 1: Illustration of the Multi-source Unsupervised Domain Adaptation (MSDA) framework, where source domains have labeled data and the target domain only has unlabeled data.
Figure 2: Illustration of $f^*$ (the blue point) and $f^*_{\mathcal{H}}$ (the red point) for $p=2$, $L=5$, and the additive models $f^{(l)}(x)=\sum_{j=1}^{2}f^{(l)}_{j}(x_j)$ for ${l\in[L]}$. The left panel: $f^*$ is the point closest to the origin in the convex hull of $\{f^{(l)}\}_{l\in[L]}$, and $f^*_{\mathcal{H}}$ is the point in the ${\mathcal{H}}$-constrained set having the smallest distance to the origin. The right panel: consider the setting with shared first component $f^{(1)}_1 = f^{(2)}_1 = ... =f^{(L)}_1 = f_1$ and the second component being scattered around $0$; the distributionally robust prediction model $f^*(x)=f_1(x_1)$ (blue point) retains only the shared component and shrinks the sign-heterogeneous component to zero.
Figure 3: Illustration of robust prediction models utilizing reward (ours), squared error, and regret, for $L=3$ and ${\mathcal{H}}=\Delta^3$. The left panel: $f^*$ is the point closest to the origin within the convex hull of $\{f^{(l)}\}_{l\in[L]}$. The middle panel: $f^{\rm sq}$ corresponds to the source model with the highest noise level when this noise is substantially higher than that in other sources. The right panel: $f^{\rm reg}$ is the center of the smallest circle enclosing all individual source models.
Figure 4: Comparison of worst-case reward for DRoL, ERM, ImpWeight, and GroupDRO with the number of source domains $L$ varied across $\{3,...,10\}$. The left panel corresponds to the even mixture scheme, where the source domain sample size is $n_l=1000$ for all $l\in [L]$. The right panel corresponds to the uneven mixture scheme, with $n_1=500\cdot L$ for the first source and $n_l=500$ for all $l\geq 2$.
Figure 5: Comparison of different methods in terms of the reward evaluated on the target distribution ${\mathbf{Q}}$. The plotted curves represent the mean reward computed over 200 simulation rounds, while the shaded error bands indicate the 10th and 90th percentiles across these rounds. Here, the target conditional outcome model ${\mathbf{Q}}_{Y|X}$ is generated as a mixture of the source conditional outcome models $\{\mathbf{P}^{(l)}_{Y|X}\}_{l\in [4]}$ with mixture weights $\gamma^{\mathbf{Q}} = \left(0.6,\frac{0.4}{3},\frac{0.4}{3},\frac{0.4}{3}\right)^\intercal$. The experiment fixes the number of source domains at $L=4$ (with $n_l=2000$ samples per source) and the unlabeled target sample size at $N=20,\!000$. We vary the number of labeled target samples $N_0 \in \{20,\, 50,\, 100\}$ (which are used only in DRoL-Label and TargetOnly) and the parameter $\rho \in [0, 0.9]$ that controls the size of the constraint set ${\mathcal{H}}$.
...and 7 more figures

Theorems & Definitions (13)

Theorem 2.1
Proposition 2.1
Corollary 2.1
Proposition 2.2
Corollary 2.2
Theorem 4.1
Theorem 4.2
Theorem 4.3
Theorem A.1
Proposition A.1
...and 3 more
