Stein Discrepancy for Unsupervised Domain Adaptation

Anneke von Seeger, Dongmian Zou, Gilad Lerman

TL;DR

This work introduces Stein discrepancy-based unsupervised domain adaptation (UDA) to address scenarios with scarce unlabeled target data. It develops both kernelized and adversarial forms of the transfer loss and supports target-density modeling via Gaussian, Gaussian mixture, or VAE models, enabling score-function-based alignment without relying on abundant target data. Theoretical contributions include a generalization bound on the target error and a convergence rate for the empirical Stein discrepancy in the two-sample setting. Empirically, the method improves over baselines on scarce-target benchmarks (Office31, Office-Home, VisDA-2017), with notable gains when combined with FixMatch or SPA, especially under very limited target data.

Abstract

Unsupervised domain adaptation (UDA) aims to improve model performance on an unlabeled target domain using a related, labeled source domain. A common approach aligns source and target feature distributions by minimizing a distance between them, often using symmetric measures such as maximum mean discrepancy (MMD). However, these methods struggle when target data is scarce. We propose a novel UDA framework that leverages Stein discrepancy, an asymmetric measure that depends on the target distribution only through its score function, making it particularly suitable for low-data target regimes. Our proposed method has kernelized and adversarial forms and supports flexible modeling of the target distribution via Gaussian, GMM, or VAE models. We derive a generalization bound on the target error and a convergence rate for the empirical Stein discrepancy in the two-sample setting. Empirically, our method consistently outperforms prior UDA approaches under limited target data across multiple benchmarks.
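
To make the role of the score function concrete, the following is a minimal NumPy sketch of a kernelized Stein discrepancy estimate against a Gaussian target model. It is an illustration under stated assumptions, not the authors' implementation: the function names `gaussian_score` and `ksd_rbf`, the RBF kernel, the bandwidth, the V-statistic estimator, and the random arrays standing in for extracted features are all hypothetical choices. The target samples enter only through the fitted mean and covariance, which is why a few dozen target points can suffice.

```python
import numpy as np

def gaussian_score(z, mean, cov):
    """Score of N(mean, cov): grad_z log p(z) = -cov^{-1} (z - mean)."""
    return -np.linalg.solve(cov, (z - mean).T).T

def ksd_rbf(z_src, score_fn, bandwidth=1.0):
    """V-statistic estimate of the squared kernelized Stein discrepancy.

    Uses the RBF kernel k(x, x') = exp(-||x - x'||^2 / (2 h^2)) and only
    needs the target score function, not target samples.
    """
    n, d = z_src.shape
    s = score_fn(z_src)                             # (n, d) target scores at source points
    diff = z_src[:, None, :] - z_src[None, :, :]    # diff[i, j] = x_i - x_j
    sq = np.sum(diff ** 2, axis=-1)                 # squared pairwise distances
    h2 = bandwidth ** 2
    k = np.exp(-sq / (2.0 * h2))
    term1 = (s @ s.T) * k                                   # s(x)^T s(x') k
    term2 = np.einsum("id,ijd->ij", s, diff) / h2 * k       # s(x)^T grad_{x'} k
    term3 = -np.einsum("jd,ijd->ij", s, diff) / h2 * k      # s(x')^T grad_x k
    term4 = (d / h2 - sq / h2 ** 2) * k                     # tr(grad_x grad_{x'} k)
    return float(np.mean(term1 + term2 + term3 + term4))

# Hypothetical stand-ins for extracted features (assumption: 2-D features).
rng = np.random.default_rng(0)
z_tgt = rng.normal(size=(32, 2))                    # scarce target features
mean = z_tgt.mean(axis=0)
cov = np.cov(z_tgt, rowvar=False) + 1e-3 * np.eye(2)
score = lambda z: gaussian_score(z, mean, cov)

z_near = rng.normal(size=(200, 2))                  # source features close to the target
z_far = rng.normal(loc=3.0, size=(200, 2))          # source features shifted away
ksd_near = ksd_rbf(z_near, score)
ksd_far = ksd_rbf(z_far, score)
```

In this toy setup the shifted source sample yields the larger discrepancy, which is the quantity a feature extractor would be trained to reduce. A Gaussian mixture or VAE target model would replace `gaussian_score` with its own score function.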

Paper Structure

This paper contains 32 sections, 2 theorems, 27 equations, 28 figures, 12 tables.

Key Result

Theorem 1

Let ${\mathcal{D}}_S, {\mathcal{D}}_T$ be probability distributions on the feature space $X$, and let ${\mathcal{F}}$ be the unit ball of an RKHS with kernel $k(x,x')$, where $x, x' \in X$. Let $f^*_S$ and $f^*_T$ denote the true labeling functions associated with the source and target distributions, respectively, ... where ${\textnormal{S}}(\cdot,\cdot)$ is the Stein discrepancy and $C$ depends on ${\mathcal{F}}$.

Figures (28)

  • Figure 1: Architecture for Stein discrepancy-based UDA. Source and target data $x_S, x_T$ pass through a feature extractor $g$. Source features $z_S$ are classified by $c$, and the classification loss ${\mathcal{L}}_{\text{C}}$ is calculated. Target features $z_T$ are used to estimate a target distribution; its score function $\nabla \log {\mathcal{D}}_T$ is used in the Stein operator ${\mathcal{A}}_{{\mathcal{D}}_T}$. Top (kernelized architecture): ${\mathcal{L}}_{\text{D}}$ is defined according to Eq. (\ref{eq-kernelSteinDisc}): ${\mathcal{L}}_{\text{D}}= \mathbb{E}_{z_S, z_S'} [ {\mathcal{A}}_{{\mathcal{D}}_T} {\mathcal{A}}_{{\mathcal{D}}_T} k(z_S, z_S')]$. Training minimizes ${\mathcal{L}}_{\text{C}} + \lambda {\mathcal{L}}_{\text{D}}$ over $g, c$, where $\lambda$ is a trade-off parameter between the two losses. Bottom (adversarial architecture): ${\mathcal{L}}_{\text{D}}$ is defined according to Eq. (\ref{eq-steinDisc}): ${\mathcal{L}}_{\text{D}}= \max_{f \in {\mathcal{F}}} \mathbb{E}_{z_S} [ {\mathcal{A}}_{{\mathcal{D}}_T} f(z_S)]$. Training maximizes over $f$ to estimate ${\mathcal{L}}_{\text{D}}$ and minimizes ${\mathcal{L}}_{\text{C}} + \lambda {\mathcal{L}}_{\text{D}}$ over $g, c$.
  • Figure 2: Accuracy on Office31, averaged across six domain pairs. Orange bars use all target data; blue bars use at most 1% (or 32) target examples.
  • Figure 3: Accuracy on Office-Home, averaged across domain pairs.
  • Figure 4: Accuracy on VisDA-2017.
  • Figure 5: Accuracy vs. target data percentage on Office31 (log-scale). Stein discrepancy methods (solid lines) are more stable under data scarcity. FixMatch combined with SD-AGMM performs best at low target availability.
  • ...and 23 more figures
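
For readers unfamiliar with the doubly applied operator in the Figure 1 caption, the expression can be expanded as follows. This is a sketch that assumes the standard Langevin Stein operator, ${\mathcal{A}}_{{\mathcal{D}}_T} f(z) = s(z)^\top f(z) + \nabla \cdot f(z)$ with $s(z) = \nabla \log {\mathcal{D}}_T(z)$; the paper's own definition may differ in detail. Applying the operator once in each argument of a differentiable kernel $k$ gives

$$
{\mathcal{A}}_{{\mathcal{D}}_T} {\mathcal{A}}_{{\mathcal{D}}_T} k(z, z') = s(z)^\top s(z')\, k(z, z') + s(z)^\top \nabla_{z'} k(z, z') + s(z')^\top \nabla_{z} k(z, z') + \operatorname{tr}\!\left( \nabla_{z} \nabla_{z'} k(z, z') \right).
$$

Every term depends on the target distribution only through $s$, so the expectation can be estimated from pairs of source features alone once a target model (Gaussian, GMM, or VAE) supplies the score.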

Theorems & Definitions (5)

  • Definition 1: Stein discrepancy
  • Theorem 1
  • proof
  • Theorem 2
  • proof