Gradual Domain Adaptation via Manifold-Constrained Distributionally Robust Optimization

Amir Hossein Saberi, Amir Najafi, Ala Emrani, Amin Behjati, Yasaman Zolfimoselo, Mahdi Shadrooy, Abolfazl Motahari, Babak H. Khalaj

TL;DR

This paper proposes a methodology rooted in Distributionally Robust Optimization (DRO) with an adaptive Wasserstein radius, and theoretically shows that this method guarantees the classification error across all $P_i$s can be suitably bounded.

Abstract

The aim of this paper is to address the challenge of gradual domain adaptation within a class of manifold-constrained data distributions. In particular, we consider a sequence of $T\ge2$ data distributions $P_1,\ldots,P_T$ undergoing a gradual shift, where each pair of consecutive measures $P_i,P_{i+1}$ are close to each other in Wasserstein distance. We have a supervised dataset of size $n$ sampled from $P_0$, while for the subsequent distributions in the sequence, only unlabeled i.i.d. samples are available. Moreover, we assume that all distributions exhibit a known favorable attribute, such as (but not limited to) having intra-class soft/hard margins. In this context, we propose a methodology rooted in Distributionally Robust Optimization (DRO) with an adaptive Wasserstein radius. We theoretically show that this method guarantees the classification error across all $P_i$s can be suitably bounded. Our bounds rely on a newly introduced compatibility measure, which fully characterizes the error propagation dynamics along the sequence. Specifically, for inadequately constrained distributions, the error can exponentially escalate as we progress through the gradual shifts. Conversely, for appropriately constrained distributions, the error can be demonstrated to be linear or even entirely eradicated. We have substantiated our theoretical findings through several experimental results.
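
For orientation, a generic Wasserstein DRO objective (the textbook form, not necessarily the exact objective used in the paper) can be written as below. The loss $\ell$, the radius $\varepsilon$, and the empirical measure $\hat{P}$ are notation assumed here for illustration. Judging by the title of Definition 2.1, the paper presumably restricts the inner supremum further to a distribution family $\mathcal{G}$ and adapts the radius along the sequence.

$$
\hat{\theta} \in \arg\min_{\theta\in\Theta}\ \sup_{Q:\ \mathcal{W}_p\left(Q,\hat{P}\right)\le\varepsilon}\ \mathbb{E}_{z\sim Q}\left[\ell\left(h_{\theta};z\right)\right]
$$

The inner supremum picks the worst distribution within Wasserstein distance $\varepsilon$ of the data, and the outer minimization trains the classifier $h_{\theta}$ against it.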

Paper Structure

This paper contains 13 sections, 12 theorems, 134 equations, 2 figures, 3 algorithms.

Key Result

Theorem 2.3

For $\lambda>0$ and $p,q\ge 1$, assume classifier set $\mathcal{H}\triangleq\left\{h_{\theta}\vert~\theta\in\Theta\right\}$ and distribution family $\mathcal{G}\subseteq\mathcal{M}\left(\mathcal{Z}\right)$ are compatible according to the Wasserstein metric $\mathcal{W}^q_{p,\lambda}(\cdot,\cdot)$ and … where $\bigcirc T$ implies composition of function $u\rightarrow g_{\lambda}\left(2\lambda u+\eta\,\ldots\right)$ …

Figures (2)

  • Figure 1: A schematic view of the proposed procedure for our manifold-constrained DRO. A restricted adversarial block, modeled by $f_P$, perturbs the source distribution at each step $i$ to prepare the algorithm for the worst possible distribution in step $i+1$. Meanwhile, a classifier $f_C$ is trained on the perturbed distribution.
  • Figure 2: Comparison of the performance of our proposed method with the GDA method of Kumar et al. (2020) on the rotating MNIST dataset.
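
The loop described in Figure 1 can be loosely illustrated with a toy sketch. It is not the paper's algorithm, and several simplifications are assumed: a linear classifier stands in for $f_C$, a closed-form worst-case $L_2$ shift of radius `eps` stands in for the learned adversarial block $f_P$, the radius is fixed rather than adaptive, and the data are synthetic 2D Gaussian clusters that rotate gradually. All function names (`make_domain`, `worst_case_shift`, `robust_fit`, `predict`) are hypothetical.

```python
import numpy as np

def make_domain(rng, n, angle):
    """Two Gaussian clusters at +/-(2, 0), rotated by `angle` radians."""
    y = rng.choice([-1.0, 1.0], size=n)
    X = y[:, None] * np.array([2.0, 0.0]) + 0.3 * rng.standard_normal((n, 2))
    c, s = np.cos(angle), np.sin(angle)
    return X @ np.array([[c, -s], [s, c]]).T, y

def worst_case_shift(X, y, w, eps):
    """Move every point by exactly eps (L2) toward the decision boundary.

    For a linear classifier this is the most harmful bounded shift; it is a
    simplified stand-in for an adversarial perturbation block.
    """
    norm = np.linalg.norm(w)
    if norm == 0.0:
        return X.copy()
    return X - eps * y[:, None] * (w / norm)

def robust_fit(X, y, eps, lr=0.5, steps=200):
    """Gradient descent on the logistic loss of adversarially shifted points."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        X_adv = worst_case_shift(X, y, w, eps)
        margins = y * (X_adv @ w)
        # sigmoid(-margin), computed stably as exp(-log(1 + exp(margin)))
        weight = np.exp(-np.logaddexp(0.0, margins))
        grad = -(X_adv * (y * weight)[:, None]).mean(axis=0)
        w -= lr * grad
    return w

def predict(X, w):
    return np.where(X @ w >= 0, 1.0, -1.0)

rng = np.random.default_rng(0)
eps = 0.3               # fixed radius (assumption; the paper adapts it)
n, T = 200, 6
step_angle = np.pi / 12  # 15 degrees of rotation per step

# Step 0: labeled source domain.
X0, y0 = make_domain(rng, n, 0.0)
w_src = robust_fit(X0, y0, eps)

# Steps 1..T: only unlabeled samples; pseudo-label with the current
# classifier, then refit robustly on the pseudo-labeled data.
w = w_src.copy()
for t in range(1, T + 1):
    X_t, y_t = make_domain(rng, n, t * step_angle)  # y_t is held out
    pseudo = predict(X_t, w)
    w = robust_fit(X_t, pseudo, eps)

# Evaluate on the final domain, using its true labels only here.
acc_gradual = np.mean(predict(X_t, w) == y_t)
acc_source_only = np.mean(predict(X_t, w_src) == y_t)
print(f"gradual: {acc_gradual:.3f}  source-only: {acc_source_only:.3f}")
```

In this toy setting the gradually adapted classifier would be expected to follow the rotation, while the source-only classifier degrades on the final domain. The sketch says nothing about the error-propagation bounds themselves.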

Theorems & Definitions (28)

  • Definition 2.1: Restricted Wasserstein Ball
  • Definition 2.2: Compatibility between $\mathcal{G}$ and $\mathcal{H}$
  • Theorem 2.3
  • Corollary 2.4: Elimination of Error Propagation
  • Theorem 3.1
  • Theorem 3.2: Potentials for Error Propagation
  • Theorem 3.3: Non-asymptotic Generalization Guarantee
  • Corollary 3.4: Elimination of Error Propagation in Non-asymptotic Regime
  • Definition 4.1: $\left(C_1, C_2\right)$-expansion
  • Definition 4.2: $\epsilon$-smoothness
  • ...and 18 more