
Continual Domain Adversarial Adaptation via Double-Head Discriminators

Yan Shen, Zhanghexuan Ji, Chunwei Ma, Mingchen Gao

TL;DR

The paper tackles continual unsupervised domain adaptation (UDA) with limited access to past source data, a setting in which the ${\mathcal{H}}$-divergence cannot be estimated reliably from a small memory buffer. It introduces a double-head discriminator comprising a pre-trained, frozen source-only discriminator and a target-phase discriminator, whose ensemble provides a more accurate domain-discrepancy signal for learning domain-invariant features. Theoretical analysis links the ensemble to a population ${\mathcal{H}}\Delta{\mathcal{H}}$ bound and offers finite-sample generalization guarantees. Experiments on benchmarks such as the MNIST-family and Office datasets show consistent improvements in target adaptation and reduced forgetting, with further gains achievable by combining the method with SSL or KD. Overall, the method advances continual UDA by reducing empirical estimation error and improving robustness to domain shift under memory constraints, with potential for extension to source-free settings.

Abstract

Domain adversarial adaptation in a continual setting poses a significant challenge because access to previous source domain data is limited. Despite extensive research in continual learning, adversarial adaptation cannot be accomplished effectively using only a small number of stored source domain samples, which is the standard setting in memory replay approaches. This limitation arises from the erroneous empirical estimation of the ${\mathcal{H}}$-divergence with few source domain samples. To tackle this problem, we propose a double-head discriminator algorithm that introduces an additional source-only domain discriminator trained solely during the source learning phase. We prove that, with the introduction of a pre-trained source-only domain discriminator, the empirical estimation error of the ${\mathcal{H}}$-divergence-related adversarial loss is reduced from the source domain side. Further experiments on existing domain adaptation benchmarks show that our proposed algorithm achieves more than 2% improvement on all categories of target domain adaptation tasks while significantly mitigating forgetting on the source domain.
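
The claim that few stored source samples distort the empirical divergence can be illustrated with a toy computation. The sketch below is not the paper's estimator: it assumes a deliberately simple hypothesis class (1-D threshold classifiers), for which the empirical ${\mathcal{H}}$-divergence reduces to twice the largest gap between the two empirical CDFs. The function names, sample sizes, and the Gaussian toy data are all hypothetical choices made for illustration. Source and target are drawn from the same distribution, so the population divergence is zero and any nonzero estimate is pure estimation error.

```python
import numpy as np


def empirical_h_divergence_thresholds(source, target):
    """Empirical H-divergence for 1-D threshold classifiers h_t(x) = 1[x > t].

    Equals 2 * sup_t |P_S[x > t] - P_T[x > t]|, i.e. twice the largest gap
    between the two empirical CDFs (the supremum is attained at a sample point).
    """
    grid = np.sort(np.concatenate([source, target]))
    cdf_s = np.searchsorted(np.sort(source), grid, side="right") / len(source)
    cdf_t = np.searchsorted(np.sort(target), grid, side="right") / len(target)
    return 2.0 * float(np.max(np.abs(cdf_s - cdf_t)))


def mean_estimate(m, n, trials=50, seed=0):
    """Average the estimate over several draws of m source and n target samples."""
    rng = np.random.default_rng(seed)
    values = [
        empirical_h_divergence_thresholds(rng.normal(size=m), rng.normal(size=n))
        for _ in range(trials)
    ]
    return float(np.mean(values))


# Hypothetical sizes: a 10-sample replay buffer versus full source access.
small_buffer = mean_estimate(m=10, n=2000)
full_source = mean_estimate(m=2000, n=2000)

print(f"mean estimate with 10 source samples:   {small_buffer:.3f}")
print(f"mean estimate with 2000 source samples: {full_source:.3f}")
```

Under these assumptions the small-buffer estimate is noticeably larger than the full-source one even though the true divergence is zero, which is the kind of source-side estimation error the abstract refers to.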

Paper Structure (20 sections, 11 theorems, 69 equations, 5 figures, 5 tables, 2 algorithms)

This paper contains 20 sections, 11 theorems, 69 equations, 5 figures, 5 tables, 2 algorithms.

Key Result

Theorem 1

Let ${\mathcal{F}}$ be a hypothesis space with VC dimension $d$. If $S'$ and $T'$ are samples of size $m$ from $S$ and of size $n$ from $T$, respectively, and $\hat{d}_{{\mathcal{H}}\Delta{\mathcal{H}}}(S', T')$ is the empirical ${\mathcal{H}}$-divergence between the samples, then for any $\delta$ …

Figures (5)

  • Figure 1: Flowchart of the proposed double-head discriminator algorithm. Solid lines show the forward path and dashed lines show the backward training path. After the task model is trained on the source domain, an additional source-only domain discriminator $h_{\psi,s}$ is trained with the task model $f_{\omega}$ frozen. In the target adaptation phase, $h_{\psi,t}$ is adversarially trained with $f_{\omega}^1$ on the domain adversarial loss, where the ensemble of the outputs of the domain discriminators $h_{\psi,s}$ and $h_{\psi,t}$ is used as the domain adversarial signal to learn domain-invariant features for $f_{\omega}^1$.
  • Figure 1: A continual adversarial domain adaptation model. Only the source risk on the client's local source data is accessible in the source-only training phase. A small set of buffered source domain data and the target domain data are used for adversarial training in the target adaptation phase.
  • Figure 2: Effect of memory size on model performance.
  • Figure 3: Effect of the source-only domain discriminator's contribution on target adaptation performance.
  • Figure 4: Effect of the source-only domain discriminator's learning rate $l_r$ and training epochs $t_2$ on target adaptation performance.
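
The two-head arrangement described in the first Figure 1 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: both heads are taken to be linear discriminators on fixed features, the ensemble is assumed to be a weighted sum of their logits with a hypothetical weight `lam`, the label convention (source = 1, target = 0) is assumed, and the vector `w_s` is a stand-in for a head that would have been pre-trained in the source phase. Only the discriminator side of the target phase is shown; the adversarial update of the feature extractor $f_{\omega}^1$ is omitted.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def ensemble_logit(feats, w_s, w_t, lam):
    """Assumed ensemble: target-head logit plus lam times the frozen source-head logit."""
    return feats @ w_t + lam * (feats @ w_s)


def domain_bce(logits, labels):
    """Binary cross-entropy of the domain prediction (source = 1, target = 0)."""
    p = np.clip(sigmoid(logits), 1e-7, 1.0 - 1e-7)
    return float(-np.mean(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p)))


def train_target_head(feats, labels, w_s, lam=1.0, lr=0.1, steps=500):
    """Gradient descent on the target head only; the source head w_s is never updated."""
    w_t = np.zeros_like(w_s)
    for _ in range(steps):
        p = sigmoid(ensemble_logit(feats, w_s, w_t, lam))
        grad_w_t = feats.T @ (p - labels) / len(labels)
        w_t = w_t - lr * grad_w_t
    return w_t


# Toy features: a small replay buffer of source features and many target features.
rng = np.random.default_rng(0)
dim = 4
buffer_feats = rng.normal(loc=0.5, size=(8, dim))
target_feats = rng.normal(loc=-0.5, size=(200, dim))
feats = np.vstack([buffer_feats, target_feats])
labels = np.concatenate([np.ones(8), np.zeros(200)])

w_s = np.full(dim, 0.3)       # hypothetical stand-in for the pre-trained source-only head
w_s_snapshot = w_s.copy()     # kept to confirm the head stays frozen

loss_before = domain_bce(ensemble_logit(feats, w_s, np.zeros(dim), 1.0), labels)
w_t = train_target_head(feats, labels, w_s)
loss_after = domain_bce(ensemble_logit(feats, w_s, w_t, 1.0), labels)

print(f"ensemble domain loss before training the target head: {loss_before:.3f}")
print(f"ensemble domain loss after training the target head:  {loss_after:.3f}")
```

Keeping `w_s` out of the update loop mirrors the caption's description of a source-only discriminator that is trained once and then only contributes to the ensemble signal during target adaptation.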

Theorems & Definitions (16)

  • Theorem 1
  • Definition 3.1: Margin Disparity Discrepancy (Zhang et al., 2019)
  • Theorem 2
  • Definition 4.1
  • Theorem 3
  • Proposition 1
  • Theorem 1
  • Lemma 1
  • Theorem 2
  • proof
  • ...and 6 more