Proxy Denoising for Source-Free Domain Adaptation

Song Tang, Wenxin Su, Yan Gan, Mao Ye, Jianwei Zhang, Xiatian Zhu

TL;DR

This paper tackles Source-Free Domain Adaptation (SFDA), in which strict data privacy rules out access to the source data, by addressing noisy supervision from Vision-Language (ViL) models. It introduces Proxy Denoising (ProDe), which treats the ViL space as a noisy proxy for the latent domain-invariant space and grounds adaptation in a Proxy Confidence Theory that quantifies how reliable ViL predictions are during training. ProDe combines a proxy denoising module that refines ViL logits with a mutual knowledge distillation objective that uses these refined predictions to learn a robust target model. Experiments on Office-31, Office-Home, VisDA, and DomainNet-126 show that ProDe achieves state-of-the-art performance in closed-set, open-set, partial-set, and generalized SFDA, as well as in multi-target, multi-source, and test-time adaptation settings. The work offers a practical framework for integrating ViL models into SFDA while controlling supervision noise, with open-source code available at the project repository.
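
The summary does not give the formula ProDe uses to refine ViL logits. As a minimal sketch, assuming (hypothetically) that denoising shifts the ViL logits away from the in-training model's current logits by a non-negative strength `omega`, a refinement step could look like the code below. The names `refine_proxy_logits`, `softmax`, and `argmax` and the subtraction rule are illustrative assumptions, not the paper's published update.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a single logit vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def refine_proxy_logits(vil_logits: List[float],
                        model_logits: List[float],
                        omega: float) -> List[float]:
    """Hypothetical proxy correction (an assumption, not ProDe's exact rule).

    Subtracts omega times the in-training model's logits from the ViL logits,
    so classes that the proxy and the model already agree on are discounted.
    omega = 0 returns the raw ViL logits unchanged.
    """
    if len(vil_logits) != len(model_logits):
        raise ValueError("logit vectors must have the same length")
    if omega < 0:
        raise ValueError("omega must be non-negative")
    return [v - omega * t for v, t in zip(vil_logits, model_logits)]


def argmax(values: List[float]) -> int:
    """Index of the largest entry (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


if __name__ == "__main__":
    vil = [2.0, 1.0, 0.0]    # toy ViL (proxy) logits for 3 classes
    model = [1.0, 0.0, 0.0]  # toy logits from the in-training target model
    for omega in (0.0, 0.5, 1.5):
        refined = refine_proxy_logits(vil, model, omega)
        print(omega, refined, argmax(softmax(refined)))
```

In this toy case, `omega = 0.5` keeps class 0 on top, while `omega = 1.5` discounts the shared preference enough that class 1 becomes the refined top class, which shows how a correction of this kind can overturn a proxy prediction.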

Abstract

Source-Free Domain Adaptation (SFDA) aims to adapt a pre-trained source model to an unlabeled target domain without access to the source data. Inspired by the success of large Vision-Language (ViL) models in many applications, recent research has validated ViL's benefit for SFDA by using their predictions as pseudo supervision. However, we observe that ViL's supervision can be noisy and inaccurate at an unknown rate, introducing additional negative effects during adaptation. To address this thus-far ignored challenge, we introduce a novel Proxy Denoising (ProDe) approach. The key idea is to leverage the ViL model as a proxy that guides the adaptation process toward the latent domain-invariant space. We design a proxy denoising mechanism to correct ViL's predictions, grounded in a proxy confidence theory that models the dynamic effect of the proxy's divergence from the domain-invariant space during adaptation. To capitalize on the corrected proxy, we derive a mutual knowledge distillation regularization. Extensive experiments show that ProDe significantly outperforms current state-of-the-art alternatives in the conventional closed-set setting and in the more challenging open-set, partial-set, generalized SFDA, multi-target, multi-source, and test-time settings. Our code and data are available at https://github.com/tntek/source-free-domain-adaptation.
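
The abstract does not spell out the mutual knowledge distillation term. As a generic stand-in (not the paper's formula), a symmetric KL divergence between the target model's predictive distribution and the corrected proxy's distribution captures the idea of two predictors distilling into each other. In a real training loop one would typically treat the proxy side as a fixed target and backpropagate only into the model; that detail, and the names `kl_divergence` and `mutual_distillation_loss`, are assumptions for illustration.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float],
                  eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same classes.

    Terms with p_i == 0 contribute nothing; eps keeps the log finite
    when q_i == 0.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def mutual_distillation_loss(model_probs: Sequence[float],
                             proxy_probs: Sequence[float]) -> float:
    """Generic symmetric-KL stand-in for a mutual distillation term.

    Penalises disagreement in both directions (model -> proxy and
    proxy -> model) and is zero when the two distributions coincide.
    """
    return (kl_divergence(model_probs, proxy_probs)
            + kl_divergence(proxy_probs, model_probs))


if __name__ == "__main__":
    model = [0.7, 0.2, 0.1]   # toy target-model prediction
    proxy = [0.5, 0.4, 0.1]   # toy corrected-proxy prediction
    print(mutual_distillation_loss(model, proxy))
```

The symmetric form is chosen here only because it makes the "mutual" direction explicit; a one-directional KL or a mutual-information objective would be equally plausible sketches.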

Paper Structure

This paper contains 26 sections, 1 theorem, 11 equations, 7 figures, 21 tables, 1 algorithm.

Key Result

Theorem 1

We note that the source domain ($D_{\mathcal{S}}$), the domain-invariant space ($D_{\mathcal{I}}$), the proxy space ($D_{\mathcal{V}}$) and the in-training model ($D_{\mathcal{T}}^t$) follow the probability distributions $P(\mathcal{S})$, $P(\mathcal{I})$, $P(\mathcal{V})$ and $P(\mathcal{T}^t)$, re

Figures (7)

  • Figure 1: Conceptual illustration of ProDe. We align the adapting direction with the desired trajectory by leveraging a proxy space that approximates the latent domain-invariant space. The direction is adjusted through proxy error correction, which implements proxy denoising and ultimately improves model adaptation.
  • Figure 2: Left: Dynamics of the effect of the ViL model's prediction error (or proxy error) during alignment. (a) In the initial adaptation phase, the proxy errors can be safely overlooked. However, as the in-training model approaches the proxy space, these errors become more noticeable, leading to a continuous decline in the reliability of ViL predictions, as shown in (b) and (c). Right: ProDe capitalizes on the corrected proxy through a mutual knowledge distillation regularization and a proxy denoising mechanism that refines the ViL logits.
  • Figure 3: Feature visualization comparison in 3D density charts.
  • Figure 4: Ablation study results (%) on Office-31, Office-Home and VisDA.
  • Figure 5: Comparison results (%) on Office-31, Office-Home and VisDA when CLIP's image encoder uses the ViT-B/16 backbone. SF means source-free.
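
Figure 2 describes the dynamics only qualitatively: proxy errors are negligible early on and become more noticeable as the in-training model approaches the proxy space. One hypothetical way to turn that into a number, assuming closeness is measured by a KL divergence between the model's and the proxy's predictions, is a score that stays near 1 while the model is far from the proxy and decays to 0 as it arrives. The functions `proxy_confidence` and `confidence_trajectory` and the temperature `tau` are illustrative assumptions; the paper's proxy confidence theory is not reproduced here.

```python
import math
from typing import List


def proxy_confidence(model_to_proxy_kl: float, tau: float = 1.0) -> float:
    """Hypothetical reliability score for proxy (ViL) predictions.

    Returns 1 - exp(-kl / tau), which lies in [0, 1):
      * large KL (model still far from the proxy) -> close to 1, i.e. proxy
        errors are small relative to the remaining gap and can be overlooked;
      * KL -> 0 (model has reached the proxy) -> close to 0, i.e. proxy errors
        now dominate and raw ViL predictions deserve less trust.
    tau > 0 controls how fast the score falls off.
    """
    if model_to_proxy_kl < 0:
        raise ValueError("KL divergence cannot be negative")
    if tau <= 0:
        raise ValueError("tau must be positive")
    return 1.0 - math.exp(-model_to_proxy_kl / tau)


def confidence_trajectory(kl_per_epoch: List[float],
                          tau: float = 1.0) -> List[float]:
    """Apply proxy_confidence to a sequence of per-epoch KL values."""
    return [proxy_confidence(kl, tau) for kl in kl_per_epoch]


if __name__ == "__main__":
    # Toy KL values shrinking as adaptation pulls the model toward the proxy.
    kls = [3.0, 1.5, 0.5, 0.1]
    for kl, c in zip(kls, confidence_trajectory(kls)):
        print(f"KL={kl:.1f}  confidence={c:.3f}")
```

Under this assumption, a denoising step could apply a stronger correction as the score falls; that coupling is likewise a design guess rather than something stated in the figure.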

Theorems & Definitions (2)

  • Theorem 1
  • Proof 1