ElimPCL: Eliminating Noise Accumulation with Progressive Curriculum Labeling for Source-Free Domain Adaptation

Jie Cheng, Hao Zheng, Meiguang Zheng, Lei Wang, Hao Wu, Jian Zhang

TL;DR

Source-free domain adaptation often suffers from noise accumulation: uncertain pseudo-labels on hard target samples introduce errors that then propagate during learning. ElimPCL tackles this with a prototype-consistency curriculum that selects trustworthy pseudo-labeled samples for training and a Dual MixUP strategy in feature space that limits the spread of noise, followed by adaptive co-training with an ImageNet-pretrained backbone. Across multiple benchmarks, ElimPCL yields consistent improvements, especially under severe domain shifts, and ablations confirm that prototype filtering and Dual MixUP are both crucial for improving hard-sample separability. The approach offers a practical, privacy-preserving route to robust SFDA, with potential applicability to real-world domain adaptation scenarios.

Abstract

Source-Free Domain Adaptation (SFDA) aims to train a target model without source data, and the key is to generate pseudo-labels using a pre-trained source model. However, we observe that the source model often produces highly uncertain pseudo-labels for hard samples, particularly those heavily affected by domain shifts, leading to these noisy pseudo-labels being introduced even before adaptation and further reinforced through parameter updates. Additionally, they continuously influence neighbor samples through propagation in the feature space. To eliminate the issue of noise accumulation, we propose a novel Progressive Curriculum Labeling (ElimPCL) method, which iteratively filters trustworthy pseudo-labeled samples based on prototype consistency to exclude high-noise samples from training. Furthermore, a Dual MixUP technique is designed in the feature space to enhance the separability of hard samples, thereby mitigating the interference of noisy samples on their neighbors. Extensive experiments validate the effectiveness of ElimPCL, achieving up to a 3.4% improvement on challenging tasks compared to state-of-the-art methods.
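
The abstract does not spell out the prototype-consistency rule, so the following is only a minimal sketch under one plausible reading: a sample counts as trustworthy when the classifier's pseudo-label agrees with the label of its nearest class prototype. The function name `prototype_consistency_split`, the probability-weighted prototypes, and the cosine-similarity criterion are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def prototype_consistency_split(features, probs):
    """Split samples into trustworthy / untrustworthy index arrays.

    features: (n, d) target features (hypothetical input).
    probs:    (n, c) source-model softmax outputs (hypothetical input).
    """
    # Pseudo-labels taken directly from the classifier output.
    clf_labels = probs.argmax(axis=1)

    # Assumed prototype definition: probability-weighted mean feature per class.
    weights = probs / (probs.sum(axis=0, keepdims=True) + 1e-8)   # (n, c)
    prototypes = weights.T @ features                              # (c, d)

    # Assumed consistency check: nearest prototype under cosine similarity.
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-8)
    p = prototypes / (np.linalg.norm(prototypes, axis=1, keepdims=True) + 1e-8)
    proto_labels = (f @ p.T).argmax(axis=1)

    trusted_mask = clf_labels == proto_labels
    return np.flatnonzero(trusted_mask), np.flatnonzero(~trusted_mask)

# Toy data: two clusters; the last sample sits in cluster 1 but the
# classifier leans toward class 0, so the two labelings disagree.
toy_features = np.array([[1.0, 0.0], [1.0, 0.1], [0.9, 0.0],
                         [0.0, 1.0], [0.1, 1.0], [0.0, 0.9]])
toy_probs = np.array([[0.9, 0.1], [0.9, 0.1], [0.9, 0.1],
                      [0.1, 0.9], [0.1, 0.9], [0.6, 0.4]])
trusted_idx, untrusted_idx = prototype_consistency_split(toy_features, toy_probs)
```

In an iterative curriculum of the kind the abstract describes, a split like this would presumably be recomputed as the model improves, so that the trustworthy subset can grow over training; the schedule for doing so is not given here.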

Paper Structure

This paper contains 12 sections, 15 equations, 5 figures, 4 tables, 1 algorithm.

Figures (5)

  • Figure 1: An illustration of the noise accumulation issue on Office-Caltech Caltech$\rightarrow$Amazon. A few high-noise samples are already severely misaligned before adaptation due to heavy domain shifts. During domain adaptation, they also induce neighbor samples to become misaligned, so a large number of noisy samples accumulates.
  • Figure 2: Overview of ElimPCL. The pseudo-labels generated by the source model are first fed into the prototype consistency module, which divides the target domain into a trustworthy and an untrustworthy subset. The trustworthy subset is used as the curriculum to guide the training of a student model, excluding interference from high-noise samples. Then, samples from the two subsets are mixed in both features and curriculum-labels simultaneously using Dual MixUP to facilitate feature learning for hard samples. Finally, the source model is fine-tuned by fusing the student model parameters through co-learning with an ImageNet pre-trained network.
  • Figure 3: Accuracy and percentage of the trustworthy subset $|D_{tt}|$ within the entire target domain $|D_{t}|$ on VisDA-C.
  • Figure 4: Per-class accuracy on VisDA-C.
  • Figure 5: Feature visualization with ElimPCL.
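
The mixing step named in the Figure 2 caption can be illustrated with a generic feature-space mixup between a trustworthy batch and an untrustworthy batch. This is a hedged sketch only: the captions do not define what makes the MixUP "dual", so the function name `feature_mixup`, the Beta-distributed coefficient, and the `alpha` value are assumptions rather than the paper's formulation.

```python
import numpy as np

def feature_mixup(feat_a, label_a, feat_b, label_b, alpha=0.5, rng=None):
    """Convexly mix two batches of features and their (soft) labels.

    feat_a, feat_b:   (n, d) feature batches, e.g. trustworthy / untrustworthy.
    label_a, label_b: (n, c) one-hot or soft label batches.
    alpha:            Beta distribution parameter (assumed, not from the paper).
    """
    rng = np.random.default_rng() if rng is None else rng
    # One mixing coefficient per sample pair, shared by feature and label.
    lam = rng.beta(alpha, alpha, size=(feat_a.shape[0], 1))
    mixed_feat = lam * feat_a + (1.0 - lam) * feat_b
    mixed_label = lam * label_a + (1.0 - lam) * label_b
    return mixed_feat, mixed_label, lam

# Toy usage with hypothetical shapes: 4 sample pairs, 8-dim features, 3 classes.
rng = np.random.default_rng(0)
fa, fb = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
la = np.eye(3)[[0, 1, 2, 0]]   # labels for the trustworthy batch
lb = np.eye(3)[[1, 1, 0, 2]]   # pseudo-labels for the untrustworthy batch
mixed_f, mixed_l, lam = feature_mixup(fa, la, fb, lb, rng=rng)
```

Sharing one coefficient between the feature and its label keeps each mixed label a valid distribution, which is the usual reason mixup targets stay consistent with the mixed inputs.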