Dual-Teacher Distillation with Subnetwork Rectification for Black-Box Domain Adaptation

Zhe Zhang, Jing Li, Wanli Xue, Xu Cheng, Jianhua Zhang, Qinghua Hu, Shengyong Chen

Abstract

Black-box domain adaptation assumes that neither the source data nor the source model is accessible. The setting is highly practical yet extremely challenging, because transferable information is restricted to the predictions of the black-box source model, which can only be queried with target samples. Existing approaches attempt to extract transferable knowledge through pseudo-label refinement or by leveraging external vision-language models (ViLs), but they often suffer from noisy supervision or make insufficient use of the semantic priors provided by ViLs, both of which hinder adaptation performance. To overcome these limitations, we propose a dual-teacher distillation with subnetwork rectification (DDSR) model that jointly exploits the specific knowledge embedded in the black-box source model and the general semantic information of a ViL. DDSR adaptively integrates their complementary predictions to generate reliable pseudo-labels for the target domain and introduces a subnetwork-driven regularization strategy to mitigate overfitting caused by noisy supervision. Furthermore, the refined target predictions iteratively enhance both the pseudo-labels and the ViL prompts, enabling more accurate and semantically consistent adaptation. Finally, the target model is further optimized through self-training with class-wise prototypes. Extensive experiments on multiple benchmark datasets validate the effectiveness of our approach, demonstrating consistent improvements over state-of-the-art methods, including those that use source data or source models.

Paper Structure

This paper contains 20 sections, 16 equations, 8 figures, and 5 tables.

Figures (8)

  • Figure 1: The overview of our proposed DDSR framework. The training process consists of two stages. In the first stage, DDSR adaptively integrates the complementary predictions from the black-box source model and CLIP to generate reliable pseudo-labels for the target domain. A subnetwork-driven regularization strategy is introduced to alleviate overfitting caused by noisy supervision. The target predictions are further employed to iteratively update pseudo-labels through exponential moving average (EMA) and to refine the ViL prompts via the loss $\mathcal{L}_{cm}$. In the second stage, the target model is further optimized through self-training based on class-wise prototypes, leading to more discriminative and semantically consistent feature representations.
  • Figure 2: t-SNE visualizations of target features for D$\rightarrow$A and W$\rightarrow$A on Office-31. Each point represents a target sample in the feature space, with colors indicating different classes. The source model produces scattered distributions with substantial overlaps in (a) and (c), whereas our method generates well-separated clusters in (b) and (d), demonstrating its effectiveness in mitigating domain shift. (Best viewed in color and with magnification.)
  • Figure 3: Training convergence and stability. Accuracy curves on Ar$\rightarrow$Cl and D$\rightarrow$A tasks show rapid performance improvement followed by stable convergence.
  • Figure 4: Effect of the subnetwork ratio $\gamma$ on the Ar$\rightarrow$Rw task of Office-Home and the D$\rightarrow$A task of Office-31. The accuracy reaches its peak when $\gamma=0.84$, while remaining stable across different values, showing the robustness of the method.
  • Figure 5: Sensitivity analysis of hyperparameters $\epsilon$ and $\zeta$ on the Ar$\rightarrow$Cl task of Office-Home. Performance remains stable across a wide range of parameter settings, demonstrating low dependence on hyperparameter tuning.
  • ...and 3 more figures
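
The two training stages summarized in the Figure 1 caption can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the text above does not give the fusion rule, the EMA momentum, or the prototype definition, so the entropy-based weighting in `fuse_teachers`, the default `momentum=0.9`, and the probability-weighted prototypes with cosine similarity in `prototype_labels` are all assumptions chosen for illustration, and every function name is hypothetical. The subnetwork-driven regularization and the prompt-refinement loss $\mathcal{L}_{cm}$ are not covered.

```python
import numpy as np

EPS = 1e-12  # numerical guard against log(0) and division by zero


def entropy(p):
    """Shannon entropy of each row of a (N, C) probability matrix."""
    return -(p * np.log(p + EPS)).sum(axis=-1)


def fuse_teachers(p_src, p_vil):
    """Assumed fusion rule: weight each teacher per sample by its confidence.

    p_src: (N, C) predictions queried from the black-box source model.
    p_vil: (N, C) zero-shot predictions from the ViL (e.g. CLIP).
    Confidence is taken as 1 - normalized entropy (an assumption).
    """
    max_ent = np.log(p_src.shape[-1])
    conf_src = 1.0 - entropy(p_src) / max_ent
    conf_vil = 1.0 - entropy(p_vil) / max_ent
    w = (conf_src / (conf_src + conf_vil + EPS))[:, None]
    return w * p_src + (1.0 - w) * p_vil


def ema_update(pseudo, p_target, momentum=0.9):
    """Exponential moving average of pseudo-labels toward target predictions."""
    return momentum * pseudo + (1.0 - momentum) * p_target


def prototype_labels(features, probs):
    """Assumed stage-two labeling: nearest class-wise prototype by cosine similarity.

    features: (N, D) target features; probs: (N, C) target predictions.
    Prototypes are probability-weighted feature means (an assumption).
    """
    protos = probs.T @ features
    protos = protos / (probs.sum(axis=0)[:, None] + EPS)
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + EPS)
    c = protos / (np.linalg.norm(protos, axis=1, keepdims=True) + EPS)
    return (f @ c.T).argmax(axis=1)
```

In this sketch, a sample on which one teacher is confident and the other is near uniform receives a pseudo-label dominated by the confident teacher, and `ema_update` keeps each row a valid probability distribution because it is a convex combination of two distributions.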