Prototypical Distillation and Debiased Tuning for Black-box Unsupervised Domain Adaptation

Jian Liang, Lijun Sheng, Hongmin Liu, Ran He

TL;DR

This work addresses unsupervised domain adaptation when the source model is accessible only as a black-box API that returns a predicted label and its confidence. It introduces ProDDing, a two-step framework that first distills knowledge from API predictions using adaptive label smoothing and target prototypes, augmented by structural regularizations, and then fine-tunes the distilled model with debiased, semi-supervised learning to mitigate class bias. The approach achieves state-of-the-art performance on standard UDA benchmarks and remains robust under hard-label and label-shift settings, demonstrating practical value for privacy-preserving cross-domain learning. By enabling flexible cross-domain transfer without accessing raw source data, ProDDing offers a scalable solution for real-world deployment where data privacy and API access constraints are critical.

Abstract

Unsupervised domain adaptation aims to transfer knowledge from a related, label-rich source domain to an unlabeled target domain, thereby circumventing the high costs associated with manual annotation. Recently, there has been growing interest in source-free domain adaptation, a paradigm in which only a pre-trained model, rather than the labeled source data, is provided to the target domain. Given the potential risk of source data leakage via model inversion attacks, this paper introduces a novel setting called black-box domain adaptation, where the source model is accessible only through an API that provides the predicted label along with the corresponding confidence value for each query. We develop a two-step framework named Prototypical Distillation and Debiased tuning (ProDDing). In the first step, ProDDing leverages both the raw predictions from the source model and prototypes derived from the target domain as teachers to distill a customized target model. In the second step, ProDDing keeps fine-tuning the distilled model by penalizing logits that are biased toward certain classes. Empirical results across multiple benchmarks demonstrate that ProDDing outperforms existing black-box domain adaptation methods. Moreover, in the case of hard-label black-box domain adaptation, where only predicted labels are available, ProDDing achieves significant improvements over these methods. Code will be available at https://github.com/tim-learn/ProDDing/.
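
In the setting described above, each API query returns only a predicted label and its confidence. A minimal sketch of how such a pair could be expanded into a soft label vector is shown here, assuming the confidence stays on the predicted class and the remaining mass is spread uniformly over the other classes. The function name `smooth_api_predictions` and this exact smoothing rule are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

def smooth_api_predictions(labels, confidences, num_classes):
    """Turn (label, confidence) pairs into soft label vectors.

    Assumption: the confidence p is kept on the predicted class and the
    remaining 1 - p is split evenly across the other classes.
    """
    labels = np.asarray(labels, dtype=np.int64)
    conf = np.asarray(confidences, dtype=np.float64)
    # Per-class share of the leftover probability mass, one value per query.
    rest = (1.0 - conf) / (num_classes - 1)
    # Start with every class holding the leftover share.
    soft = np.repeat(rest[:, None], num_classes, axis=1)
    # Overwrite the predicted class with the reported confidence.
    soft[np.arange(len(labels)), labels] = conf
    return soft

# Two hypothetical queries over four classes.
soft = smooth_api_predictions([2, 0], [0.7, 0.4], num_classes=4)
```

Each row of `soft` sums to 1 and peaks at the predicted class, so it could serve as an initial teacher distribution for distillation.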

Paper Structure (20 sections, 15 equations, 8 figures, 7 tables, 1 algorithm)

Figures (8)

  • Figure 1: For black-box domain adaptation, the source vendor provides only black-box predictors (e.g., through a cloud API service) to the target user, who possesses certain unlabeled data. During adaptation, only the predicted labels and their associated confidence values are accessible for target queries. When confidence values are unavailable, we refer to this scenario as hard-label black-box domain adaptation.
  • Figure 2: An overview of ProD, the distillation step of ProDDing. The black-box source predictor (e.g., an API service) is used solely to initialize the memory prediction bank, which stores predictions for each target instance. Building on these predictions, adaptive label smoothing and prototypical pseudo-labeling are used to update the memory prediction bank. During self-distillation, the memory bank acts as a teacher by maintaining an exponential moving average (EMA) of predictions. In addition, structural regularizations that capture batch-wise and pair-wise data structures are incorporated to enhance adaptation.
  • Figure 3: An overview of Ding, the fine-tuning step of ProDDing. Starting from the network distilled in the first step, we enforce weak-to-strong consistency, applying a pre-defined threshold to the prediction on the weakly augmented sample. In addition to mutual information maximization, we adjust the logits of the strongly augmented sample to mitigate class bias, where $\pi$ denotes the estimate of the class priors.
  • Figure 4: Accuracies (%) of ProD and ProDDing under different values of the temperature parameter $\tau$ for four representative UDA tasks across three datasets.
  • Figure 5: Accuracies (%) of ProD and ProDDing under different values of the balancing parameter $\beta$ for four representative UDA tasks across three datasets.
  • ...and 3 more figures
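
The mechanisms named in the captions of Figures 2 and 3 can be sketched in a few lines of NumPy. This is a hedged illustration only: all function names, the cosine-similarity form of the prototype labels, the `temperature`, `momentum` and `threshold` defaults, and the choice to add $\log \pi$ to the strong-view logits (in the style of logit-adjusted losses) are assumptions and may differ from the paper's actual formulation.

```python
import numpy as np

def softmax(z, axis=-1):
    # Subtract the row maximum for numerical stability.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def prototype_predictions(features, bank, temperature=0.1):
    """Soft labels from cosine similarity to class prototypes (assumed form).

    features: (N, D) target features; bank: (N, K) stored soft predictions.
    """
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    # Each prototype is the bank-weighted sum of normalized features.
    protos = bank.T @ feats
    protos = protos / (np.linalg.norm(protos, axis=1, keepdims=True) + 1e-12)
    return softmax(feats @ protos.T / temperature)

def ema_update(bank, new_pred, momentum=0.9):
    # Exponential moving average of the predictions held in the memory bank.
    return momentum * bank + (1.0 - momentum) * new_pred

def debiased_consistency_loss(weak_logits, strong_logits, prior, threshold=0.95):
    """Weak-to-strong consistency with prior-adjusted strong logits (assumed form)."""
    weak_prob = softmax(weak_logits)
    pseudo = weak_prob.argmax(axis=1)
    # Keep only samples whose weak-view confidence passes the threshold.
    mask = weak_prob.max(axis=1) >= threshold
    # Adding log(prior) lowers the loss for over-represented classes and
    # raises it for rare ones, which counteracts class bias.
    log_prob = np.log(softmax(strong_logits + np.log(prior)))
    ce = -log_prob[np.arange(len(pseudo)), pseudo]
    return float((ce * mask).sum() / max(mask.sum(), 1))

# Hypothetical toy data: four 2-D features, two classes.
features = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
bank = np.array([[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]])
bank = ema_update(bank, prototype_predictions(features, bank))
```

In this sketch the bank is refreshed with the prototype-based soft labels through the EMA, and `debiased_consistency_loss` would be evaluated on weak and strong views of the same batch with `prior` set to a running estimate of the class frequencies.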