BiPC: Bidirectional Probability Calibration for Unsupervised Domain Adaption

Wenlve Zhou, Zhiheng Zhou, Junyuan Shang, Chang Niu, Mingyue Zhang, Xiyuan Tao, Tianlei Wang

TL;DR

BiPC addresses unsupervised domain adaptation by shifting focus from feature-space alignment to probability-space calibration through Calibrated Probability Alignment (CPA) and Calibrated Gini Impurity (CGI). By enabling bidirectional interaction between a pre-trained head and a task head, BiPC achieves robust cross-domain performance across CNNs and Transformers, including partial-set domain adaptation (PDA). The method delivers state-of-the-art or competitive results on major benchmarks such as Office-Home, Office-31, VisDA-2017, Digits, and ImageCLEF-DA, while maintaining broad compatibility with diverse backbones. This work emphasizes the practicality and effectiveness of exploiting the probability space in UDA, offering a clear path for leveraging pre-trained heads to guide domain adaptation across architectures and tasks.

Abstract

Unsupervised Domain Adaptation (UDA) leverages a labeled source domain to solve tasks in an unlabeled target domain. While Transformer-based methods have shown promise in UDA, their application is limited to plain Transformers, excluding Convolutional Neural Networks (CNNs) and hierarchical Transformers. To address these issues, we propose Bidirectional Probability Calibration (BiPC) from a probability space perspective. We demonstrate that the probability outputs from a pre-trained head, after extensive pre-training, are robust against domain gaps and can adjust the probability distribution of the task head. Moreover, the task head can enhance the pre-trained head during adaptation training, improving model performance through bidirectional complementation. Technically, we introduce Calibrated Probability Alignment (CPA) to adjust the pre-trained head's probabilities, such as those from an ImageNet-1k pre-trained classifier. Additionally, we design a Calibrated Gini Impurity (CGI) loss to refine the task head, with calibrated coefficients learned from the pre-trained classifier. BiPC is a simple yet effective method applicable to various networks, including CNNs and Transformers. Experimental results demonstrate its remarkable performance across multiple UDA tasks. Our code will be available at: https://github.com/Wenlve-Zhou/BiPC.
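To make the Gini impurity idea behind the CGI loss concrete, here is a minimal sketch of the standard (uncalibrated) Gini impurity of a softmax output, with an optional per-sample weight. The abstract does not give the form of the calibrated coefficients, so the `weights` argument is only a hypothetical stand-in for them, and the function names are illustrative rather than taken from the authors' code.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def gini_impurity(probs):
    """Standard Gini impurity per sample: 1 - sum_k p_k^2.

    It is 0 for a one-hot prediction and largest for a uniform one.
    """
    return 1.0 - np.sum(probs ** 2, axis=1)


def weighted_gini_loss(task_logits, weights=None):
    """Mean Gini impurity of the task head's predictions.

    `weights` is a hypothetical per-sample coefficient (an assumption,
    standing in for the calibrated coefficients the abstract mentions);
    when omitted, every sample counts equally.
    """
    gi = gini_impurity(softmax(task_logits))
    if weights is None:
        return float(gi.mean())
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights * gi) / np.sum(weights))


if __name__ == "__main__":
    # Two unlabeled target samples: one ambiguous, one confident.
    logits = np.array([[0.0, 0.0, 0.0, 0.0], [8.0, 0.0, 0.0, 0.0]])
    print(weighted_gini_loss(logits))
    print(weighted_gini_loss(logits, weights=[1.0, 3.0]))
```

Minimizing this quantity on unlabeled target samples pushes the task head toward confident predictions. How strongly each sample should count is the part the paper's calibration addresses, and it is not reproduced here.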

Paper Structure (17 sections, 20 equations, 5 figures, 11 tables, 1 algorithm)

Figures (5)

  • Figure 1: $A_L$-distance on feature space and probability space of different architectures between Art and Clipart from Office-Home based on ImageNet pre-trained models. The feature space refers to the distribution before the pre-trained classifier, and the probability space refers to the classifier's output. The probability space of the pre-trained model shows a smaller domain gap than the feature space.
  • Figure 2: Pipelines of previous methods and BiPC. (a) This paradigm typically compresses the feature space with a bottleneck layer and aligns the source and target domains through an alignment loss function, such as MMD (ref4). (b) Transformer-based UDA. CDTrans (ref14) is an example of this paradigm: it achieves invariant feature learning through a specifically designed cross-domain attention mechanism. However, it is only applicable to plain Transformers, like ViT (ref12). (c) The proposed BiPC. This approach combines effectiveness and flexibility and can be adapted to different architectures, as described in Section 3.
  • Figure 3: Convergence and performance of feature space vs probability space on task Art→Clipart (Office-Home). (a) Target domain accuracy with ResNet-50. (b) Target domain loss with ResNet-50. (c) Target domain accuracy with DeiT-Base. (d) Target domain loss with DeiT-Base.
  • Figure 4: Convergence of BiPC compared with state-of-the-art (SoTA) methods on task Art→Clipart (Office-Home). (a) ResNet-50-based methods. (b) DeiT-Base-based methods.
  • Figure 5: Average accuracy (%) of BiPC for all partial-set DA tasks on Office-Home (ResNet-50).
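The domain gap in Figure 1 is reported as a distance between source and target samples. The excerpt does not say which estimator is used, so the sketch below assumes the commonly used proxy A-distance, $d_A = 2(1 - 2\epsilon)$, where $\epsilon$ is the held-out error of a classifier trained to tell the two domains apart. The nearest-centroid domain classifier and all function names are illustrative assumptions, not the authors' procedure.

```python
import numpy as np


def proxy_a_distance(domain_error):
    """Proxy A-distance from a domain classifier's held-out error.

    Error 0.5 (domains indistinguishable) gives 0; error 0 gives 2.
    """
    return 2.0 * (1.0 - 2.0 * domain_error)


def centroid_domain_error(src, tgt, seed=0):
    """Held-out error of a nearest-centroid source-vs-target classifier.

    A deliberately simple stand-in (assumption) for the linear
    classifier that is usually trained for this purpose.
    """
    rng = np.random.default_rng(seed)
    src = src[rng.permutation(len(src))]
    tgt = tgt[rng.permutation(len(tgt))]
    hs, ht = len(src) // 2, len(tgt) // 2

    # Fit: one centroid per domain, from the first half of each set.
    mu_s = src[:hs].mean(axis=0)
    mu_t = tgt[:ht].mean(axis=0)

    # Evaluate on the second halves; label 0 = source, 1 = target.
    test = np.vstack([src[hs:], tgt[ht:]])
    labels = np.concatenate([np.zeros(len(src) - hs), np.ones(len(tgt) - ht)])
    d_s = np.linalg.norm(test - mu_s, axis=1)
    d_t = np.linalg.norm(test - mu_t, axis=1)
    pred = (d_t < d_s).astype(float)
    return float(np.mean(pred != labels))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for features (or probabilities) of two domains.
    src = rng.normal(0.0, 1.0, size=(200, 16))
    tgt = rng.normal(0.5, 1.0, size=(200, 16))
    err = centroid_domain_error(src, tgt)
    print(err, proxy_a_distance(err))
```

Running the same estimate once on pre-classifier features and once on the classifier's output probabilities would give the kind of feature-space versus probability-space comparison that Figure 1 describes.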