
Key Design Choices in Source-Free Unsupervised Domain Adaptation: An In-depth Empirical Analysis

Andrea Maracani, Raffaello Camoriano, Elisa Maiettini, Davide Talon, Lorenzo Rosasco, Lorenzo Natale

TL;DR

This work targets the practical and methodological gaps in Source-Free Unsupervised Domain Adaptation (SF-UDA) for image classification by introducing a thorough benchmark framework that treats adaptation as a double transfer: pre-training (and source fine-tuning), followed by target-domain adaptation without access to source data. It critically analyzes a diverse set of SF-UDA methods (SCA, SHOT, NRC, AAD, PCSR) and evaluates them across hundreds of backbones, multiple datasets, and both supervised and self-supervised pre-training regimes, revealing strong correlations between ImageNet performance, pre-training choices, and SF-UDA success. The study shows that backbone choice and pre-training dataset substantially affect SF-UDA outcomes, that self-supervised pre-training can be competitive but often lags behind supervised pre-training, and that the normalization type (layer normalization, LN, vs. batch normalization, BN) and the fine-tuning policy can drastically affect reliability, with LN-based models giving more stable results. The authors release an open-source experimental framework for large-scale, reproducible SF-UDA analysis, aiming to reduce dataset- and backbone-specific biases and to guide future research toward generalizable, robust, and practically applicable SF-UDA methods.
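To make the LN vs. BN distinction more concrete, the sketch below contrasts the statistics each normalization uses at inference time and adds the AdaBN idea mentioned in Figure 4 (re-estimating BN statistics on target data). It is a minimal NumPy illustration on synthetic features, assuming a single normalization layer without the learned scale and shift parameters; the function names are hypothetical and are not taken from the authors' framework.

```python
import numpy as np

def batch_norm(x: np.ndarray, mean: np.ndarray, var: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """BN at inference: normalize each feature (column) with stored per-feature statistics."""
    return (x - mean) / np.sqrt(var + eps)

def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """LN: normalize each sample (row) with its own mean and variance; nothing is stored."""
    mu = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def adabn_statistics(target_features: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """AdaBN idea (hypothetical helper): re-estimate per-feature BN statistics on target data."""
    return target_features.mean(axis=0), target_features.var(axis=0)

# Synthetic features (not from the paper): the target domain is the source shifted by +3.
rng = np.random.default_rng(0)
source = rng.normal(size=(1000, 8))
target = rng.normal(size=(1000, 8)) + 3.0

src_mean, src_var = source.mean(axis=0), source.var(axis=0)
bn_source_stats = batch_norm(target, src_mean, src_var)    # source statistics: output stays off-center
bn_adabn = batch_norm(target, *adabn_statistics(target))   # target statistics: output re-centered
ln_out = layer_norm(target)                                # per-sample: independent of the batch
```

Because LN computes its statistics from each sample alone, its output does not depend on batch composition or on statistics stored from the source domain, whereas BN's output on target data depends on which statistics it uses. This is offered only as intuition for the LN/BN contrast, not as the paper's explanation of the stability difference.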

Abstract

This study provides a comprehensive benchmark framework for Source-Free Unsupervised Domain Adaptation (SF-UDA) in image classification, aiming to achieve a rigorous empirical understanding of the complex relationships between multiple key design factors in SF-UDA methods. The study empirically examines a diverse set of SF-UDA techniques, assessing their consistency across datasets, sensitivity to specific hyperparameters, and applicability across different families of backbone architectures. Moreover, it exhaustively evaluates pre-training datasets and strategies, covering both supervised and self-supervised methods, as well as the impact of fine-tuning on the source domain. Our analysis also highlights gaps in existing benchmark practices, guiding SF-UDA research towards more effective and general approaches. It emphasizes the importance of backbone architecture and pre-training dataset selection for SF-UDA performance, serving as an essential reference and providing key insights. Lastly, we release the source code of our experimental framework, which facilitates the construction, training, and testing of SF-UDA methods, enabling systematic large-scale experimental analysis and supporting further research in this field.
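SHOT, one of the SF-UDA methods listed in the TL;DR, adapts the source-trained feature extractor on unlabeled target data while keeping the source classifier fixed. A central part of its objective, as described in the original SHOT paper, is an information-maximization (IM) loss: it minimizes the average per-sample prediction entropy (confident predictions) while maximizing the entropy of the batch-mean prediction (diverse predictions). The NumPy sketch below computes only this IM term on a batch of logits; SHOT's pseudo-labeling loss, the optimizer, and the network are omitted, and the function names are illustrative.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax, subtracting the row max for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def information_maximization_loss(logits: np.ndarray, eps: float = 1e-8) -> float:
    """SHOT-style IM loss for a batch of unlabeled target logits of shape (N, K).

    l_ent: mean per-sample prediction entropy (small when predictions are confident).
    l_div: sum_k p_mean_k * log(p_mean_k), the negative entropy of the batch-mean
           prediction (small when predictions are spread across classes).
    """
    p = softmax(logits)
    l_ent = -np.mean(np.sum(p * np.log(p + eps), axis=1))
    p_mean = p.mean(axis=0)
    l_div = np.sum(p_mean * np.log(p_mean + eps))
    return float(l_ent + l_div)

# Toy logits (not from the paper): both batches are confident, but only one is diverse.
diverse = 10.0 * np.eye(3)                       # each sample predicts a different class
collapsed = np.tile([10.0, 0.0, 0.0], (3, 1))    # every sample predicts class 0
print(information_maximization_loss(diverse) < information_maximization_loss(collapsed))  # True
```

The diversity term equals KL(p_mean || uniform) - log K, so minimizing it pushes the batch-mean prediction toward the uniform distribution and penalizes the collapsed solution in which every target sample is assigned to the same class.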

Paper Structure (27 sections, 3 equations, 5 figures, 13 tables)

Figures (5)

  • Figure 1: SF-UDA pipeline. In this work, we meticulously analyze the SF-UDA pipeline. The process begins by pre-training a backbone (along with its classifier, $C_{PT}$) on a large dataset, e.g., ImageNet (see the section on pre-training). This is followed by the first transfer phase, where labeled source data is used to refine the backbone and, possibly, to train a classifier for the new task (see the section on source fine-tuning). Then, the second transfer phase leverages unlabeled target data to adapt the model to the target domain (see the section on SF-UDA methods).
  • Figure 2: Left to right: LP-IDG accuracy (upper bound) averaged over 23 domains, and LP-ODG (lower bound), SCA, and FT-SHOT accuracy (averaged over 74 domain pairs). Each marker indicates an architecture, with the x-axis denoting ImageNet top-1 accuracy. Marker color and shape indicate the pre-training dataset.
  • Figure 3: Impact of source fine-tuning on final accuracy. Each marker is a backbone, and variations in final accuracy are shown as arrows. In the top row, we report the accuracy difference between LP-IDG and FT-IDG (left) and between LP-ODG and FT-ODG (right), which serve as our upper and lower bounds, respectively. All other plots report the accuracy difference between LP-ODG and the considered SF-UDA method (i.e., SHOT, FT-SHOT, SCA, and FT-SCA). Models with BN and LN are shown with dark and white markers, respectively.
  • Figure 4: Example of the impact of fine-tuning on the Modern Office31 dataset (Synthetic $\to$ DSLR in the first row, DSLR $\to$ Synthetic in the second row). The effect of fine-tuning on final accuracy is shown as arrows. The first column reports the accuracy variation between LP-ODG and FT-ODG (lower bound). The remaining columns report the accuracy difference with respect to FT-ODG when AdaBN is applied (second column) and for FT-SCA and FT-SHOT (third and fourth columns).
  • Figure 5: We report the relation between ImageNet top-1 accuracy and SCA target accuracy across 74 domain pairs for over 500 different backbones. Marker colors denote the pre-training dataset, while shade intensity indicates the SCA accuracy improvement over the lower bound LP-ODG.
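Figures 2 and 5 relate each backbone's ImageNet top-1 accuracy to its target accuracy averaged over the 74 domain pairs, and Figures 3 and 5 use accuracy differences with respect to the LP-ODG lower bound. The sketch below shows one way to compute such summary quantities with NumPy, assuming a per-method accuracy matrix of shape (backbones, domain pairs). Pearson's r is used purely as an example statistic, since these captions do not state which correlation measure the authors report, and all function names are hypothetical.

```python
import numpy as np

def mean_over_pairs(acc: np.ndarray) -> np.ndarray:
    """Average accuracy per backbone over domain pairs; acc has shape (n_backbones, n_pairs)."""
    return np.asarray(acc, dtype=float).mean(axis=1)

def gain_over_lower_bound(method_acc: np.ndarray, lp_odg_acc: np.ndarray) -> np.ndarray:
    """Per-backbone improvement of an SF-UDA method over the LP-ODG lower bound."""
    return np.asarray(method_acc, dtype=float) - np.asarray(lp_odg_acc, dtype=float)

def pearson_r(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation coefficient between two 1-D arrays of equal length."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))
```

Given hypothetical arrays imagenet_top1 (one entry per backbone) and sca_acc and lp_odg_acc (each backbones by domain pairs), a Figure 5 style summary would be pearson_r(imagenet_top1, mean_over_pairs(sca_acc)) for the trend and gain_over_lower_bound(mean_over_pairs(sca_acc), mean_over_pairs(lp_odg_acc)) for the per-backbone improvement.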