Key Design Choices in Source-Free Unsupervised Domain Adaptation: An In-depth Empirical Analysis

Andrea Maracani; Raffaello Camoriano; Elisa Maiettini; Davide Talon; Lorenzo Rosasco; Lorenzo Natale

TL;DR

This work addresses practical and methodological gaps in Source-Free Unsupervised Domain Adaptation (SF-UDA) for image classification by introducing a thorough benchmark framework that treats adaptation as a double transfer: pre-training (and source fine-tuning), followed by target-domain adaptation without access to source data. It critically analyzes a diverse set of SF-UDA methods (SCA, SHOT, NRC, AAD, PCSR) and evaluates them across hundreds of backbones, multiple datasets, and both supervised and self-supervised pre-training regimes, revealing strong correlations between ImageNet accuracy, pre-training choices, and SF-UDA success. The study shows that the choice of backbone and pre-training dataset substantially affects SF-UDA outcomes, that self-supervised pre-training can be competitive but often lags behind supervised pre-training, and that the normalization layer type (LayerNorm, LN, vs. BatchNorm, BN) and the fine-tuning policy can drastically affect reliability, with LN-based models yielding more stable results. The authors release an open-source experimental framework for large-scale, reproducible SF-UDA analysis, aiming to reduce dataset- and backbone-specific biases and to steer future research toward generalizable, robust SF-UDA methods with practical applicability.
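
The BN vs. LN contrast can be illustrated with a toy example. The sketch below is not the paper's analysis and not necessarily the authors' explanation; it only shows one common intuition, under the simplifying assumption that the target domain is the source domain shifted by a constant. BN at inference time normalizes each feature with statistics stored on the source domain, so shifted target features land far from the normalized source distribution unless those statistics are re-estimated on unlabeled target data (the idea behind AdaBN). LN normalizes each sample over its own features and uses no batch statistics at all. The helper names (feature_stats, batch_norm, layer_norm) and the toy feature vectors are hypothetical, and learnable affine parameters are omitted.

```python
import math

EPS = 1e-5  # small constant for numerical stability, as in common BN/LN implementations


def feature_stats(batch):
    """Per-feature mean and biased variance over a batch of feature vectors."""
    n = len(batch)
    dims = range(len(batch[0]))
    means = [sum(x[j] for x in batch) / n for j in dims]
    variances = [sum((x[j] - means[j]) ** 2 for x in batch) / n for j in dims]
    return means, variances


def batch_norm(x, means, variances):
    """BN at inference time: normalize each feature with externally supplied statistics."""
    return [(v - m) / math.sqrt(s + EPS) for v, m, s in zip(x, means, variances)]


def layer_norm(x):
    """LN: normalize one sample across its own features; no batch statistics involved."""
    m = sum(x) / len(x)
    s = sum((v - m) ** 2 for v in x) / len(x)
    return [(v - m) / math.sqrt(s + EPS) for v in x]


# Toy "domains" (hypothetical): target features are the source features shifted by +10.
source = [[0.0, 1.0], [2.0, 3.0]]
target = [[10.0, 11.0], [12.0, 13.0]]

src_stats = feature_stats(source)  # statistics a BN layer would store after source training
tgt_stats = feature_stats(target)  # statistics re-estimated on unlabeled target data

# 1) BN with source statistics: target activations are far off-center (roughly [9, 9]).
print(batch_norm(target[0], *src_stats))
# 2) AdaBN-style fix: with target statistics, they match the normalized source sample.
print(batch_norm(target[0], *tgt_stats), batch_norm(source[0], *src_stats))
# 3) LN: per-sample normalization removes this uniform shift (both roughly [-1, 1]).
print(layer_norm(target[0]), layer_norm(source[0]))
```

Real domain shifts are not uniform offsets, so this only conveys why stored BN statistics are domain-specific while LN has no such state.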

Abstract

This study provides a comprehensive benchmark framework for Source-Free Unsupervised Domain Adaptation (SF-UDA) in image classification, aiming to achieve a rigorous empirical understanding of the complex relationships between multiple key design factors in SF-UDA methods. The study empirically examines a diverse set of SF-UDA techniques, assessing their consistency across datasets, sensitivity to specific hyperparameters, and applicability across different families of backbone architectures. Moreover, it exhaustively evaluates pre-training datasets and strategies, particularly focusing on both supervised and self-supervised methods, as well as the impact of fine-tuning on the source domain. Our analysis also highlights gaps in existing benchmark practices, guiding SF-UDA research towards more effective and general approaches. It highlights the strong influence of backbone architecture and pre-training dataset selection on SF-UDA performance, serving as an essential reference and providing key insights. Lastly, we release the source code of our experimental framework. This facilitates the construction, training, and testing of SF-UDA methods, enabling systematic large-scale experimental analysis and supporting further research efforts in this field.

Paper Structure (27 sections, 3 equations, 5 figures, 13 tables)

Introduction
Related Work
Methods
SF-UDA pipeline
SF-UDA methods
Simple Class Alignment
Source HypOthesis Transfer
Neighborhood Reciprocity Clustering
Attracting and Dispersing
Polycentric Clustering and Structural Regularization
Benchmarking Framework
Datasets
Backbone selection and integration
Evaluation protocol
Analysis of SF-UDA methods
...and 12 more sections

Figures (5)

Figure 1: SF-UDA pipeline. In this work, we meticulously analyze the SF-UDA pipeline. The process begins by pre-training a backbone (along with its classifier, $C_{PT}$) on a large dataset, e.g., ImageNet (see the pre-training section). This is followed by the first transfer phase, in which labeled source data is used to refine the backbone and, possibly, to train a classifier for the new task (see the source fine-tuning section). Finally, in the second transfer phase, unlabeled target data is leveraged to adapt the model to the target domain (see the Methods section).
Figure 2: Left to right: LP-IDG accuracy (upper bound) averaged over 23 domains; LP-ODG accuracy (lower bound), SCA accuracy, and FT-SHOT accuracy, each averaged over 74 domain pairs. Each marker represents an architecture, and the x-axis shows its ImageNet top-1 accuracy. Marker color and shape indicate the pre-training dataset.
Figure 3: Impact of source fine-tuning on final accuracy. Each marker is a backbone, and changes in final accuracy are shown as arrows. The top row reports the accuracy difference between LP-IDG and FT-IDG (left) and between LP-ODG and FT-ODG (right), which serve as our upper and lower bounds, respectively. All other plots show the accuracy difference between LP-ODG and the considered SF-UDA method (i.e., SHOT, FT-SHOT, SCA, and FT-SCA). Models with BN and LN are shown with dark and white markers, respectively.
Figure 4: Example of the impact of fine-tuning on the Modern Office31 dataset (Synthetic $\to$ DSLR in the first row and DSLR $\to$ Synthetic in the second row). The effect of fine-tuning on final accuracy is shown as arrows. The first column reports the accuracy change between LP-ODG and FT-ODG (lower bound). The remaining columns report the accuracy difference with respect to FT-ODG when ADABN is applied (second column) and for FT-SCA and FT-SHOT (third and fourth columns).
Figure 5: Relation between ImageNet top-1 accuracy and SCA target accuracy across 74 domain pairs, for over 500 different backbones. Marker color denotes the pre-training dataset, while shade intensity indicates the accuracy improvement of SCA over the lower bound LP-ODG.
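
The aggregate quantities plotted in these figures (mean target accuracy over domain pairs, gain over the LP-ODG lower bound, and the relation to ImageNet top-1 accuracy) can be computed along the lines of the sketch below. It assumes a hypothetical per-backbone results schema and uses made-up placeholder numbers, not results from the paper. Pearson correlation is used here only as an illustrative measure of the relation; the statistic used by the authors is not specified here.

```python
import math
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def summarize(results, method="sca"):
    """Average a method's target accuracy over domain pairs, and its gain over LP-ODG.

    Hypothetical schema: results[backbone] = {"imagenet_top1": float,
    "lp_odg": {pair: accuracy}, method: {pair: accuracy}}.
    """
    summary = {}
    for name, r in results.items():
        pairs = sorted(r[method])
        summary[name] = {
            "imagenet_top1": r["imagenet_top1"],
            "mean_acc": mean(r[method][p] for p in pairs),
            "gain_over_lp_odg": mean(r[method][p] - r["lp_odg"][p] for p in pairs),
        }
    return summary


# Made-up placeholder numbers (percent accuracy) for two domain pairs; NOT paper results.
results = {
    "backbone_a": {"imagenet_top1": 76.0, "lp_odg": {"p1": 60.0, "p2": 50.0},
                   "sca": {"p1": 66.0, "p2": 58.0}},
    "backbone_b": {"imagenet_top1": 80.0, "lp_odg": {"p1": 65.0, "p2": 55.0},
                   "sca": {"p1": 70.0, "p2": 62.0}},
    "backbone_c": {"imagenet_top1": 84.0, "lp_odg": {"p1": 70.0, "p2": 62.0},
                   "sca": {"p1": 75.0, "p2": 69.0}},
}

summary = summarize(results)
names = sorted(summary)
top1 = [summary[n]["imagenet_top1"] for n in names]
acc = [summary[n]["mean_acc"] for n in names]
for n in names:
    print(n, summary[n])
print("Pearson r, ImageNet top-1 vs. mean SCA accuracy:", round(pearson(top1, acc), 3))
```

Passing a different method key (for example, a hypothetical "ft_shot" entry in the same schema) yields the corresponding per-method aggregates.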
