Key Design Choices in Source-Free Unsupervised Domain Adaptation: An In-depth Empirical Analysis
Andrea Maracani, Raffaello Camoriano, Elisa Maiettini, Davide Talon, Lorenzo Rosasco, Lorenzo Natale
TL;DR
This work addresses practical and methodological gaps in Source-Free Unsupervised Domain Adaptation (SF-UDA) for image classification. It introduces a thorough benchmark framework that treats adaptation as a double transfer: first, pre-training and fine-tuning on the source domain; second, adaptation to the target domain without access to source data. It critically analyzes a diverse set of SF-UDA methods (SCA, SHOT, NRC, AAD, PCSR) and evaluates them across hundreds of backbones, multiple datasets, and both supervised and self-supervised pre-training regimes, revealing strong correlations between ImageNet performance, pre-training choices, and SF-UDA success. The study finds three things. Backbone choice and pre-training dataset substantially affect SF-UDA outcomes. Self-supervised pre-training can be competitive but often lags behind supervised pre-training. The type of normalization (Layer Normalization, LN, vs. Batch Normalization, BN) and the fine-tuning policy can drastically affect reliability, with LN-based models yielding more stable results. The authors release an open-source experimental framework for large-scale, reproducible SF-UDA analysis, aiming to reduce dataset- and backbone-specific biases and to steer future research toward generalizable, robust, and practically applicable SF-UDA methods.
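To make the second, target-only stage of the double transfer more concrete, here is a minimal NumPy sketch of the information-maximization (IM) objective popularized by SHOT, one of the benchmarked methods. It is an illustration under stated assumptions, not the authors' framework code. Full SHOT also keeps the source classifier head frozen and adds self-supervised pseudo-labeling, both omitted here. The function names and the `eps` smoothing constant are hypothetical. The loss is computed only from unlabeled target predictions. It rewards confident per-sample predictions (low entropy) that are also diverse across the batch (high entropy of the mean prediction), which discourages collapse onto a single class.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    # Subtract the row-wise max for numerical stability before exponentiating.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def information_maximization_loss(logits: np.ndarray, eps: float = 1e-8) -> float:
    """SHOT-style IM loss (sketch) for unlabeled target logits of shape (N, K)."""
    p = softmax(logits)
    # Entropy term: mean per-sample prediction entropy (lower = more confident).
    ent = -np.mean(np.sum(p * np.log(p + eps), axis=1))
    # Diversity term: negative entropy of the batch-mean prediction
    # (lower = predictions spread over classes, i.e. no collapse to one class).
    p_mean = p.mean(axis=0)
    div = np.sum(p_mean * np.log(p_mean + eps))
    return float(ent + div)


if __name__ == "__main__":
    k = 3
    uniform = np.zeros((4, k))                        # maximally uncertain predictions
    diverse = 20.0 * np.eye(k)                        # confident, one sample per class
    collapsed = np.tile([20.0, 0.0, 0.0], (k, 1))     # confident, all in class 0
    for name, logits in [("uniform", uniform), ("diverse", diverse), ("collapsed", collapsed)]:
        print(name, round(information_maximization_loss(logits), 4))
```

With K classes, the loss is about 0 for uniform predictions and also about 0 for confident but collapsed predictions. It reaches its minimum, close to -log K, only when predictions are both confident and balanced across classes. This is why the diversity term is needed alongside entropy minimization when no source data or target labels are available.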
Abstract
This study provides a comprehensive benchmark framework for Source-Free Unsupervised Domain Adaptation (SF-UDA) in image classification. Its goal is a rigorous empirical understanding of the complex relationships among multiple key design factors in SF-UDA methods. The study empirically examines a diverse set of SF-UDA techniques, assessing their consistency across datasets, their sensitivity to specific hyperparameters, and their applicability across different families of backbone architectures. Moreover, it exhaustively evaluates pre-training datasets and strategies, covering both supervised and self-supervised methods, as well as the impact of fine-tuning on the source domain. Our analysis also highlights gaps in existing benchmark practices, guiding SF-UDA research toward more effective and general approaches. It shows that the choice of backbone architecture and pre-training dataset strongly affects SF-UDA performance, and it offers a reference point and key insights for the field. Lastly, we release the source code of our experimental framework. It facilitates the construction, training, and testing of SF-UDA methods, enabling systematic large-scale experimental analysis and supporting further research in this field.
