Revisiting CLIP for SF-OSDA: Unleashing Zero-Shot Potential with Adaptive Threshold and Training-Free Feature Filtering
Yongguang Li, Jindong Li, Qi Wang, Qianli Xing, Runliang Niu, Shengsheng Wang, Menglin Yang
TL;DR
This work addresses Source-Free Unsupervised Open-Set Domain Adaptation (SF-OSDA) with CLIP by tackling two core problems: dependence on fixed, domain-specific thresholds, and unnecessary training costs that can shift CLIP's features. It introduces CLIPXpert, a training-free, source-free framework with two components. Box-Cox GMM-Based Adaptive Thresholding (BGAT) models the score distribution and derives a robust threshold $T^*$ from the intersection of Gaussian PDFs after a Box-Cox transformation. SVD-Based Unknown-Class Feature Filtering (SUFF) reconstructs the feature space from principal components to suppress unknown-class bias and separate known from unknown classes without additional training. Across the Office-Home, VisDA-2017, DomainNet, and VATB benchmarks, CLIPXpert achieves competitive or state-of-the-art results, with notable gains over fixed-threshold baselines and other CLIP-based methods, underscoring CLIP's strong zero-shot potential for SF-OSDA in resource-constrained settings.
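BGAT is described above only at the level of its ingredients. A minimal NumPy sketch of that recipe follows, under assumptions that are ours rather than the authors': the scores are strictly positive per-sample confidences (for example, maximum softmax probabilities over known classes), λ is chosen by a grid-search maximum-likelihood fit over [-2, 2], the two-component GMM is fitted with plain EM, and $T^*$ is taken as the crossing point of the mixture-weighted component densities between the two means, mapped back through the inverse Box-Cox transform. All function names (`bgat_threshold`, `fit_gmm_1d`, and so on) are hypothetical.

```python
import numpy as np


def boxcox(x, lam):
    """Box-Cox transform; the log branch is the lam -> 0 limit."""
    return np.log(x) if abs(lam) < 1e-8 else (x ** lam - 1.0) / lam


def inv_boxcox(y, lam):
    """Inverse Box-Cox transform (valid while lam * y + 1 > 0)."""
    return np.exp(y) if abs(lam) < 1e-8 else (lam * y + 1.0) ** (1.0 / lam)


def boxcox_lambda(x, grid=np.linspace(-2.0, 2.0, 401)):
    """Assumption: pick lambda by grid search on the Box-Cox profile log-likelihood."""
    n, log_sum = len(x), np.log(x).sum()
    llf = [(lam - 1.0) * log_sum - 0.5 * n * np.log(boxcox(x, lam).var()) for lam in grid]
    return float(grid[int(np.argmax(llf))])


def fit_gmm_1d(y, n_iter=200):
    """Two-component 1-D Gaussian mixture fitted with plain EM in log space."""
    mu = np.percentile(y, [25, 75]).astype(float)
    var = np.full(2, y.var())
    w = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: log of w_k * N(y | mu_k, var_k) for every sample and component.
        log_p = (np.log(w) - 0.5 * np.log(2.0 * np.pi * var)
                 - (y[:, None] - mu) ** 2 / (2.0 * var))
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update weights, means and variances from the responsibilities.
        nk = resp.sum(axis=0)
        w = nk / len(y)
        mu = (resp * y[:, None]).sum(axis=0) / nk
        var = (resp * (y[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-9
    order = np.argsort(mu)  # component 0 = lower-score (unknown-like) mode
    return w[order], mu[order], var[order]


def gaussian_intersection(w, mu, var):
    """Solve w0*N(t|mu0,var0) = w1*N(t|mu1,var1) for t between the two means."""
    a = 1.0 / (2.0 * var[1]) - 1.0 / (2.0 * var[0])
    b = mu[0] / var[0] - mu[1] / var[1]
    c = (mu[1] ** 2 / (2.0 * var[1]) - mu[0] ** 2 / (2.0 * var[0])
         + np.log(w[0] / np.sqrt(var[0])) - np.log(w[1] / np.sqrt(var[1])))
    roots = [-c / b] if abs(a) < 1e-12 else np.roots([a, b, c])
    mid = 0.5 * (mu[0] + mu[1])
    cands = [r.real for r in np.atleast_1d(roots)
             if abs(np.imag(r)) < 1e-9 and mu[0] <= r.real <= mu[1]]
    # Fall back to the midpoint if no real crossing lies between the means.
    return min(cands, key=lambda r: abs(r - mid)) if cands else mid


def bgat_threshold(scores):
    """Hypothetical BGAT-style threshold T* expressed in the original score space."""
    scores = np.asarray(scores, dtype=float)  # Box-Cox needs strictly positive scores
    lam = boxcox_lambda(scores)
    w, mu, var = fit_gmm_1d(boxcox(scores, lam))
    return float(inv_boxcox(gaussian_intersection(w, mu, var), lam))
```

Because the Box-Cox transform is monotone increasing for every λ, a crossing found in the transformed space maps back to a single cut-off on the raw scores. In this sketch, samples scoring at or above $T^*$ would be treated as known and the rest rejected as unknown; that decision rule is also an assumption.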
Abstract
Source-Free Unsupervised Open-Set Domain Adaptation (SF-OSDA) methods using CLIP face two significant issues: (1) although their performance depends heavily on domain-specific threshold selection, existing methods employ simple fixed thresholds, underutilizing CLIP's zero-shot potential in SF-OSDA scenarios; and (2) they overlook intrinsic class tendencies and instead rely on complex training to enforce feature separation, incurring deployment costs and feature shifts that compromise CLIP's generalization ability. To address these issues, we propose CLIPXpert, a novel SF-OSDA approach that integrates two key components: an adaptive thresholding strategy and an unknown-class feature filtering module. Specifically, the Box-Cox GMM-Based Adaptive Thresholding (BGAT) module dynamically determines the optimal threshold by estimating sample score distributions, balancing known-class recognition against unknown-class sample detection. Additionally, the Singular Value Decomposition (SVD)-Based Unknown-Class Feature Filtering (SUFF) module reduces the tendency of unknown-class samples to lean towards known classes, improving the separation between known and unknown classes. Experiments show that our source-free and training-free method outperforms the state-of-the-art trained approach UOTA by 1.92% on the DomainNet dataset, achieves performance comparable to the state of the art on datasets such as Office-Home, and surpasses other SF-OSDA methods. These results validate the effectiveness of the proposed method and highlight CLIP's strong zero-shot potential for SF-OSDA tasks.
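The abstract does not spell out how SUFF uses the SVD. As a hedged illustration only, the sketch below applies a generic rank-k SVD reconstruction to a batch of L2-normalized image features, re-normalizes them, and scores them against known-class text embeddings with a CLIP-style softmax; it is not the authors' procedure. The rank `k`, keeping (rather than discarding) the top singular directions, skipping mean-centering, the `logit_scale` of 100, and the function names `svd_filter` and `known_class_probs` are all assumptions.

```python
import numpy as np


def svd_filter(features: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical SUFF-like step: rank-k SVD reconstruction of image features."""
    # Top-k right singular vectors of the (uncentred) sample-by-dimension matrix.
    _, _, vt = np.linalg.svd(features, full_matrices=False)
    basis = vt[:k]                          # shape (k, d)
    recon = features @ basis.T @ basis      # project each sample onto that subspace
    # Re-normalise rows so cosine similarity with text embeddings is well defined.
    norms = np.linalg.norm(recon, axis=1, keepdims=True)
    return recon / np.maximum(norms, 1e-12)


def known_class_probs(img_feats: np.ndarray, text_feats: np.ndarray,
                      logit_scale: float = 100.0) -> np.ndarray:
    """CLIP-style zero-shot softmax over known-class text embeddings."""
    text = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    logits = logit_scale * img_feats @ text.T
    logits -= logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.normal(size=(64, 512))
    img /= np.linalg.norm(img, axis=1, keepdims=True)
    txt = rng.normal(size=(10, 512))  # random stand-in for 10 known-class prompt embeddings
    probs = known_class_probs(svd_filter(img, k=16), txt)
    print(probs.shape, probs.max(axis=1)[:5])
```

In this sketch the maximum of each row of the resulting probabilities could serve as the per-sample score to which an adaptive threshold such as $T^*$ is applied.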
