Adapt Before Comparison: A New Perspective on Cross-Domain Few-Shot Segmentation
Jonas Herzog
TL;DR
This work tackles cross-domain few-shot segmentation (CD-FSS) by abandoning training on a source domain and instead performing test-time task adaptation. By attaching tiny per-layer adapters to a frozen ImageNet-pretrained backbone and enforcing consistency through dense contrastive losses, the method specializes features to the target task before performing dense query-support comparison. The approach achieves state-of-the-art results on CD-FSS benchmarks, demonstrating that test-time adaptation can outperform traditional training-based generalization strategies. The findings argue for reframing CD-FSS, shifting the focus from training-time generalization to robust, task-specific adaptation at inference, with implications for efficiency and practical deployment.
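To make "adapt, then compare" concrete, here is a minimal pure-Python sketch under stated assumptions. It is not the authors' implementation. The `ResidualAdapter` class (y = x + Wx with W initialized to zero, so the frozen features pass through unchanged until W is trained) is hypothetical. The comparison step uses masked average pooling to build a foreground prototype from the support, then thresholds per-pixel cosine similarity on the query. This is a common few-shot segmentation technique substituted here for illustration; the paper's dense comparison may differ. The toy H x W x C lists stand in for one level of the backbone's feature pyramid, and the 0.5 threshold is an assumed value.

```python
import math
from typing import List

Vector = List[float]
FeatureMap = List[List[Vector]]  # H x W grid of C-dimensional feature vectors
Mask = List[List[int]]           # H x W grid, 1 = foreground


class ResidualAdapter:
    """Hypothetical per-level adapter: y = x + W x, applied at every pixel.

    W starts at zero, so the adapter initially leaves the frozen backbone
    features unchanged; only W would be updated during test-time adaptation.
    """

    def __init__(self, channels: int) -> None:
        self.weight = [[0.0] * channels for _ in range(channels)]

    def __call__(self, feats: FeatureMap) -> FeatureMap:
        out = []
        for row in feats:
            out_row = []
            for x in row:
                # Channel mixing W x, then the residual connection x + W x.
                wx = [sum(w_ij * x_j for w_ij, x_j in zip(w_i, x)) for w_i in self.weight]
                out_row.append([a + b for a, b in zip(x, wx)])
            out.append(out_row)
        return out


def masked_average_prototype(feats: FeatureMap, mask: Mask) -> Vector:
    """Average the support features over foreground pixels (masked average pooling)."""
    channels = len(feats[0][0])
    total = [0.0] * channels
    count = 0
    for row_f, row_m in zip(feats, mask):
        for vec, m in zip(row_f, row_m):
            if m:
                total = [t + v for t, v in zip(total, vec)]
                count += 1
    if count == 0:
        raise ValueError("support mask has no foreground pixels")
    return [t / count for t in total]


def cosine(a: Vector, b: Vector, eps: float = 1e-8) -> float:
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def dense_compare(query: FeatureMap, prototype: Vector, threshold: float = 0.5) -> Mask:
    """Predict a query mask by thresholding per-pixel similarity to the prototype."""
    return [[1 if cosine(vec, prototype) > threshold else 0 for vec in row] for row in query]


if __name__ == "__main__":
    adapter = ResidualAdapter(channels=2)
    support = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.1], [0.0, 1.0]]]
    support_mask = [[1, 0], [1, 0]]
    query = [[[0.9, 0.0], [0.0, 1.0]], [[0.1, 0.9], [2.0, 0.2]]]
    proto = masked_average_prototype(adapter(support), support_mask)
    print(dense_compare(adapter(query), proto))  # [[1, 0], [0, 1]]
```

Zero-initializing W means adaptation can only move the features away from the pretrained ones as far as the test-time objective pushes them, which is one plausible way to keep tiny adapters from immediately overfitting a handful of labeled pixels.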
Abstract
Few-shot segmentation performance declines substantially when facing images from a domain different from the training domain, effectively limiting real-world use cases. To address this, cross-domain few-shot segmentation (CD-FSS) has recently emerged. Existing works on this task mainly attempt to learn segmentation on a source domain in a way that generalizes across domains. Surprisingly, we can outperform these approaches while eliminating the training stage and removing their main segmentation network. Instead, we show that test-time task adaptation is the key to successful CD-FSS. Task adaptation is achieved by appending small networks to the feature pyramid of a backbone conventionally pretrained for classification. To avoid overfitting to the few labeled samples during supervised fine-tuning, consistency across augmented views of the input images serves as guidance while learning the parameters of the attached layers. Although we restrict ourselves to using no images other than the few labeled samples at test time, we achieve new state-of-the-art performance in CD-FSS, evidencing the need to rethink approaches for the task.
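The test-time objective described above, supervised fine-tuning on the few labeled samples regularized by consistency across augmented views, can be sketched as a single loss value. This is a hedged illustration, not the paper's loss: the horizontal flip augmentation, the mean-squared consistency term, and the `weight` factor are all assumptions, and the TL;DR's dense contrastive losses are not reproduced here. `pred_orig` and `pred_flipped` are hypothetical per-pixel foreground probabilities for a labeled image and its flipped view; in a full system they would come from the adapted features, and only the adapter parameters would be optimized against this value while the backbone stays frozen.

```python
import math
from typing import List

Grid = List[List[float]]  # H x W per-pixel foreground probabilities
Mask = List[List[int]]    # H x W grid, 1 = foreground


def hflip(grid: Grid) -> Grid:
    """Horizontal flip (the augmentation assumed in this sketch)."""
    return [list(reversed(row)) for row in grid]


def bce(pred: Grid, target: Mask, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between predicted probabilities and a 0/1 mask."""
    total, n = 0.0, 0
    for prow, trow in zip(pred, target):
        for p, t in zip(prow, trow):
            p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
            total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
            n += 1
    return total / n


def consistency(pred_orig: Grid, pred_flipped: Grid) -> float:
    """Mean squared difference after undoing the flip on the augmented view's prediction."""
    aligned = hflip(pred_flipped)
    diffs = [(a - b) ** 2 for ra, rb in zip(pred_orig, aligned) for a, b in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def adaptation_loss(pred_orig: Grid, pred_flipped: Grid, support_mask: Mask,
                    weight: float = 1.0) -> float:
    """Supervised term on the labeled sample plus a weighted cross-view consistency term."""
    return bce(pred_orig, support_mask) + weight * consistency(pred_orig, pred_flipped)


if __name__ == "__main__":
    pred = [[0.9, 0.1]]
    # The flipped view's prediction mirrors the original, so the consistency term is zero.
    print(adaptation_loss(pred, [[0.1, 0.9]], [[1, 0]]))
```

The consistency term penalizes predictions that change under an augmentation that should not change the answer, which gives the adapters a training signal beyond memorizing the few labeled masks.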
