The Devil is in Low-Level Features for Cross-Domain Few-Shot Segmentation
Yuhan Liu, Yixiong Zou, Yuhua Li, Ruixuan Li
TL;DR
Cross-Domain Few-Shot Segmentation (CDFSS) exhibits a puzzling phenomenon: target-domain performance peaks early in training and then declines sharply as source-domain training continues. The authors attribute this to low-level features being vulnerable to domain shift, which sharpens the loss landscape and impairs cross-domain generalization. They propose two plug-and-play modules. The first is a train-time Low-level Enhancement Module (LEM), which applies shape-preserving perturbations via random convolution and FFT-based recombination to flatten the loss landscape of low-level features. The second is a test-time Low-level Calibration Module (LCM), which injects target-domain low-level cues to refine predictions. Experiments on four target datasets show consistent improvements over the state of the art, with average mIoU gains of 3.71% (1-shot) and 5.34% (5-shot), underscoring the practical value of addressing low-level feature vulnerability in cross-domain segmentation.
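The TL;DR names random convolution and FFT-based recombination as the ingredients of LEM's shape-preserving perturbation but gives no further detail. The following is a minimal sketch under one common reading, which is an assumption and not the authors' code. A random depthwise convolution changes texture and style. The Fourier recombination then keeps the phase spectrum of the original map, which carries spatial structure and shape, and blends in the amplitude spectrum of the perturbed map. The function names `random_conv`, `fft_recombine`, and `lem_style_perturb` are hypothetical, as are the Gaussian kernel initialization and the blending weight `alpha`. The sketch works on a single NumPy `(C, H, W)` map rather than a batched tensor.

```python
import numpy as np


def random_conv(feat: np.ndarray, rng: np.random.Generator, k: int = 3) -> np.ndarray:
    """Filter each channel of a (C, H, W) map with its own random k x k kernel.

    Hypothetical stand-in for a random-convolution perturbation: kernels are
    Gaussian-initialized and zero 'same' padding keeps H and W unchanged.
    """
    c, h, w = feat.shape
    pad = k // 2
    kernels = rng.normal(0.0, 1.0 / k, size=(c, k, k))
    padded = np.pad(feat, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(feat, dtype=float)
    for i in range(k):
        for j in range(k):
            # Accumulate the (i, j) kernel tap times the shifted window.
            out += kernels[:, i, j][:, None, None] * padded[:, i:i + h, j:j + w]
    return out


def fft_recombine(orig: np.ndarray, perturbed: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Keep the phase of `orig` (structure/shape) and blend its amplitude
    spectrum with that of `perturbed` (style/texture), per channel."""
    f_orig = np.fft.fft2(orig, axes=(-2, -1))
    f_pert = np.fft.fft2(perturbed, axes=(-2, -1))
    amp = (1.0 - alpha) * np.abs(f_orig) + alpha * np.abs(f_pert)
    mixed = amp * np.exp(1j * np.angle(f_orig))
    # The blended spectrum stays conjugate-symmetric, so the inverse is real
    # up to floating-point error; drop the negligible imaginary part.
    return np.real(np.fft.ifft2(mixed, axes=(-2, -1)))


def lem_style_perturb(feat: np.ndarray, rng: np.random.Generator, alpha: float = 0.5) -> np.ndarray:
    """Hypothetical LEM-style augmentation of a low-level feature map."""
    return fft_recombine(feat, random_conv(feat, rng), alpha)


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16, 16))  # toy (C, H, W) low-level feature map
x_aug = lem_style_perturb(x, rng)
print(x_aug.shape)  # (4, 16, 16)
```

Setting `alpha=0.0` returns the original map unchanged. Setting `alpha=1.0` keeps only the original phase, so larger values trade more style change for the same spatial layout.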
Abstract
Cross-Domain Few-Shot Segmentation (CDFSS) aims to transfer the pixel-level segmentation capability learned from large-scale source-domain datasets to downstream target-domain datasets with only a few annotated images per class. In this paper, we focus on a well-observed but unresolved phenomenon in CDFSS: for target domains, particularly those distant from the source domain, segmentation performance peaks in the very early epochs and declines sharply as source-domain training proceeds. We investigate this phenomenon and offer an interpretation: low-level features are vulnerable to domain shifts, which leads to sharper loss landscapes during source-domain training; this is the devil of CDFSS. Based on this phenomenon and interpretation, we propose a method with two plug-and-play modules. The first flattens the loss landscape for low-level features during source-domain training, acting as a novel sharpness-aware minimization method. The second directly supplements target-domain information to the model during target-domain testing through low-level-based calibration. Extensive experiments on four target datasets validate our rationale and show that our method surpasses the state-of-the-art CDFSS method by 3.71% and 5.34% average mIoU in the 1-shot and 5-shot settings, respectively.
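The abstract links the performance drop to sharper loss landscapes and frames the train-time module as a sharpness-aware minimization (SAM) method. As background only, here is a sketch of the generic first-order SAM perturbation. It is not the paper's module, whose details the abstract does not give. The sketch measures sharpness as the loss increase after a step of radius `rho` along the normalized gradient, and it takes one SAM update. The helpers `sharpness`, `sam_step`, and `quadratic` and the two toy quadratic losses are hypothetical illustrations.

```python
import numpy as np


def _sam_perturbation(grad, w, rho):
    """Ascent step of radius rho along the normalized gradient."""
    g = grad(w)
    return rho * g / (np.linalg.norm(g) + 1e-12)


def sharpness(loss, grad, w, rho=0.05):
    """First-order estimate of the worst-case loss increase within radius rho."""
    return loss(w + _sam_perturbation(grad, w, rho)) - loss(w)


def sam_step(grad, w, lr=0.1, rho=0.05):
    """One SAM update: descend using the gradient taken at the perturbed point."""
    return w - lr * grad(w + _sam_perturbation(grad, w, rho))


def quadratic(h):
    """Loss 0.5 * sum(h * w**2) with per-coordinate curvature h, and its gradient."""
    h = np.asarray(h, dtype=float)
    return (lambda w: 0.5 * np.sum(h * w**2)), (lambda w: h * w)


w0 = np.array([0.1, 0.1])
flat_loss, flat_grad = quadratic([1.0, 1.0])     # flat basin
sharp_loss, sharp_grad = quadratic([100.0, 1.0]) # sharp along the first axis
print(sharpness(flat_loss, flat_grad, w0))    # ≈ 0.0083
print(sharpness(sharp_loss, sharp_grad, w0))  # ≈ 0.625
```

The same perturbation radius costs far more loss in the sharp basin. This is the kind of sensitivity that SAM-style training tries to reduce by descending with the gradient measured at the perturbed point.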
