
The Devil is in Low-Level Features for Cross-Domain Few-Shot Segmentation

Yuhan Liu, Yixiong Zou, Yuhua Li, Ruixuan Li

TL;DR

Cross-Domain Few-Shot Segmentation (CDFSS) faces a perplexing phenomenon: target-domain performance peaks early in training and then collapses as source-domain training continues. The authors attribute this to low-level features becoming domain-sensitive, which sharpens the loss landscape and impairs cross-domain generalization. They propose two plug-and-play modules: a train-time Low-level Enhancement Module (LEM), which applies shape-preserving perturbations via random convolution and FFT-based recombination to flatten the loss landscape of low-level features, and a test-time Low-level Calibration Module (LCM), which injects target-domain low-level cues to refine predictions. Experiments on four target datasets show consistent, significant improvements over state-of-the-art methods in both 1-shot and 5-shot settings, highlighting the practical value of addressing the vulnerability of low-level features in cross-domain segmentation.
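The TL;DR names two ingredients of LEM's shape-preserving perturbation: random convolution and FFT-based recombination. Below is a minimal NumPy sketch of one plausible way to combine them, assuming the common amplitude/phase split in which the phase spectrum carries structure (shape) and the amplitude spectrum carries low-level appearance. The function names, kernel size, weight initialization, and the choice to keep the original phase are our assumptions for illustration, not the authors' LEM implementation.

```python
import numpy as np

def random_conv(img, kernel_size=3, rng=None):
    """Apply one randomly initialized k x k channel-mixing convolution to an H x W x C image.

    Random convolutions alter colour/texture statistics while roughly preserving
    object outlines (hypothetical helper, not the paper's code)."""
    rng = np.random.default_rng() if rng is None else rng
    h, w, c = img.shape
    pad = kernel_size // 2
    # Weights ~ N(0, 1/(k*k*C)) keep the output on a scale comparable to the input.
    weight = rng.normal(0.0, 1.0 / np.sqrt(kernel_size * kernel_size * c),
                        size=(kernel_size, kernel_size, c, c))
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    out = np.zeros((h, w, c))
    for dy in range(kernel_size):
        for dx in range(kernel_size):
            patch = padded[dy:dy + h, dx:dx + w, :]   # shifted H x W x C_in window
            out += patch @ weight[dy, dx]             # mix C_in -> C_out channels
    return out

def amplitude_phase_recombine(original, perturbed):
    """Keep the phase of `original` (structure) and take the amplitude of
    `perturbed` (low-level appearance), per channel (assumed recombination rule)."""
    f_orig = np.fft.fft2(original, axes=(0, 1))
    f_pert = np.fft.fft2(perturbed, axes=(0, 1))
    recombined = np.abs(f_pert) * np.exp(1j * np.angle(f_orig))
    # Both inputs are real, so the spectrum is Hermitian and the inverse is real
    # up to floating-point error; drop the negligible imaginary part.
    return np.real(np.fft.ifft2(recombined, axes=(0, 1)))

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))            # stand-in for a source-domain image
styled = random_conv(image, rng=rng)       # low-level appearance changed
augmented = amplitude_phase_recombine(image, styled)
print(augmented.shape)                     # (32, 32, 3)
```

Keeping the original phase is what makes the perturbation "shape-preserving" in this sketch: only the low-level statistics borrowed from the randomly convolved image change.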

Abstract

Cross-Domain Few-Shot Segmentation (CDFSS) aims to transfer the pixel-level segmentation capabilities learned from large-scale source-domain datasets to downstream target-domain datasets, with only a few annotated images per class. In this paper, we focus on a well-observed but unresolved phenomenon in CDFSS: for target domains, particularly those distant from the source domain, segmentation performance peaks at the very early epochs and declines sharply as source-domain training proceeds. We investigate this phenomenon and offer an interpretation: low-level features are vulnerable to domain shifts, leading to sharper loss landscapes during source-domain training, which is the devil of CDFSS. Based on this phenomenon and interpretation, we propose a method with two plug-and-play modules: one flattens the loss landscapes of low-level features during source-domain training, acting as a novel sharpness-aware minimization method, and the other directly supplements target-domain information to the model during target-domain testing through low-level-based calibration. Extensive experiments on four target datasets validate our rationale and show that our method significantly surpasses the state-of-the-art CDFSS method, by 3.71% and 5.34% average mIoU in the 1-shot and 5-shot scenarios, respectively.
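The abstract frames the train-time module as a sharpness-aware minimization (SAM) method. As background, here is a minimal NumPy sketch of the standard SAM update (Foret et al.) on a toy least-squares problem: take the gradient at a worst-case nearby point, then apply it at the current weights. Note that standard SAM perturbs the weights, whereas the paper's module, per the TL;DR, uses shape-preserving perturbations aimed at low-level features; the toy data, learning rate, and `rho` below are arbitrary assumptions, and this is not the authors' method.

```python
import numpy as np

def loss_and_grad(w, X, y):
    """Toy objective 0.5 * mean((Xw - y)^2) and its gradient."""
    r = X @ w - y
    return 0.5 * np.mean(r ** 2), X.T @ r / len(y)

def sam_step(w, X, y, lr=0.1, rho=0.05):
    """One standard SAM step: ascend to the first-order worst case, then descend."""
    _, g = loss_and_grad(w, X, y)
    # 1) First-order worst-case perturbation inside an L2 ball of radius rho.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # 2) Gradient evaluated at the perturbed weights w + eps ...
    _, g_adv = loss_and_grad(w + eps, X, y)
    # 3) ... applied to the original weights w.
    return w - lr * g_adv

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 5))
w_true = rng.normal(size=5)
y = X @ w_true                      # noiseless targets for the toy problem
w = np.zeros(5)
initial_loss, _ = loss_and_grad(w, X, y)
for _ in range(200):
    w = sam_step(w, X, y)
final_loss, _ = loss_and_grad(w, X, y)
print(f"loss: {initial_loss:.4f} -> {final_loss:.6f}")
```

Because the descent direction is computed at the worst point in a neighbourhood, SAM favours regions where the loss stays low under small perturbations, which is the "flat" landscape the paper seeks for low-level features.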

Paper Structure

This paper contains 26 sections, 11 equations, 13 figures, 3 tables.

Figures (13)

  • Figure 1: (a) In CDFSS tasks, the training (source) and testing (target) datasets belong to distinct domains, with categories in the testing dataset being unseen during training. (b) Previous CDFSS methods show a decreasing trend of mIoU as the source-domain training proceeds, even at very early epochs for distant domains. (c) Our method can effectively prevent the model from performance decline after early epochs and achieve higher performance.
  • Figure 2: Visualizations of predictions at the 1st and 20th epochs further indicate that CDFSS models may not acquire meaningful information for target domains during source-domain training.
  • Figure 3: Feature maps from stages 1 to 4 for the source domain and the target domains. The noticeable contrast between source and target feature maps at stage 1 indicates that the limited performance of CDFSS models stems from shallow layers.
  • Figure 4: Feature maps from stages 1 to 4 at epoch 1 and epoch 20. At stage 1, epoch 1 shows more distinguishable activations than epoch 20, indicating that low-level features gradually incorporate incorrect information as training progresses.
  • Figure 5: (a) A sharp minimum in the landscape corresponds to a representation that is highly sensitive to data shifts. (b) Examples of pixel perturbation applied to the images. (c) As training proceeds, the loss landscape becomes progressively sharper. (d) Perturbing shallow layers leads to much sharper loss landscapes, indicating shallow layers are the cause of the sharp loss landscape and the increased sensitivity to domain shifts.
  • ...and 8 more figures
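Figure 5 compares loss-landscape sharpness when different depths of the network are perturbed. One common way to quantify this is the average loss increase under small random Gaussian perturbations of a single parameter group. The sketch below applies that estimate to a hypothetical toy loss with a high-curvature "shallow" group and a low-curvature "deep" group; the noise model, the group names, and the curvature values are assumptions, since the exact protocol behind Figure 5 is not given here.

```python
import numpy as np

def perturbation_sharpness(loss_fn, params, group, sigma=0.01, n_samples=100, rng=None):
    """Mean loss increase when N(0, sigma^2) noise is added to params[group] only."""
    rng = np.random.default_rng() if rng is None else rng
    base = loss_fn(params)
    increases = []
    for _ in range(n_samples):
        noisy = dict(params)  # shallow copy: other groups keep their original arrays
        noisy[group] = params[group] + rng.normal(0.0, sigma, size=params[group].shape)
        increases.append(loss_fn(noisy) - base)
    return float(np.mean(increases))

# Hypothetical toy loss: the "shallow" group sits in a much sharper basin.
curvature = {"shallow": 10.0, "deep": 0.1}

def toy_loss(p):
    return sum(0.5 * curvature[k] * np.sum(v ** 2) for k, v in p.items())

params = {"shallow": np.zeros(8), "deep": np.zeros(8)}
rng = np.random.default_rng(0)
s_shallow = perturbation_sharpness(toy_loss, params, "shallow", sigma=0.1, rng=rng)
s_deep = perturbation_sharpness(toy_loss, params, "deep", sigma=0.1, rng=rng)
print(f"shallow: {s_shallow:.4f}, deep: {s_deep:.4f}")
```

For this quadratic the expected increase is 0.5 * curvature * sigma^2 * (number of parameters), so the shallow group scores roughly 100 times higher than the deep group, mirroring in miniature the kind of comparison panel (d) reports.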