Cross-Domain Few-Shot Segmentation via Iterative Support-Query Correspondence Mining

Jiahao Nie, Yun Xing, Gongjie Zhang, Pei Yan, Aoran Xiao, Yap-Peng Tan, Alex C. Kot, Shijian Lu

TL;DR

This comprehensive study of CD-FSS shows that a fine-tuning stage is necessary to transfer learned meta-knowledge across domains, and that naïve fine-tuning risks overfitting because novel-category examples are scarce.

Abstract

Cross-Domain Few-Shot Segmentation (CD-FSS) poses the challenge of segmenting novel categories from a distinct domain using only limited exemplars. In this paper, we undertake a comprehensive study of CD-FSS and uncover two crucial insights: (i) the necessity of a fine-tuning stage to effectively transfer the learned meta-knowledge across domains, and (ii) the overfitting risk of naïve fine-tuning due to the scarcity of novel category examples. With these insights, we propose a novel cross-domain fine-tuning strategy that addresses the challenging CD-FSS tasks. We first design Bi-directional Few-shot Prediction (BFP), which establishes support-query correspondence in a bi-directional manner, crafting augmented supervision to reduce the overfitting risk. We then extend BFP into the Iterative Few-shot Adaptor (IFA), a recursive framework that captures the support-query correspondence iteratively, targeting maximal exploitation of supervisory signals from the sparse novel category samples. Extensive empirical evaluations show that our method significantly outperforms state-of-the-art methods (+7.8%), which verifies that IFA tackles the cross-domain challenges and mitigates overfitting simultaneously. The code is available at: https://github.com/niejiahao1998/IFA.

Paper Structure (19 sections, 10 equations, 8 figures, 9 tables, 1 algorithm)

Figures (8)

  • Figure 1: We investigate two types of category correspondence, Support-to-Query (S2Q, left) and Support-to-Query-to-Support (S2Q2S, right), under four experimental setups (a-d). (a) In-domain performance without fine-tuning (FT) sets the oracle baselines for Cross-Domain Few-Shot Segmentation (CD-FSS). (b) Cross-domain results without fine-tuning suffer severe performance drops, which verifies the necessity of bridging the domain gap for CD-FSS. (c) Cross-domain setups with naïve fine-tuning bring only small performance gains, which is attributed to the overfitting risk of CD-FSS fine-tuning. Notably, S2Q2S also contains rich, unexplored category correspondence. (d) The cross-domain setup with our proposed Iterative Few-shot Adaptor (IFA) achieves significant performance gains. IFA exploits the given data as fully as possible by capturing both S2Q and S2Q2S category correspondence during fine-tuning.
  • Figure 2: Illustration of our designs for CD-FSS: (a) Bi-directional Few-shot Prediction, and (b) Iterative Few-shot Adaptor. $I_s$ and $I_q$ denote support and query images respectively. $F_s$ and $F_q$ denote the corresponding support and query features as extracted by the Encoder. $M_s$ denotes the support mask, and $P_s$ denotes the generated support prototype.
  • Figure 3: Overall architecture of the proposed Iterative Few-shot Adaptor (IFA), which involves two essential steps: training on the source domain, and fine-tuning on the target domain. In the training stage, we adopt only the Bi-directional Few-shot Prediction (BFP) (illustrated in the yellow box), which is the fundamental unit of IFA. BFP is composed of both S2Q and S2Q2S streams, together with supervision signals from both sides (blue arrows). In the fine-tuning stage, where the target exemplars are extremely scarce, IFA iterates BFP $T$ times, recursively mining the support-query correspondence (illustrated in the red box). To show the predictions clearly, we only visualize the regions where confidence is higher than 0.5.
  • Figure 4: Examples of images and their corresponding ground-truth masks from four target domain datasets, encompassing a diverse range from satellite images and medical screenings to minuscule everyday objects.
  • Figure 5: Qualitative results on samples from the four target datasets. From left to right, the columns show examples from Deepglobe, ISIC, Chest X-Ray, and FSS-1000. From top to bottom, the rows show support images with ground-truth masks (green), query images with ground-truth masks (blue), PATNet results, our baseline (SSP) results, and our IFA results. $*$ denotes a model that we reproduced ourselves. Best viewed in color.
  • ...and 3 more figures
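
The data flow described in the captions of Figures 2 and 3 can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the authors' implementation (see the linked repository for that). Assumptions not confirmed by the text above: the prototype $P_s$ is obtained by masked average pooling of $F_s$ under $M_s$; pixels are classified by cosine similarity to the prototype with a hard threshold of 0.5; and each iteration feeds the re-predicted support mask into the next one. Feature maps are represented as flat lists of per-pixel feature vectors, and all function names are hypothetical. The sketch covers the forward pass only and omits the encoder and the supervision losses.

```python
import math


def masked_average_pool(features, mask):
    """Prototype = mask-weighted mean of per-pixel feature vectors (assumed pooling)."""
    dim = len(features[0])
    total = sum(mask)
    if total == 0:
        return [0.0] * dim
    return [sum(f[d] * m for f, m in zip(features, mask)) / total for d in range(dim)]


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def predict(features, prototype, threshold=0.5):
    """Mark a pixel as foreground when its similarity to the prototype exceeds the threshold."""
    return [1.0 if cosine(f, prototype) > threshold else 0.0 for f in features]


def bfp(f_s, m_s, f_q):
    """One bi-directional pass: S2Q, then S2Q2S back onto the support image."""
    p_s = masked_average_pool(f_s, m_s)     # support prototype P_s
    q_pred = predict(f_q, p_s)              # S2Q: segment the query
    p_q = masked_average_pool(f_q, q_pred)  # query prototype from the predicted mask
    s_pred = predict(f_s, p_q)              # S2Q2S: re-segment the support
    return q_pred, s_pred


def ifa(f_s, m_s, f_q, num_iters=3):
    """Iterate BFP num_iters times (T in Figure 3); the mask hand-off is an assumption."""
    outputs = []
    mask = m_s
    for _ in range(num_iters):
        q_pred, s_pred = bfp(f_s, mask, f_q)
        outputs.append((q_pred, s_pred))
        mask = s_pred  # assumed: the next iteration starts from the re-predicted support mask
    return outputs


# Toy example: 2-D features, four support pixels, three query pixels.
f_s = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.0, 1.0]]
m_s = [1.0, 1.0, 0.0, 0.0]
f_q = [[0.9, 0.1], [0.0, 1.0], [1.0, 0.0]]
for step, (q_pred, s_pred) in enumerate(ifa(f_s, m_s, f_q), start=1):
    print(step, q_pred, s_pred)
```

During fine-tuning, each S2Q2S prediction can be compared against the known support mask $M_s$, which is what provides extra supervision even when no query labels are available beyond the few exemplars. The hard threshold used here is not differentiable, so a trainable version would keep soft similarity maps instead.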