Cross-Domain Few-Shot Semantic Segmentation via Doubly Matching Transformation

Jiayi Chen, Rong Quan, Jie Qin

TL;DR

Cross-domain few-shot semantic segmentation (CD-FSS) degrades when a domain shift separates the source and target data. DMTNet addresses this with three components: Self-Matching Transformation (SMT), which builds query-specific transformation matrices from the query image itself to map domain-specific features to domain-agnostic ones; Dual Hypercorrelation Construction (DHC), which exploits both foreground and background hypercorrelations; and Test-time Self-Finetuning (TSF) for test-time adaptation. The approach reports state-of-the-art mIoU on four diverse datasets in both 1-shot and 5-shot settings, indicating improved generalization and robustness to intra-class appearance variation. Together, self-guided transformation, dual correlation learning, and lightweight test-time finetuning offer a practical framework for cross-domain segmentation with minimal support data.

Abstract

Cross-Domain Few-shot Semantic Segmentation (CD-FSS) aims to train generalized models that can segment classes from different domains with a few labeled images. Previous works have proven the effectiveness of feature transformation in addressing CD-FSS. However, they completely rely on support images for feature transformation, and repeatedly utilizing a few support images for each class may easily lead to overfitting and overlooking intra-class appearance differences. In this paper, we propose a Doubly Matching Transformation-based Network (DMTNet) to solve the above issue. Instead of completely relying on support images, we propose Self-Matching Transformation (SMT) to construct query-specific transformation matrices based on query images themselves to transform domain-specific query features into domain-agnostic ones. Calculating query-specific transformation matrices can prevent overfitting, especially in the meta-testing stage, where only one or several images are used as support images to segment hundreds or thousands of images. After obtaining domain-agnostic features, we exploit a Dual Hypercorrelation Construction (DHC) module to explore the hypercorrelations between the query image and both the foreground and background of the support image, based on which foreground and background prediction maps are generated and supervised, respectively, to enhance the segmentation result. In addition, we propose a Test-time Self-Finetuning (TSF) strategy to more accurately self-tune the query prediction in unseen domains. Extensive experiments on four popular datasets show that DMTNet achieves superior performance over state-of-the-art approaches. Code is available at https://github.com/ChenJiayi68/DMTNet.
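
The idea of correlating the query with both the foreground and the background of the support image can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the repository linked above for that). The function names `l2_normalize` and `dual_correlation`, the use of clamped cosine similarity, and the restriction to a single feature level are all assumptions made for illustration; the paper works on pyramid features and learned decoders.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale feature vectors along the last axis to (near) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def dual_correlation(query_feat, support_feat, support_mask):
    """Hypothetical sketch of foreground/background dense correlation.

    query_feat:   (Hq, Wq, C) query feature map
    support_feat: (Hs, Ws, C) support feature map
    support_mask: (Hs, Ws) binary mask, 1 = foreground
    Returns two 4D tensors of shape (Hq, Wq, Hs, Ws).
    """
    mask = support_mask.astype(query_feat.dtype)[..., None]

    # Split the support features into foreground-only and background-only maps.
    fg_support = support_feat * mask
    bg_support = support_feat * (1.0 - mask)

    q = l2_normalize(query_feat)
    fg = l2_normalize(fg_support)  # masked-out positions stay all-zero
    bg = l2_normalize(bg_support)

    # Cosine similarity between every query position and every support
    # position, clamped at zero (an assumed choice, common in dense matching).
    fg_corr = np.maximum(np.einsum("ijc,klc->ijkl", q, fg), 0.0)
    bg_corr = np.maximum(np.einsum("ijc,klc->ijkl", q, bg), 0.0)
    return fg_corr, bg_corr


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    query = rng.standard_normal((3, 3, 8))
    support = rng.standard_normal((4, 4, 8))
    mask = np.zeros((4, 4))
    mask[1:3, 1:3] = 1  # a 2x2 foreground square
    fg_corr, bg_corr = dual_correlation(query, support, mask)
    print(fg_corr.shape, bg_corr.shape)
```

In this sketch each correlation tensor would feed its own prediction branch, mirroring the abstract's description of separately supervised foreground and background maps; how those branches are built is left out here because the section above does not specify it.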

Paper Structure (14 sections, 10 equations, 4 figures, 3 tables)

Figures (4)

  • Figure 1: Overall architecture of the proposed DMTNet. After the pyramid features of the support and query images are obtained, the Self-Matching Transformation (SMT) module learns a self-adaptive transformation matrix for each image to transform its domain-specific features into domain-agnostic ones. Then, the Dual Hypercorrelation Construction (DHC) module constructs dense correlations between the query image and both the foreground and background of the support image. In the meta-testing stage, the Test-time Self-Finetuning (TSF) strategy fine-tunes a few parameters of the encoder to further improve the segmentation performance.
  • Figure 2: Qualitative results on the ISIC, Chest X-ray, Deepglobe, and FSS-1000 datasets under the 1-shot setting. The blue parts represent support masks and the red parts represent query masks and query predictions.
  • Figure 3: Qualitative results w.r.t. SMT, DHC, and TSF. The first two columns show the ground truth of the support and query images. The third column shows the predicted masks of DMTNet. The fourth column shows the predicted masks without TSF. The last column shows the predicted masks without the DHC and TSF modules.
  • Figure 4: Visualization results w.r.t. SMT. The first and second rows represent the feature distributions before and after applying SMT, respectively. The red dots represent the PASCAL VOC dataset and the blue dots represent the ISIC or FSS-1000 datasets.