Divide-and-Conquer Decoupled Network for Cross-Domain Few-Shot Segmentation

Runmin Cong, Anpeng Wang, Bin Wan, Cong Zhang, Xiaofei Zhou, Wei Zhang

TL;DR

This work tackles cross-domain few-shot segmentation (CD-FSS) by addressing the entanglement of domain and category information in backbone features. The authors introduce Divide-and-Conquer Decoupled Network (DCDNet), which decomposes features into domain-relevant shared and category-relevant private components using Adversarial-Contrastive Feature Decomposition (ACFD), then fuses them with base features through Matrix-Guided Dynamic Fusion (MGDF); Cross-Adaptive Modulation (CAM) is employed during fine-tuning to inject domain knowledge. A combined loss with adversarial, contrastive, and orthogonality terms, and a matrix-guided fusion strategy, yields strong cross-domain generalization and rapid adaptation, achieving state-of-the-art results on four CD-FSS benchmarks (e.g., ISIC and FSS-1000) in both 1-shot and 5-shot settings. The approach demonstrates that refining and re-combining disentangled feature components can surpass traditional adapter-based methods, enabling robust performance under substantial domain shifts and limited annotations with practical impact for real-world cross-domain segmentation tasks.
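The summary above names an orthogonality term but does not give its formula. As a minimal sketch only, one common way to encourage two feature sets to carry non-overlapping information is to penalize the squared Frobenius norm of their cross-correlation matrix. The function name `orthogonality_penalty`, the `(n_samples, n_channels)` layout, and the toy arrays are assumptions made for illustration; they are not taken from the paper, and the authors' actual loss may differ.

```python
import numpy as np


def orthogonality_penalty(shared, private):
    """Squared Frobenius norm of the cross-correlation between two feature sets.

    shared, private: arrays of shape (n_samples, n_channels) (assumed layout).
    Returns 0.0 when every shared channel is orthogonal to every private channel.
    """
    cross = shared.T @ private  # (n_channels, n_channels) cross-correlation
    return float(np.sum(cross ** 2))


# Hypothetical toy features: 4 samples, 2 channels each.
shared = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]])
private_orth = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
private_same = shared.copy()

print(orthogonality_penalty(shared, private_orth))  # 0.0 (no overlap)
print(orthogonality_penalty(shared, private_same))  # 2.0 (fully overlapping)
```

In a training loop, a term like this would typically be added to the adversarial and contrastive losses with a weighting coefficient; the summary does not state how the terms are weighted.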

Abstract

Cross-domain few-shot segmentation (CD-FSS) aims to tackle the dual challenge of recognizing novel classes and adapting to unseen domains with limited annotations. However, encoder features often entangle domain-relevant and category-relevant information, limiting both generalization and rapid adaptation to new domains. To address this issue, we propose a Divide-and-Conquer Decoupled Network (DCDNet). In the training stage, to tackle feature entanglement that impedes cross-domain generalization and rapid adaptation, we propose the Adversarial-Contrastive Feature Decomposition (ACFD) module. It decouples backbone features into category-relevant private and domain-relevant shared representations via contrastive learning and adversarial learning. Then, to mitigate the potential degradation caused by the disentanglement, the Matrix-Guided Dynamic Fusion (MGDF) module adaptively integrates base, shared, and private features under spatial guidance, maintaining structural coherence. In addition, in the fine-tuning stage, to enhance model generalization, the Cross-Adaptive Modulation (CAM) module is placed before the MGDF, where shared features guide private features via modulation, ensuring effective integration of domain-relevant information. Extensive experiments on four challenging datasets show that DCDNet outperforms existing CD-FSS methods, setting a new state of the art for cross-domain generalization and few-shot adaptation.
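The abstract says only that shared features guide private features "via modulation"; it does not specify the operation. The sketch below illustrates one generic possibility, a per-channel scale-and-shift conditioned on pooled shared features. Everything in it is an assumption for illustration: the function `modulate`, the projection weights `w_gamma` and `w_beta`, the global average pooling, and the `tanh` bounding are hypothetical choices, not the CAM module as defined in the paper.

```python
import numpy as np


def modulate(private, shared, w_gamma, w_beta):
    """Scale-and-shift modulation of private features, conditioned on shared features.

    private, shared: arrays of shape (channels, height, width) (assumed layout).
    w_gamma, w_beta: (channels, channels) hypothetical projection weights.
    """
    context = shared.mean(axis=(1, 2))        # (channels,) global average pool
    gamma = 1.0 + np.tanh(w_gamma @ context)  # per-channel scale, centered at 1
    beta = w_beta @ context                   # per-channel shift
    # Broadcast the per-channel scale and shift over the spatial dimensions.
    return gamma[:, None, None] * private + beta[:, None, None]


rng = np.random.default_rng(0)
private = rng.standard_normal((8, 4, 4))
shared = rng.standard_normal((8, 4, 4))

# With zero projection weights, the modulation reduces to the identity.
zeros = np.zeros((8, 8))
out = modulate(private, shared, zeros, zeros)
print(out.shape, np.allclose(out, private))  # (8, 4, 4) True
```

Centering the scale at 1 means the module starts close to a pass-through, which is one plausible reason to insert such a block only at fine-tuning time; whether the paper makes the same choice is not stated in the abstract.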

Paper Structure

This paper contains 17 sections, 13 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: Comparison of existing CD-FSS methods and ours. (a) Most existing methods introduce adapter-like modules to backbone architectures or feature embeddings, yet neglect to address the intrinsic information redundancy within base features themselves. (b) Our method decomposes and refines the base features, then effectively integrates these distilled features to endow the model with enhanced generalization and domain adaptation capabilities.
  • Figure 2: Overall architecture of our method, illustrated for the 1-shot setting. During training, the Adversarial-Contrastive Feature Decomposition module uses adversarial and contrastive learning objectives to decompose feature maps, yielding disentangled shared, private, and base feature representations. Matrix-Guided Dynamic Fusion then fuses these components into enhanced representations that drive query image segmentation through the SSP method [fan2022self]. During fine-tuning and testing, the Cross-Adaptive Modulation module integrates prior knowledge from shared features into private features via feature modulation, while refinement based on the BFP module [nie2024cross] further optimizes the segmentation masks.
  • Figure 3: Qualitative results on samples from the four target datasets. From top to bottom, the rows show examples from FSS-1000, ISIC, Chest X-Ray, and Deepglobe. From left to right, the columns show support images with ground-truth masks, query images with ground-truth masks, SSP results, DR-Adapter results, IFA results, and our results.