Divide-and-Conquer Decoupled Network for Cross-Domain Few-Shot Segmentation
Runmin Cong, Anpeng Wang, Bin Wan, Cong Zhang, Xiaofei Zhou, Wei Zhang
TL;DR
This work tackles cross-domain few-shot segmentation (CD-FSS) by addressing the entanglement of domain and category information in backbone features. The authors introduce the Divide-and-Conquer Decoupled Network (DCDNet), which decomposes features into domain-relevant shared and category-relevant private components using Adversarial-Contrastive Feature Decomposition (ACFD), then fuses them with the base features through Matrix-Guided Dynamic Fusion (MGDF). During fine-tuning, Cross-Adaptive Modulation (CAM) injects domain knowledge. Training combines adversarial, contrastive, and orthogonality loss terms with the matrix-guided fusion strategy, and the resulting model generalizes across domains and adapts rapidly, achieving state-of-the-art results on four CD-FSS benchmarks (e.g., ISIC and FSS-1000) in both 1-shot and 5-shot settings. The results indicate that refining and recombining disentangled feature components can surpass traditional adapter-based methods, giving robust performance under substantial domain shift and limited annotations, which matters for real-world cross-domain segmentation tasks.
Abstract
Cross-domain few-shot segmentation (CD-FSS) aims to tackle the dual challenge of recognizing novel classes and adapting to unseen domains with limited annotations. However, encoder features often entangle domain-relevant and category-relevant information, limiting both generalization and rapid adaptation to new domains. To address this issue, we propose a Divide-and-Conquer Decoupled Network (DCDNet). In the training stage, to tackle the feature entanglement that impedes cross-domain generalization and rapid adaptation, we propose the Adversarial-Contrastive Feature Decomposition (ACFD) module, which decouples backbone features into category-relevant private and domain-relevant shared representations via contrastive learning and adversarial learning. Then, to mitigate the potential degradation caused by the disentanglement, the Matrix-Guided Dynamic Fusion (MGDF) module adaptively integrates base, shared, and private features under spatial guidance, maintaining structural coherence. In addition, in the fine-tuning stage, to enhance model generalization, the Cross-Adaptive Modulation (CAM) module is placed before the MGDF, where shared features guide private features via modulation, ensuring effective integration of domain-relevant information. Extensive experiments on four challenging datasets show that DCDNet outperforms existing CD-FSS methods, setting a new state-of-the-art for cross-domain generalization and few-shot adaptation.
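To make the idea of separating shared and private representations more concrete, the following is a minimal sketch of one common way to penalize overlap between two sets of feature vectors: the mean squared cosine similarity of paired vectors. This is an illustrative assumption, not the formulation used in DCDNet; the abstract does not specify the exact loss, and the function names `cosine` and `orthogonality_penalty` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as orthogonal to everything
    return dot / (norm_u * norm_v)


def orthogonality_penalty(shared, private):
    """Mean squared cosine similarity over paired shared/private vectors.

    Hypothetical stand-in for an orthogonality term (an assumption, not the
    paper's definition): 0 when every pair is orthogonal, 1 when every pair
    is parallel.
    """
    if not shared or len(shared) != len(private):
        raise ValueError("shared and private must be non-empty and equal in length")
    sims = [cosine(s, p) ** 2 for s, p in zip(shared, private)]
    return sum(sims) / len(sims)
```

Under this sketch, a penalty near 0 means the shared and private vectors carry little overlapping direction, which is the intuition behind decoupling the two components. In practice such a term would be computed on learned feature maps and combined with the adversarial and contrastive objectives.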
