CMaP-SAM: Contraction Mapping Prior for SAM-driven Few-shot Segmentation

Shuai Chen, Fanman Meng, Liming Lei, Haoran Wei, Chenhao Wu, Qingbo Wu, Linfeng Xu, Hongliang Li

TL;DR

CMaP-SAM tackles two main hurdles in SAM-guided few-shot segmentation: underutilized query-structure information and information loss from converting continuous priors to discrete prompts. It introduces a contraction-mapping prior module with convergence guarantees, an adaptive distribution alignment module, and a foreground-background decoupled refinement architecture to fuse support signals with query structure for accurate masks. The approach achieves state-of-the-art results on PASCAL-$5^i$ and COCO-$20^i$, and is supported by ablations and sensitivity analyses that validate each component and its interactions. The work provides theoretical convergence guarantees via a Banach fixed-point framework and offers a practical path to integrating SAM into FSS with reduced information loss, with code available for replication and further research.

Abstract

Few-shot segmentation (FSS) aims to segment new classes using few annotated images. While recent FSS methods have shown considerable improvements by leveraging Segment Anything Model (SAM), they face two critical limitations: insufficient utilization of structural correlations in query images, and significant information loss when converting continuous position priors to discrete point prompts. To address these challenges, we propose CMaP-SAM, a novel framework that introduces contraction mapping theory to optimize position priors for SAM-driven few-shot segmentation. CMaP-SAM consists of three key components: (1) a contraction mapping module that formulates position prior optimization as a Banach contraction mapping with convergence guarantees. This module iteratively refines position priors through pixel-wise structural similarity, generating a converged prior that preserves both semantic guidance from reference images and structural correlations in query images; (2) an adaptive distribution alignment module bridging continuous priors with SAM's binary mask prompt encoder; and (3) a foreground-background decoupled refinement architecture producing accurate final segmentation masks. Extensive experiments demonstrate CMaP-SAM's effectiveness, achieving state-of-the-art performance with 71.1 mIoU on PASCAL-$5^i$ and 56.1 on COCO-$20^i$ datasets. Code is available at https://github.com/Chenfan0206/CMaP-SAM.
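To make the idea of iteratively refining a position prior toward a fixed point concrete, here is a minimal NumPy sketch. It is not the paper's iterative mapping, which this page does not reproduce. It swaps in a standard propagation-with-restart update, $\mathbf{M}^{t+1} = \alpha P \mathbf{M}^t + (1-\alpha)\mathbf{M}^0$, which is a contraction in the infinity norm whenever $P$ is row-stochastic and $\alpha < 1$. The function names, the cosine-similarity affinity, and the parameter `alpha` are all assumptions made for illustration.

```python
import numpy as np


def row_normalize(affinity):
    """Scale each row to sum to 1 so propagation averages over pixels."""
    return affinity / affinity.sum(axis=1, keepdims=True)


def build_transition(query_features):
    """Hypothetical pixel-wise structural similarity: clipped cosine similarity."""
    feats = query_features / np.linalg.norm(query_features, axis=1, keepdims=True)
    affinity = np.clip(feats @ feats.T, 0.0, None)  # diagonal is 1, so rows are non-zero
    return row_normalize(affinity)


def refine_prior(initial_prior, query_features, alpha=0.8, tol=1e-6, max_iters=200):
    """Iterate M <- alpha * P @ M + (1 - alpha) * M0 until the update is tiny.

    Assumed stand-in for the paper's mapping. Because P is row-stochastic and
    alpha < 1, each step shrinks infinity-norm distances by a factor of at
    most alpha, so the Banach fixed-point theorem gives a unique limit.
    """
    transition = build_transition(query_features)
    prior = initial_prior.copy()
    steps = 0
    for steps in range(1, max_iters + 1):
        updated = alpha * (transition @ prior) + (1.0 - alpha) * initial_prior
        gap = np.max(np.abs(updated - prior))
        prior = updated
        if gap < tol:
            break
    return prior, steps


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.normal(size=(50, 8))  # 50 query pixels, 8-dim features (toy data)
    m0 = rng.uniform(size=50)            # toy initial prior from support matching
    m_star, n_steps = refine_prior(m0, features)
    print("converged after", n_steps, "iterations")
```

The restart term plays the role of anchoring to the initial semantic prior, while the propagation term spreads it along query-image structure. The refined prior stays in $[0,1]$ because every update is a convex combination of values in that range.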

Paper Structure

This paper contains 24 sections, 1 theorem, 24 equations, 7 figures, 10 tables.

Key Result

Theorem 3.1

Given the iterative mapping defined in Equation eq:iterative_mapping with $\alpha \in (0,1]$ and $\delta > 0$ such that $\frac{\alpha}{(\delta + \varepsilon)^2} < 1$, the sequence ${\mathbf{M}^t}$ converges to a fixed prior $\mathbf{M}^*$ in the complete space $\bigl([0,1]^{B \times N_q}, |\cdot|_\i

Figures (7)

  • Figure 1: Comparative analysis: (a) Conventional approaches employ a "generate location prior, then extract point prompt" pipeline, causing information loss. (b) The proposed CMaP-SAM optimizes position priors through contraction mapping theory, preserving both semantic guidance from support images and structural correlations within query images. Additionally, a distribution adapter is implemented to bridge continuous probability distributions with SAM's binary mask prompt encoder, thereby eliminating information loss.
  • Figure 2: Overview of CMaP-SAM. It consists of three main components: (1) Contraction Mapping Prior Module for position prior construction and optimization; (2) Distribution Alignment Module for bridging continuous priors with SAM's binary mask prompt encoder; and (3) Foreground-Background Decoupled Refinement Module for final segmentation refinement.
  • Figure 3: Iterative mapping process for position prior optimization. The process involves three key components: structural consistency propagation, initial semantic anchoring, and piecewise affine normalization.
  • Figure 4: Mask alignment module. The dual-prior mask alignment module bridges the representation gap between continuous position priors and SAM's binary mask requirements, preserving rich information in probability distributions.
  • Figure 5: Visualization of segmentation results.
  • ...and 2 more figures
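The gap that the alignment module in Figure 4 addresses can be illustrated with a naive, non-learned baseline: turn a probability prior into a logit-scaled map and resize it to the resolution of a mask prompt. This is only a sketch and not the paper's adaptive module. The function name, the fixed logit transform, the nearest-neighbor resize, and the 256×256 output size (the low-resolution mask-input size assumed here for the public SAM release) are all assumptions.

```python
import numpy as np


def prior_to_mask_logits(prior, out_size=256, eps=1e-4, scale=1.0):
    """Convert a continuous prior in [0, 1] to a logit-scaled mask prompt.

    Hypothetical fixed baseline: probabilities above 0.5 map to positive
    logits and those below to negative logits, so thresholding at 0
    recovers the binary mask while the magnitudes keep the confidence.
    """
    clipped = np.clip(prior, eps, 1.0 - eps)           # avoid log(0)
    logits = scale * np.log(clipped / (1.0 - clipped))
    height, width = logits.shape
    # Nearest-neighbor resize using integer index maps (no external dependencies).
    rows = np.arange(out_size) * height // out_size
    cols = np.arange(out_size) * width // out_size
    resized = logits[np.ix_(rows, cols)]
    return resized[None]                                # shape: 1 x out_size x out_size


if __name__ == "__main__":
    toy_prior = np.full((64, 64), 0.1)
    toy_prior[16:48, 16:48] = 0.9                       # confident foreground block
    mask_prompt = prior_to_mask_logits(toy_prior)
    print(mask_prompt.shape, float(mask_prompt.min()), float(mask_prompt.max()))
```

A fixed transform like this keeps the ordering of confidences but cannot adapt to the value distribution the prompt encoder was trained on, which is the gap a learned alignment module is meant to close.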

Theorems & Definitions (2)

  • Theorem 3.1
  • proof