SAPNet++: Evolving Point-Prompted Instance Segmentation with Semantic and Spatial Awareness

Zhaoyang Wei; Xumeng Han; Xuehui Yu; Xue Yang; Guorong Li; Zhenjun Han; Jianbin Jiao

TL;DR

The Semantic-Aware Point-Prompted Instance Segmentation Network (SAPNet) integrates Point Distance Guidance and Box Mining Strategy to tackle group and local issues caused by the point's granularity ambiguity, and incorporates completeness scores within proposals to add spatial granularity awareness.

Abstract

Single-point annotation is increasingly prominent in visual tasks because it reduces labeling cost. However, it poses challenges for tasks requiring high precision, such as point-prompted instance segmentation (PPIS), which aims to estimate precise masks from single-point prompts in order to train a segmentation network. The constraints of point annotations give rise to granularity ambiguity and boundary uncertainty: respectively, the difficulty of distinguishing between different levels of detail (e.g., whole object vs. parts) and the challenge of precisely delineating object boundaries. Previous works have usually inherited the paradigm of mask generation followed by proposal selection to achieve PPIS. However, proposal selection relies solely on category information and therefore fails to resolve the ambiguity between granularities. Furthermore, mask generators offer only a finite set of discrete solutions that often deviate from the actual masks, particularly at boundaries. To address these issues, we propose the Semantic-Aware Point-Prompted Instance Segmentation Network (SAPNet). It integrates Point Distance Guidance and a Box Mining Strategy to tackle the group and local issues caused by the point's granularity ambiguity. Additionally, we incorporate completeness scores within proposals to add spatial granularity awareness, enhancing multiple instance learning (MIL) in proposal selection; we term this S-MIL. Multi-level Affinity Refinement conveys pixel and semantic cues, narrowing boundary uncertainty during mask refinement. These modules culminate in SAPNet++, which mitigates the point prompt's granularity ambiguity and boundary uncertainty and significantly improves segmentation performance. Extensive experiments on four challenging datasets validate the effectiveness of our methods, highlighting their potential to advance PPIS.
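To make the idea of adding completeness scores to MIL-based proposal selection concrete, the following is a minimal sketch, not the authors' implementation. It assumes a generic two-stream MIL scoring (a softmax over classes per proposal, and a softmax over proposals per class) and assumes that the completeness score is combined multiplicatively; the function names `softmax` and `select_proposal` and all input values are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def select_proposal(class_logits, instance_logits, completeness, label):
    """Pick one proposal from a bag of proposals around a point prompt.

    class_logits[i][c]    : logit that proposal i belongs to class c
    instance_logits[i][c] : logit used to rank proposal i within class c
    completeness[i]       : assumed score in [0, 1] for how fully proposal i
                            covers the object (hypothetical stand-in for a
                            predicted completeness score)
    label                 : class index of the annotated point
    """
    n = len(class_logits)
    # Class stream: softmax over classes, evaluated at the point's label.
    cls = [softmax(row)[label] for row in class_logits]
    # Instance stream: softmax over proposals for the point's label.
    ins = softmax([row[label] for row in instance_logits])
    # Assumption: re-weight the MIL score by completeness via a product.
    scores = [cls[i] * ins[i] * completeness[i] for i in range(n)]
    best = max(range(n), key=scores.__getitem__)
    return best, scores


# Three proposals with identical category evidence: a part, the whole
# object, and a group. Only the completeness score separates them.
class_logits = [[2.0, 0.0], [2.0, 0.0], [2.0, 0.0]]
instance_logits = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
completeness = [0.3, 0.9, 0.5]
best, scores = select_proposal(class_logits, instance_logits, completeness, label=0)
print(best, [round(s, 3) for s in scores])
```

In this toy bag the category evidence alone cannot rank the three proposals, which mirrors the granularity ambiguity described in the abstract; the completeness term is what moves the selection to the whole-object proposal.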

Paper Structure (26 sections, 29 equations, 7 figures, 14 tables, 2 algorithms)

Introduction
Related Work
Methodology
Overview
SAPNet
Proposal Selection Mechanism
Selection Refinement Mechanism
SAPNet++
Spatial-aware Self-distillation for Proposal Selection
Multi-level Affinity Refinement
Training and Inference
Experiment
Experimental Settings
Datasets and Evaluation
Implementation Details
...and 11 more sections

Figures (7)

Figure 1: The motivation behind SAPNet++ stems from two key challenges of point annotation: (a) Granularity Ambiguity: SAM-generated masks often assign higher scores to non-target categories (e.g., the clothes instead of the person in the yellow dashed box). Additionally, segmentation struggles to separate individual targets within the same category (group issue) and overlooks local details because MIL under point labels prefers foreground-dominant regions (local issue), as shown in the green dashed box. Within the blue dashed box, our proposed S-MIL tackles these issues by selecting proposals that capture the target's complete semantic information. (b) Boundary Uncertainty: Even after granularity ambiguity and its related issues are resolved, most proposal generators still yield proposals of varying quality compared to the ground truth, leading to "boundary uncertainty", where masks do not fully encompass the targets. We address this by refining predicted masks during segmentation to achieve highly satisfactory results even with imprecise supervision.
Figure 2: SAPNet++ is structured into three key branches: the proposal selection mechanism branch for initial proposal selection with SASD I, the selection refinement mechanism branch for refining proposals through SASD II and semantic matching, and the SEG branch for segmentation and further refinement. (i) Proposal Selection Mechanism (PSM) Branch: The process starts by generating category-agnostic mask proposals using point prompts within a visual foundation model. The PSM branch employs multi-instance learning and a point-guided strategy to construct the PSM for the input box proposals. It combines completeness scores from SASD I with confidence scores to capture global object semantics, initially filtering box proposals to obtain $box_{psm}$. (ii) Selection Refinement Mechanism (SRM) Branch: Emulating typical MIL approaches, this branch uses positive and negative proposal generators to create high-quality positive bags and regulatory negative bags that enhance the semantic matching capabilities of the SRM. These bags are processed through SASD II and refined using the box mining strategy, resulting in the refined $box_{srm}$. (iii) Segment Branch: We use the final selected pseudo box to obtain the corresponding mask proposals through IoU matching. Given that these mask proposals are less precise than the ground truth, the branch employs Affinity Refinement in both pixel space and high-level semantic space to construct soft pseudo masks that, alongside the mask proposals, provide enhanced segmentation supervision.
Figure 3: (a) The PSM weights scores from category, instance, and point-guided strategies to preliminarily filter proposals. (b) The SRM further enhances the selection capability by introducing negative examples and employs a box mining strategy to address the local issue. (c) The SASD predicts a completeness score for each proposal in PSM and SRM and uses it to re-weight the previous scores.
Figure 4: Multi-level Affinity Refinement integrates pixel-level color and texture information with high-dimensional semantic information from the backbone, refining segmentation masks through global and local affinity.
Figure 5: Visualization of instance segmentation results on the COCO train2017 dataset.
...and 2 more figures
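
The Figure 2 caption mentions that the final pseudo box is mapped back to a mask proposal through IoU matching. A minimal sketch of one way to do this is given below; it assumes that each binary mask is compared to the pseudo box via its tight bounding box, which the caption does not specify. The helper names `mask_to_box`, `box_iou`, and `match_mask` are hypothetical, and boxes are taken to be `(x1, y1, x2, y2)` with exclusive upper bounds.

```python
def mask_to_box(mask):
    """Tight (x1, y1, x2, y2) box around a binary mask given as rows of 0/1.

    Returns None for an empty mask. Upper bounds are exclusive.
    """
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not ys:
        return None
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


def box_iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_mask(pseudo_box, masks):
    """Return (index, iou) of the mask whose tight box best overlaps pseudo_box.

    Returns (-1, 0.0) if no mask overlaps the box at all.
    """
    best_idx, best_iou = -1, 0.0
    for i, mask in enumerate(masks):
        box = mask_to_box(mask)
        if box is None:
            continue
        iou = box_iou(pseudo_box, box)
        if iou > best_iou:
            best_idx, best_iou = i, iou
    return best_idx, best_iou


# Two toy mask proposals on a 4x4 grid: a single-pixel "part" and a
# larger region covering columns 0-2 of every row.
part = [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
whole = [[1, 1, 1, 0], [1, 1, 1, 0], [1, 1, 1, 0], [1, 1, 1, 0]]
print(match_mask((0, 0, 3, 4), [part, whole]))
```

Comparing tight boxes keeps the matching cheap and consistent with box-level selection; a pixel-level overlap measure would be an equally plausible choice that the caption does not rule out.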
