Semantic-aware SAM for Point-Prompted Instance Segmentation

Zhaoyang Wei, Pengfei Chen, Xuehui Yu, Guorong Li, Jianbin Jiao, Zhenjun Han

TL;DR

The paper tackles semantic ambiguity in Segment Anything (SAM) outputs when performing point-prompted instance segmentation by proposing SAPNet, an end-to-end framework that fuses SAM with semantic cues through point prompts and a Multiple Instance Learning (MIL) based proposal selection. SAPNet introduces a dedicated pipeline comprising Proposal Selection Module (PSM), Positive and Negative Proposals Generator (PNPG), and Proposals Refinement Module (PRM), along with Multi-mask Proposals Supervision (MPS) to produce category-specific mask proposals that supervise a segmentation branch (SOLOv2). The approach uses Point Distance Guidance (PDG) and Box Mining Strategy (BMS) to mitigate MIL’s locality and grouping biases, achieving state-of-the-art performance for point-prompted instance segmentation on COCO and VOC2012SBD. By converting point annotations into semantically aware mask proposals and training end-to-end, SAPNet narrows the gap between point-prompted and fully supervised segmentation, while reducing labeling costs and improving practical applicability.

Abstract

Single-point annotation in visual tasks, with the goal of minimizing labelling costs, is becoming increasingly prominent in research. Recently, visual foundation models, such as Segment Anything (SAM), have gained widespread usage due to their robust zero-shot capabilities and exceptional annotation performance. However, SAM's class-agnostic output and high confidence in local segmentation introduce 'semantic ambiguity', posing a challenge for precise category-specific segmentation. In this paper, we introduce a cost-effective category-specific segmenter using SAM. To tackle this challenge, we have devised a Semantic-Aware Instance Segmentation Network (SAPNet) that integrates Multiple Instance Learning (MIL) with matching capability and SAM with point prompts. SAPNet strategically selects the most representative mask proposals generated by SAM to supervise segmentation, with a specific focus on object category information. Moreover, we introduce the Point Distance Guidance and Box Mining Strategy to mitigate inherent challenges: 'group' and 'local' issues in weakly supervised segmentation. These strategies serve to further enhance the overall segmentation performance. The experimental results on Pascal VOC and COCO demonstrate the promising performance of our proposed SAPNet, emphasizing its semantic matching capabilities and its potential to advance point-prompted instance segmentation. The code will be made publicly available.
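The selection idea described above (choosing, among SAM's class-agnostic masks for one point, the proposal that best matches the annotated category) can be sketched as follows. This is a minimal illustration only: it assumes a generic two-branch MIL scoring scheme of the kind common in weakly supervised detection, not the exact formulation of SAPNet's modules. The function name `mil_select`, the logits, and the SAM confidence values are all hypothetical.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mil_select(class_logits, instance_logits, label):
    """Pick the proposal in a bag that best explains the point's category.

    class_logits, instance_logits: arrays of shape (num_proposals, num_classes).
    label: integer category of the annotated point.
    This two-branch form is an assumption for illustration, not the paper's module.
    """
    cls_prob = softmax(class_logits, axis=1)     # per proposal: which class is it?
    ins_prob = softmax(instance_logits, axis=0)  # per class: which proposal fits best?
    scores = cls_prob * ins_prob                 # joint proposal-by-class score
    bag_score = scores.sum(axis=0)               # bag-level class prediction, each in [0, 1]
    best = int(np.argmax(scores[:, label]))      # proposal chosen for the labeled class
    return best, scores[:, label], bag_score

# Hypothetical bag of three SAM masks for one point placed on a person:
# index 0 = clothes only, index 1 = whole person, index 2 = loose region.
sam_confidence = np.array([0.95, 0.88, 0.60])    # made-up class-agnostic scores
class_logits = np.array([[0.0, 2.0], [3.0, 0.0], [0.5, 0.5]])
instance_logits = np.array([[0.5, 1.0], [2.0, 0.0], [0.0, 0.0]])

sam_top1 = int(np.argmax(sam_confidence))
best, label_scores, bag_score = mil_select(class_logits, instance_logits, label=0)
print("SAM top-1 proposal:", sam_top1)
print("Category-aware proposal:", best)
```

In this toy bag, ranking by SAM confidence alone picks the clothes mask, while the category-aware score picks the whole-person mask, which is the kind of semantic ambiguity the abstract describes. In a real setting the logits would come from learned heads over features of each proposal rather than hand-written arrays.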

Paper Structure (16 sections, 11 equations, 8 figures, 9 tables, 2 algorithms)

Figures (8)

  • Figure 1: Three challenges introduced by SAM and single-MIL. The orange dashed box illustrates semantic ambiguity in SAM-generated masks: SAM erroneously assigns higher scores to non-target regions such as clothes, even though the person is the desired target. The green dashed box compares mask proposals from single-MIL and SAPNet and illustrates two primary challenges: 'group', where segmentation struggles to isolate individual targets among adjacent objects of the same category, and 'local', where MIL favors foreground-dominant regions, resulting in overlooked local details.
  • Figure 2: The framework of SAPNet comprises two components: one for generating mask proposals and another for their utilization in instance segmentation. The process starts with generating category-agnostic mask proposals using point prompts within a visual foundation model. That is followed by an initial proposal selection via MIL combined with PDG. Next, the PRM refines these proposals using positive and negative samples from PNPG, capturing global object semantics. Finally, augmented with the multi-mask proposal supervision, the segmentation branch aims to improve segmentation quality.
  • Figure 3: The mechanism of the proposal selection module.
  • Figure 4: Comparative visualization between SAM-top1 and SAPNet, showing SAM's segmentation outcomes in green masks and our results in yellow. The orange and red bounding boxes highlight the respective mask boundaries.
  • Figure 5: Visual comparison between SAM-top1 and SAPNet on the COCO 2017 dataset with respect to semantic ambiguity, showing SAM's top-1 segmentation outcomes in green masks and our results in yellow masks. The blue and red bounding boxes highlight the respective mask boundaries.
  • ...and 3 more figures