SAMRefiner: Taming Segment Anything Model for Universal Mask Refinement

Yuqi Lin, Hengjia Li, Wenqi Shao, Zheng Yang, Jun Zhao, Xiaofei He, Ping Luo, Kaipeng Zhang

TL;DR

This work tackles the challenge of refining coarse segmentation masks into reliable pseudo-labels for training, which is essential for cost-efficient segmentation. It introduces SAMRefiner, a universal, training-free framework that leverages SAM through noise-tolerant multi-prompt excavation of distance-guided points, context-aware elastic boxes (CEBox), and Gaussian-style masks, together with a split-then-merge pipeline for multi-object masks in semantic segmentation. An optional IoU adaption step in SAMRefiner++ further improves predictions on a specific dataset. The approach shows strong improvements across multiple benchmarks (DAVIS-585, COCO, VOC) and supervision regimes, outperforming traditional refiners in accuracy and efficiency while preserving SAM's generality. Overall, SAMRefiner provides a practical, scalable tool for improving pseudo-label quality, accelerating semi- and weakly-supervised learning and reducing annotation costs in real-world workflows.
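The three prompt types named above can be illustrated with a minimal sketch on plain Python lists. It rests on stated assumptions rather than on the paper's code: the "distance-guided point" is taken to be the foreground pixel farthest from the background; the box is a fixed-ratio enlargement of the tight box, a hypothetical stand-in for the context-aware elastic box (which uses image context not modeled here); and the "Gaussian-style" mask is assumed to weight foreground pixels by a Gaussian of their distance to that point. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Mask = List[List[int]]  # binary mask stored as rows of 0/1 values


def distance_guided_point(mask: Mask) -> Tuple[int, int]:
    """Return (x, y) of the foreground pixel farthest from any background pixel.

    Brute-force Euclidean distance for clarity; pixels outside the image are
    treated as background. Assumed reading of "distance-guided point".
    """
    h, w = len(mask), len(mask[0])
    background = [(y, x) for y in range(h) for x in range(w) if not mask[y][x]]
    best, best_d = None, -1.0
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            # Start from the distance to the image border, then tighten it.
            d = float(min(y + 1, x + 1, h - y, w - x))
            for by, bx in background:
                d = min(d, math.hypot(y - by, x - bx))
            if d > best_d:
                best, best_d = (x, y), d
    if best is None:
        raise ValueError("mask has no foreground pixels")
    return best


def expanded_box(mask: Mask, ratio: float = 0.1) -> Tuple[int, int, int, int]:
    """Tight box (x0, y0, x1, y1) enlarged by `ratio` of its size on each side.

    Hypothetical fixed-ratio stand-in for the context-aware elastic box.
    """
    h, w = len(mask), len(mask[0])
    coords = [(x, y) for y in range(h) for x in range(w) if mask[y][x]]
    if not coords:
        raise ValueError("mask has no foreground pixels")
    xs, ys = [c[0] for c in coords], [c[1] for c in coords]
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    dx = math.ceil((x1 - x0 + 1) * ratio)
    dy = math.ceil((y1 - y0 + 1) * ratio)
    # Clip the enlarged box to the image bounds.
    return (max(0, x0 - dx), max(0, y0 - dy), min(w - 1, x1 + dx), min(h - 1, y1 + dy))


def gaussian_mask(mask: Mask, center: Tuple[int, int], sigma: float) -> List[List[float]]:
    """Soft mask: foreground pixels weighted by a Gaussian of their distance to `center`.

    Assumed interpretation of a "Gaussian-style" mask prompt; background stays 0.
    """
    cx, cy = center
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2)) if mask[y][x] else 0.0
         for x in range(len(mask[0]))]
        for y in range(len(mask))
    ]


# Usage on a 5x5 mask containing a 3x3 square.
m = [[0, 0, 0, 0, 0],
     [0, 1, 1, 1, 0],
     [0, 1, 1, 1, 0],
     [0, 1, 1, 1, 0],
     [0, 0, 0, 0, 0]]
point = distance_guided_point(m)   # (2, 2): the center of the square
box = expanded_box(m, ratio=0.5)   # (0, 0, 4, 4) after clipping
soft = gaussian_mask(m, point, sigma=1.0)
```

In a full pipeline these prompts would be passed to SAM and the highest-scoring output kept; that call is omitted because it depends on the SAM interface in use. A real implementation would also replace the brute-force distance loop with a proper distance transform.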

Abstract

In this paper, we explore a principled way to enhance the quality of widely available, pre-existing coarse masks, enabling them to serve as reliable training data for segmentation models and thereby reducing annotation cost. In contrast to prior refinement techniques that are tailored to specific models or tasks in a closed-world manner, we propose SAMRefiner, a universal and efficient approach that adapts SAM to the mask refinement task. The core technique of our model is a noise-tolerant prompting scheme. Specifically, we introduce a multi-prompt excavation strategy that mines diverse input prompts for SAM (i.e., distance-guided points, context-aware elastic bounding boxes, and Gaussian-style masks) from the initial coarse masks. These prompts complement each other to mitigate the effect of defects in the coarse masks. In particular, because SAM has difficulty handling the multi-object case in semantic segmentation, we introduce a split-then-merge (STM) pipeline. Additionally, we extend our method to SAMRefiner++ by introducing an IoU adaption step that further boosts the performance of the generic SAMRefiner on the target dataset. This step is self-boosted and requires no additional annotation. The proposed framework is versatile and can flexibly cooperate with existing segmentation methods. We evaluate our framework on a wide range of benchmarks under different settings, demonstrating better accuracy and efficiency than prior refinement methods. SAMRefiner holds significant potential to expedite the evolution of refinement tools. Our code is available at https://github.com/linyq2117/SAMRefiner.
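The split-then-merge (STM) idea can be sketched on plain Python lists. This is a minimal sketch under stated assumptions: the split uses 4-connected components, each component is refined on its own, and the merge is a pixel-wise union. The paper's actual splitting and merging rules may be more elaborate, and `refine_fn` is a hypothetical placeholder for a single SAM refinement call.

```python
from collections import deque
from typing import Callable, List

Mask = List[List[int]]  # binary mask stored as rows of 0/1 values


def connected_components(mask: Mask) -> List[Mask]:
    """Split step: return one binary mask per 4-connected foreground region."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    parts: List[Mask] = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                part = [[0] * w for _ in range(h)]
                queue = deque([(y, x)])
                seen[y][x] = True
                # Breadth-first flood fill over 4-neighbours.
                while queue:
                    cy, cx = queue.popleft()
                    part[cy][cx] = 1
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                parts.append(part)
    return parts


def split_then_merge(mask: Mask, refine_fn: Callable[[Mask], Mask]) -> Mask:
    """Refine each component independently, then merge by pixel-wise union."""
    h, w = len(mask), len(mask[0])
    merged = [[0] * w for _ in range(h)]
    for part in connected_components(mask):
        refined = refine_fn(part)  # hypothetical stand-in for one SAM call
        for y in range(h):
            for x in range(w):
                merged[y][x] |= refined[y][x]
    return merged


# Usage: a semantic mask with two separate regions of the same class.
coarse = [[1, 1, 0, 0, 0],
          [1, 1, 0, 0, 1],
          [0, 0, 0, 0, 1]]
print(len(connected_components(coarse)))               # 2
print(split_then_merge(coarse, lambda m: m) == coarse)  # True with an identity refiner
```

Refining components separately means each SAM call sees one object at a time, which is the setting SAM handles well; the union then restores a single semantic mask for the class.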

Paper Structure

This paper contains 37 sections, 4 equations, 19 figures, 11 tables, and 1 algorithm.

Figures (19)

  • Figure 1: Visualization of segmentation masks and performance.
  • Figure 2: Failure cases of SAM when prompted with the tight box of the coarse mask (red box) or directly with the coarse mask. The tight box is sensitive to false-negative (first row) and false-positive (last row) errors in the coarse mask, which mislead SAM's predictions. Using the coarse mask alone as a prompt also fails. Our proposed multi-prompt excavation strategy is robust to this noise.
  • Figure 3: (a) An overview of our proposed framework. SAMRefiner leverages SAM to refine coarse masks by automatically generating prompts from them, including distance-guided points, context-aware elastic boxes, and Gaussian-style masks. We select the best mask from the multiple generated masks based on SAM's IoU predictions. (b) An overview of the introduced IoU adaption step, which aims to enhance SAM's IoU prediction ability on specific datasets. We adopt a LoRA-style adaptor at the last layer of the IoU prediction MLP, and a ranking loss is used to improve the top-1 accuracy of IoU predictions. This step is self-boosted and requires no additional annotation.
  • Figure 4: Visualizations of the effects of our proposed techniques. All of them play a crucial role in mitigating the impact of defects in coarse masks.
  • Figure 5: The effect of different prompt types, mask modes and IoU selection criteria on DAVIS-585.
  • ...and 14 more figures
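Figure 3 states that the final mask is the candidate with the highest SAM-predicted IoU, and that the IoU adaption step uses a ranking loss to improve top-1 accuracy. The sketch below illustrates both ideas on plain Python lists. The pairwise margin hinge loss is one common ranking loss and is an assumption, not necessarily the paper's exact formulation; `target` stands for whatever annotation-free quality signal the self-boosted step relies on, which is not specified here, and the LoRA-style adaptor itself is omitted. Function names are hypothetical.

```python
from itertools import combinations
from typing import Sequence, Tuple


def select_best(masks: Sequence, pred_ious: Sequence[float]) -> Tuple[object, int]:
    """Keep the candidate mask whose predicted IoU is highest."""
    best = max(range(len(masks)), key=lambda i: pred_ious[i])
    return masks[best], best


def pairwise_ranking_loss(pred: Sequence[float], target: Sequence[float],
                          margin: float = 0.1) -> float:
    """Mean hinge loss over candidate pairs (assumed formulation).

    For every pair with different target quality, the better candidate's
    predicted IoU should exceed the worse one's by at least `margin`.
    """
    losses = []
    for i, j in combinations(range(len(pred)), 2):
        if target[i] == target[j]:
            continue  # ties impose no preferred order
        hi, lo = (i, j) if target[i] > target[j] else (j, i)
        losses.append(max(0.0, margin - (pred[hi] - pred[lo])))
    return sum(losses) / len(losses) if losses else 0.0


# Usage: candidate 0 is truly best, but the predicted IoUs favour candidate 1.
pred = [0.70, 0.90, 0.80]
target = [0.95, 0.60, 0.75]
mask, idx = select_best(["m0", "m1", "m2"], pred)
print(idx)                                    # 1, the wrong top-1 pick under `target`
print(pairwise_ranking_loss(pred, target) > 0)  # True: the ordering is penalized
```

A ranking objective fits the stated goal because only the top-1 choice among candidates matters at selection time, so getting the order right is more useful than matching absolute IoU values.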