BiSeg-SAM: Weakly-Supervised Post-Processing Framework for Boosting Binary Segmentation in Segment Anything Models
Encheng Su, Hu Cao, Alois Knoll
TL;DR
BiSeg-SAM addresses medical binary segmentation with limited pixel-level annotations by refining SAM outputs through a weakly supervised post-processing framework. It introduces an Adaptively Global-Local Module that fuses local CNN features with SAM, a WeakBox module that uses the MM2B transformation and a Scale Consistency loss to produce adaptive box prompts, and a DetailRefine module that sharpens boundaries using a small set of ground-truth (GT) examples. Experiments on five polyp datasets and the ISIC skin lesion dataset show state-of-the-art performance, especially in multi-foreground and boundary-precise cases, while reducing annotation costs. The approach has practical potential for medical image analysis and could be extended to other modalities and to multi-modal fusion.
Abstract
Accurate segmentation of polyps and skin lesions is essential for diagnosing colorectal and skin cancers. Although various fully supervised deep learning methods have been developed for segmenting polyps and skin lesions, pixel-level annotation of medical images by doctors is both time-consuming and costly. Foundation vision models such as the Segment Anything Model (SAM) have demonstrated strong performance; however, applying SAM directly to medical segmentation may not yield satisfactory results because it lacks domain-specific medical knowledge. In this paper, we propose BiSeg-SAM, a SAM-guided weakly supervised prompting and boundary refinement network for the segmentation of polyps and skin lesions. Specifically, we fine-tune SAM in combination with a CNN module that learns local features. We introduce a WeakBox module with two functions: automatically generating box prompts for SAM, and applying our proposed Multi-choice Mask-to-Box (MM2B) transformation for rough mask-to-box conversion, which addresses the mismatch between coarse labels and precise predictions. Additionally, we apply a scale consistency (SC) loss to align prediction scales. Our DetailRefine module improves boundary precision and segmentation accuracy by refining coarse predictions using a limited number of ground-truth labels. Together, these components enable BiSeg-SAM to achieve excellent multi-task segmentation performance. Our method shows significant superiority over state-of-the-art (SOTA) methods on five polyp datasets and one skin cancer dataset.
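To make the idea of mask-to-box conversion concrete, the sketch below computes the tight bounding box of a binary mask, which is the kind of box that could serve as a prompt for SAM. It is a generic illustration only, not the MM2B transformation, whose details are not given in this summary. The function name `mask_to_box`, the `pad` parameter, and the `(x_min, y_min, x_max, y_max)` box convention are assumptions introduced for the example.

```python
import numpy as np


def mask_to_box(mask, pad=0):
    """Tight box (x_min, y_min, x_max, y_max) around the nonzero pixels.

    Hypothetical helper for illustration; NOT the paper's MM2B transformation.
    Returns None when the mask has no foreground pixels.
    """
    mask = np.asarray(mask) > 0
    if not mask.any():
        return None
    # Indices of rows and columns that contain at least one foreground pixel.
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    height, width = mask.shape
    # Optionally enlarge the box by `pad` pixels, clipped to the image bounds.
    y_min = max(int(rows[0]) - pad, 0)
    y_max = min(int(rows[-1]) + pad, height - 1)
    x_min = max(int(cols[0]) - pad, 0)
    x_max = min(int(cols[-1]) + pad, width - 1)
    return (x_min, y_min, x_max, y_max)


# A coarse 8x8 mask with one rectangular foreground region.
coarse = np.zeros((8, 8), dtype=np.uint8)
coarse[2:5, 3:7] = 1
box = mask_to_box(coarse)
padded_box = mask_to_box(coarse, pad=1)
```

A single tight box like this covers every foreground pixel at once, so it cannot separate multiple lesions in one image; handling such multi-foreground cases would need per-region boxes, which this sketch does not attempt.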
