Annotation-Efficient Polyp Segmentation via Active Learning
Duojun Huang, Xinyu Xiong, De-Jun Fan, Feng Gao, Xiao-Jian Wu, Guanbin Li
TL;DR
The paper tackles the annotation burden in polyp segmentation by introducing an annotation-efficient deep active learning framework. It combines two components: an uncertainty-weighted clustering of image-level features that selects $B$ informative unlabeled samples per round, with each sample scored by $I(x^u) = \cos(F_1(x^u), F_0(x^u))$, the cosine similarity between its foreground-masked and background-masked features; and a novel unsupervised feature discrepancy loss $\mathcal{L}_{fdl}$ that sharpens class separation on unlabeled data. Training minimizes the supervised loss $\mathcal{L}_{seg}$ plus $\lambda_c \mathcal{L}_{fdl}$ with $\lambda_c=0.1$, over $R=5$ rounds with $B=30$ samples per round. Experiments on CVC-ClinicDB and an in-house dataset show state-of-the-art performance under limited annotation budgets, with gains of about 1–2 percentage points in mIoU and Dice over strong baselines, demonstrating reduced labeling burden and improved generalization for colonoscopy polyp segmentation.
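The score $I(x^u)$ can be illustrated with a minimal sketch. It assumes that $F_1$ and $F_0$ are obtained by averaging per-pixel features weighted by the predicted polyp probability and its complement; that pooling choice, and the names `masked_mean`, `cosine`, and `uncertainty_score`, are illustrative assumptions rather than details confirmed by the paper.

```python
import math


def masked_mean(features, weights):
    """Weighted average of per-pixel feature vectors.

    features: list of N pixel feature vectors (each a list of C floats).
    weights:  list of N non-negative mask values, one per pixel.
    """
    channels = len(features[0])
    total = sum(weights)
    if total == 0:
        # Empty mask: fall back to a zero vector (assumed convention).
        return [0.0] * channels
    return [
        sum(w * f[c] for f, w in zip(features, weights)) / total
        for c in range(channels)
    ]


def cosine(a, b, eps=1e-8):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def uncertainty_score(features, polyp_prob):
    """I(x) = cos(F_1(x), F_0(x)) under the assumed masked-average pooling."""
    f1 = masked_mean(features, polyp_prob)                     # foreground
    f0 = masked_mean(features, [1.0 - p for p in polyp_prob])  # background
    return cosine(f1, f0)


# Toy image with four pixels and two feature channels.
pixels = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
confident = uncertainty_score(pixels, [1.0, 1.0, 0.0, 0.0])
ambiguous = uncertainty_score(pixels, [0.5, 0.5, 0.5, 0.5])
```

In this toy case a prediction that cleanly separates the two feature groups yields a score near 0, while an undecided prediction makes the foreground and background features coincide and pushes the score toward 1, so a higher score marks a more uncertain sample.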
Abstract
Deep learning-based techniques have proven effective in polyp segmentation tasks when provided with sufficient pixel-wise labeled data. However, the high cost of manual annotation has created a bottleneck for model generalization. To minimize annotation costs, we propose a deep active learning framework for annotation-efficient polyp segmentation. In practice, we measure the uncertainty of each sample by examining the similarity between the features masked by the predicted polyp region and those masked by the predicted background region. Since the segmentation model tends to perform poorly on samples whose foreground and background features are indistinguishable, uncertainty sampling facilitates the fitting of under-learned data. Furthermore, clustering image-level features weighted by uncertainty identifies samples that are both uncertain and representative. To enhance the selectivity of the active selection strategy, we propose a novel unsupervised feature discrepancy learning mechanism. The selection strategy and feature optimization work in tandem to achieve optimal performance with a limited annotation budget. Extensive experiments demonstrate that the proposed method achieves state-of-the-art performance compared to other competitors on both a public dataset and a large-scale in-house dataset.
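One way to picture the uncertainty-weighted clustering step is the sketch below. It assumes a weighted k-means with one cluster per annotation slot, followed by picking the sample nearest each centroid; the abstract does not specify the clustering algorithm, so this procedure and the names `sq_dist` and `select_samples` are assumptions made for illustration only.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_samples(features, uncertainty, budget, iters=20, seed=0):
    """Pick `budget` sample indices with an uncertainty-weighted k-means.

    features:    list of image-level feature vectors, one per unlabeled image.
    uncertainty: list of non-negative uncertainty scores, one per image.
    budget:      number of samples to query in this round.
    """
    rng = random.Random(seed)
    n, dim = len(features), len(features[0])
    centroids = [list(features[i]) for i in rng.sample(range(n), budget)]

    for _ in range(iters):
        # Assignment step: each sample joins its nearest centroid.
        assign = [
            min(range(budget), key=lambda k: sq_dist(f, centroids[k]))
            for f in features
        ]
        # Update step: centroids are uncertainty-weighted means, so
        # uncertain samples pull the cluster centre towards themselves.
        for k in range(budget):
            members = [i for i in range(n) if assign[i] == k]
            weight = sum(uncertainty[i] for i in members)
            if weight > 0:
                centroids[k] = [
                    sum(uncertainty[i] * features[i][d] for i in members) / weight
                    for d in range(dim)
                ]

    # Query the not-yet-chosen sample closest to each centroid.
    chosen = []
    for k in range(budget):
        candidates = [i for i in range(n) if i not in chosen]
        chosen.append(min(candidates, key=lambda i: sq_dist(features[i], centroids[k])))
    return chosen


# Two well-separated groups; samples 1 and 5 carry the highest uncertainty.
feats = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
scores = [0.1, 0.9, 0.1, 0.1, 0.1, 0.9]
picked = select_samples(feats, scores, budget=2)
```

With this toy input the selection covers both groups (representativeness) and, within each group, lands on the sample with the highest uncertainty, which is the behaviour the abstract attributes to the weighted clustering.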
