
MARIO: A Mixed Annotation Framework For Polyp Segmentation

Haoyang Li, Yiwen Hu, Jun Wei, Zhen Li

TL;DR

MARIO addresses data scarcity in polyp segmentation by introducing mixed supervision to jointly learn from pixel-, polygon-, box-, scribble-, and point-level annotations within a transformer-based framework. It defines dedicated losses for dense ($L_{BCE}$, $L_{Dice}$), box (mask-to-box, $M2B$), scribble ($L_{Uncertain}$), and point ($L_{points}$) supervision, and combines them into a total objective, enabling training from heterogeneous data sources. Across eight datasets, MARIO achieves state-of-the-art performance, with a weighted-average Dice of $85.8$ and IoU of $78.3$, outperforming fully-supervised baselines. This mixed-supervision approach expands usable data, reduces labeling burdens, and supports practical clinical deployment for colorectal polyp screening.
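The loss terms named above can be illustrated with a minimal pure-Python sketch. The dense terms follow the standard definitions of binary cross-entropy and soft Dice. The `mask_to_box` projection (row and column maxima) and the equal weighting in `dense_loss` are assumptions made for illustration, not the paper's confirmed formulation, and all function names here are hypothetical.

```python
import math


def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross-entropy over flattened probabilities."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(pred)


def dice_loss(pred, target, smooth=1.0):
    """Soft Dice loss: 1 - (2 * overlap + s) / (sum(pred) + sum(target) + s)."""
    overlap = sum(p * t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    return 1.0 - (2.0 * overlap + smooth) / (denom + smooth)


def dense_loss(pred, target):
    """Pixel-level objective; the 1:1 weighting is an assumption."""
    return bce_loss(pred, target) + dice_loss(pred, target)


def mask_to_box(mask):
    """Project a 2D soft mask onto a box-shaped mask (assumed form).

    Each output pixel is the minimum of its row maximum and column maximum,
    so the result fills the tightest box around confident predictions and can
    be compared against a box annotation with the dense losses above.
    """
    row_max = [max(row) for row in mask]
    col_max = [max(col) for col in zip(*mask)]
    return [[min(r, c) for c in col_max] for r in row_max]


def flatten(mask):
    """Flatten a 2D mask into a 1D list for the loss functions."""
    return [v for row in mask for v in row]
```

Under these assumptions, a box-supervised sample would be scored as `dense_loss(flatten(mask_to_box(pred_mask)), flatten(box_mask))`, while a pixel-annotated sample would use `dense_loss` on the prediction directly.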

Abstract

Existing polyp segmentation models are limited by high labeling costs and the small size of datasets. Additionally, vast polyp datasets remain underutilized because these models typically rely on a single type of annotation. To address this dilemma, we introduce MARIO, a mixed supervision model designed to accommodate various annotation types, significantly expanding the range of usable data. MARIO learns from underutilized datasets by incorporating five forms of supervision: pixel-level, box-level, polygon-level, scribble-level, and point-level. Each form of supervision is associated with a tailored loss that effectively leverages the supervision labels while minimizing noise. This allows MARIO to move beyond the constraints of relying on a single annotation type. Furthermore, MARIO primarily utilizes datasets with weak and cheap annotations, reducing the dependence on large-scale, fully annotated ones. Experimental results across five benchmark datasets demonstrate that MARIO consistently outperforms existing methods, highlighting its efficacy in balancing trade-offs between different forms of supervision and maximizing polyp segmentation performance.

Paper Structure

This paper contains 10 sections, 2 equations, 2 figures, and 2 tables.

Figures (2)

  • Figure 1: Illustration of our MARIO framework.
  • Figure 2: Visualization results of our MARIO and other comparison methods. "GT" denotes the ground truth.