Divide and Conquer: Grounding a Bleeding Areas in Gastrointestinal Image with Two-Stage Model

Yu-Fan Lin, Bo-Cheng Qiu, Chia-Ming Lee, Chih-Chung Hsu

TL;DR

The paper tackles GI bleeding detection in wireless capsule endoscopy by decoupling classification from grounding in a two-stage framework. It employs SWA and TTA to improve generalization and robustness to domain shifts, and uses an affirmative ensemble to refine bounding boxes from two grounding models. On the Auto-WCEBleedGen V2 dataset, the approach achieves second place, with notably stronger performance on sequential (consistent) data than on heterogeneous non-sequential data. The work demonstrates the practical value of task decoupling and ensemble techniques for robust medical image grounding and highlights potential clinical benefits in GI endoscopy analysis.

Abstract

Accurate detection and segmentation of gastrointestinal bleeding are critical for diagnosing diseases such as peptic ulcers and colorectal cancer. This study proposes a two-stage framework that decouples classification and grounding to address the inherent challenges posed by traditional Multi-Task Learning models, which jointly optimize classification and segmentation. Our approach separates these tasks to achieve targeted optimization for each. The model first classifies images as bleeding or non-bleeding, thereby isolating subsequent grounding from inter-task interference and label heterogeneity. To further enhance performance, we incorporate Stochastic Weight Averaging and Test-Time Augmentation, which improve model robustness against domain shifts and annotation inconsistencies. Our method is validated on the Auto-WCEBleedGen Challenge V2 dataset, where it achieves second place. Experimental results demonstrate significant improvements in classification accuracy and segmentation precision, especially on sequential datasets with consistent visual patterns. This study highlights the practical benefits of a two-stage strategy for medical image analysis and sets a new standard for GI bleeding detection and segmentation. Our code is publicly available at this GitHub repository.
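
The two robustness techniques named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names (`swa_average`, `tta_predict`, `hflip`, `vflip`), the plain-list weight format, and the choice of flips as test-time augmentations are all assumptions made for illustration. The abstract does not state which augmentations were used.

```python
def swa_average(checkpoints):
    """Stochastic Weight Averaging as an equal-weight mean.

    `checkpoints` is a list of dicts (parameter name -> list of floats),
    e.g. one snapshot per epoch over the final epochs of training.
    """
    n = len(checkpoints)
    return {
        name: [sum(vals) / n for vals in zip(*(c[name] for c in checkpoints))]
        for name in checkpoints[0]
    }


def hflip(image):
    """Horizontal flip of an image stored as a list of rows."""
    return [row[::-1] for row in image]


def vflip(image):
    """Vertical flip of an image stored as a list of rows."""
    return image[::-1]


def tta_predict(predict_fn, image):
    """Test-Time Augmentation: average the score over several views.

    `predict_fn` is a hypothetical classifier returning a bleeding
    probability for one image. Flips are an assumed augmentation set.
    """
    views = [image, hflip(image), vflip(image)]
    scores = [predict_fn(v) for v in views]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Two toy snapshots of a single parameter vector "w".
    snapshots = [{"w": [0.0, 0.0]}, {"w": [2.0, 4.0]}]
    print(swa_average(snapshots))

    # Toy "classifier" that only looks at the top-left pixel, so each
    # flipped view yields a different score before averaging.
    toy_image = [[1, 2], [3, 4]]
    print(tta_predict(lambda img: img[0][0], toy_image))
```

In a PyTorch training loop, the same averaging is usually delegated to `torch.optim.swa_utils.AveragedModel`, with `update_bn` used to recompute batch-norm statistics afterwards. Whether the authors used those utilities is not stated here.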

Paper Structure

This paper contains 10 sections, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Our proposed method consists of a classification model and two instance segmentation models during the training phase (top figure), which are combined during the inference phase (bottom figure). In the training stage, the classification model is trained to differentiate between bleeding and non-bleeding images, with the identified bleeding images used to train the two instance segmentation models. SWA is applied during the final ten epochs of training for each model to improve model stability and generalization. During the inference phase, the classification model is enhanced with TTA and sequentially integrated with the instance segmentation models to form an optimized pipeline for robust performance. The red text indicates the final adopted output.
  • Figure 2: Illustration of the affirmative ensemble.
  • Figure 3: The Eigen-CAM [jacobgilpytorchcam] visualization of the bleeding area.
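
The inference pipeline described in the Figure 1 and Figure 2 captions can be sketched as follows. This is a hedged illustration, not the authors' code. It assumes that "affirmative" means a box is kept if either grounding model proposes it, with overlapping boxes then reduced by greedy non-maximum suppression. The function names, the `(x1, y1, x2, y2, score)` box format, and both thresholds are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2, ...)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def affirmative_ensemble(dets_a, dets_b, iou_thr=0.5):
    """Pool detections from both models, then apply greedy NMS.

    Every box from either model is a candidate (the assumed
    "affirmative" rule). A candidate is dropped only if it overlaps
    a higher-scoring kept box by at least `iou_thr`.
    """
    pooled = sorted(list(dets_a) + list(dets_b), key=lambda d: d[4], reverse=True)
    kept = []
    for det in pooled:
        if all(iou(det, k) < iou_thr for k in kept):
            kept.append(det)
    return kept


def two_stage_predict(image, classify, ground_a, ground_b, cls_thr=0.5):
    """Stage 1 gates stage 2: only images classified as bleeding are grounded.

    `classify`, `ground_a`, and `ground_b` are hypothetical callables
    standing in for the classifier and the two grounding models.
    """
    if classify(image) < cls_thr:
        return {"bleeding": False, "boxes": []}
    boxes = affirmative_ensemble(ground_a(image), ground_b(image))
    return {"bleeding": True, "boxes": boxes}


if __name__ == "__main__":
    model_a = lambda img: [(0, 0, 10, 10, 0.9)]
    model_b = lambda img: [(1, 1, 10, 10, 0.8), (20, 20, 30, 30, 0.7)]
    print(two_stage_predict(None, lambda img: 0.95, model_a, model_b))
    print(two_stage_predict(None, lambda img: 0.10, model_a, model_b))
```

Gating on the classifier means that non-bleeding frames never reach the grounding models. Under this sketch, that is how the classification stage keeps false-positive boxes out of the final output.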