
Reliable Mislabel Detection for Video Capsule Endoscopy Data

Julia Werner, Julius Oexle, Oliver Bause, Maxime Le Floch, Franz Brinkmann, Hannah Tolle, Jochen Hampe, Oliver Bringmann

TL;DR

This work tackles mislabeled data in Video Capsule Endoscopy (VCE) by introducing a mislabel-detection and cleaning pipeline. The pipeline trains three CNNs and then fits a Gaussian mixture model (GMM) to estimate per-sample label noise. It corrects or filters suspected mislabels and retrains CNNs for improved anomaly detection. The methodology is evaluated in two stages. The first is controlled noise injection on Kvasir-Capsule. The second is real-world cleaning on the Galar dataset, with clinical validation by gastroenterologists. Both stages show marked improvements over uncleaned data and existing baselines. Key results include a final dev-set accuracy of $93.83\%$ and an F1 score of $71.58\%$ on Galar after cleaning, and a Precision@100 of $78\%$ in clinician validation, underscoring the practical value of cleaning datasets before training. The work provides a concrete, verifiable path toward more reliable medical image classification and, potentially, on-device anomaly detection in VCE systems.
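The controlled Kvasir-Capsule experiment and the Precision@100 result both come down to the same check. Samples are ranked by a noise score, and we count how many of the top-ranked samples are truly mislabeled. The sketch below illustrates this under assumptions that this section does not confirm. Labels are taken to be binary (0 = normal, 1 = anomaly). Noise is injected by flipping a uniformly random fraction of labels. The helper names `inject_label_noise` and `precision_at_k` are hypothetical. In the clinician validation, the `is_mislabeled` ground truth would come from the gastroenterologists' re-annotation rather than from known flips.

```python
import numpy as np


def inject_label_noise(labels, noise_rate, rng):
    """Flip a uniformly random fraction of binary labels (assumed 0 = normal, 1 = anomaly).

    Returns the noisy labels and a boolean mask marking the flipped samples.
    """
    labels = np.asarray(labels)
    n_flip = int(round(noise_rate * len(labels)))
    flip_idx = rng.choice(len(labels), size=n_flip, replace=False)
    noisy = labels.copy()
    noisy[flip_idx] = 1 - noisy[flip_idx]  # swap normal <-> anomaly
    flipped = np.zeros(len(labels), dtype=bool)
    flipped[flip_idx] = True
    return noisy, flipped


def precision_at_k(noise_scores, is_mislabeled, k):
    """Fraction of the k highest-scoring samples that are truly mislabeled."""
    top_k = np.argsort(noise_scores)[::-1][:k]
    return float(np.mean(np.asarray(is_mislabeled)[top_k]))


# Usage: inject 5% noise into 1,000 random labels. An ideal detector that
# scores every flipped sample highest reaches Precision@50 of 1.0.
rng = np.random.default_rng(0)
clean = rng.integers(0, 2, size=1000)
noisy, flipped = inject_label_noise(clean, noise_rate=0.05, rng=rng)
ideal_scores = flipped.astype(float)
print(precision_at_k(ideal_scores, flipped, k=50))  # 1.0
```

Because the flipped samples are known exactly in a controlled injection, Precision@k can be computed automatically. On real data such as Galar, the same metric requires expert review of the top-k candidates.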

Abstract

The classification performance of deep neural networks relies strongly on access to large, accurately annotated datasets. In medical imaging, however, obtaining such datasets is particularly challenging because annotations must be provided by specialized physicians, which severely limits the pool of annotators. Furthermore, class boundaries can often be ambiguous or difficult to define, which further complicates machine learning-based classification. In this paper, we address this problem by introducing a framework for mislabel detection in medical datasets. We validate it on the two largest publicly available datasets for Video Capsule Endoscopy (VCE), an important imaging procedure for examining the gastrointestinal tract based on a video stream of low-resolution images. In addition, potentially mislabeled samples identified by our pipeline were reviewed and re-annotated by three experienced gastroenterologists. Our results show that the proposed framework successfully detects incorrectly labeled data. After the datasets are cleaned, it also improves anomaly detection performance compared to current baselines.

Paper Structure

This paper contains 11 sections, 5 figures, and 2 tables.

Figures (5)

  • Figure 1: Experimental design: 1) controlled experiment on the Kvasir-Capsule dataset to identify injected noise and 2) mislabel detection on the Galar dataset, involving verification by a scientific panel of three gastroenterologists, with subsequent anomaly detection.
  • Figure 2: Mislabel correction pipeline: 1. uncleaned dataset, 2. training of three CNNs followed by GMM training, 3. correction of $k^c$ labels based on noise reduction, 4. new training to assess the noise probability, 5. filtering of $k^f$ mislabeled data samples.
  • Figure 3: Distribution of the loss values with the GMM total distribution and the individual components (Kvasir-Capsule dataset, 5% noise injection, correction step).
  • Figure 4: t-SNE visualization of the latent representations before (first plot) and after (second plot) mislabel detection, with corrected samples indicated (dark blue: anomaly $\rightarrow$ normal, black: normal $\rightarrow$ anomaly).
  • Figure 5: Representative VCE images that were identified as mislabeled by our pipeline and re-annotated by clinical experts (I: Initial, R: Revised).
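The pipeline steps listed for Figure 2 and the loss-based GMM shown in Figure 3 can be sketched as follows. This is a minimal illustration, and several of its assumptions are not confirmed in this section. Labels are assumed to be binary (normal/anomaly, as suggested by Figure 4). The per-sample training losses of the three CNNs are assumed to be averaged before fitting. A sample's noise probability is taken to be the posterior of the higher-mean (high-loss) GMM component. The EM loop is a plain NumPy stand-in for a library GMM. All function names are hypothetical, and training the CNNs themselves is out of scope.

```python
import numpy as np


def _responsibilities(x, weights, means, variances):
    """Posterior probability of each mixture component for each sample."""
    # Log of (weight * Gaussian density) for every (sample, component) pair
    log_p = (np.log(weights)
             - 0.5 * np.log(2 * np.pi * variances)
             - 0.5 * (x[:, None] - means) ** 2 / variances)
    log_p -= log_p.max(axis=1, keepdims=True)  # stabilise before exponentiating
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)


def fit_two_component_gmm(x, n_iter=100):
    """Fit a 1-D two-component Gaussian mixture with plain EM.

    Returns (weights, means, variances) sorted so that index 1 is the
    high-mean (high-loss) component.
    """
    x = np.asarray(x, dtype=float)
    # Start one component at the smallest and one at the largest value
    means = np.array([x.min(), x.max()])
    variances = np.full(2, x.var() + 1e-6)
    weights = np.array([0.5, 0.5])
    for _ in range(n_iter):
        resp = _responsibilities(x, weights, means, variances)  # E-step
        nk = resp.sum(axis=0) + 1e-12                            # M-step
        weights = nk / len(x)
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6
    order = np.argsort(means)
    return weights[order], means[order], variances[order]


def noise_probability(losses_per_model):
    """Average per-sample losses over models (assumed), return P(high-loss component)."""
    mean_loss = np.mean(losses_per_model, axis=0)
    weights, means, variances = fit_two_component_gmm(mean_loss)
    return _responsibilities(mean_loss, weights, means, variances)[:, 1]


def correct_labels(labels, noise_prob, k_c):
    """Flip the binary labels of the k_c samples most likely to be mislabeled."""
    corrected = np.asarray(labels).copy()
    idx = np.argsort(noise_prob)[::-1][:k_c]
    corrected[idx] = 1 - corrected[idx]
    return corrected


def filter_samples(noise_prob, k_f):
    """Return the indices that remain after dropping the k_f most suspicious samples."""
    drop = np.argsort(noise_prob)[::-1][:k_f]
    return np.setdiff1d(np.arange(len(noise_prob)), drop)
```

In this sketch, `correct_labels` corresponds to step 3 and would use the noise probabilities from the three initial CNNs, with `k_c` playing the role of $k^c$. After retraining on the corrected labels (step 4, not shown), `filter_samples` would be applied to the new noise probabilities with `k_f` as $k^f$ (step 5).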