PolypDB: A Curated Multi-Center Dataset for Development of AI Algorithms in Colonoscopy

Debesh Jha, Nikhil Kumar Tomar, Vanshali Sharma, Quoc-Huy Trinh, Koushik Biswas, Hongyi Pan, Ritika K. Jha, Gorkem Durak, Alexander Hann, Jonas Varkey, Hang Viet Dao, Long Van Dao, Binh Phuc Nguyen, Nikolaos Papachrysos, Brandon Rieders, Peter Thelin Schmidt, Enrik Geissler, Tyler Berzin, Pål Halvorsen, Michael A. Riegler, Thomas de Lange, Ulas Bagci

TL;DR

PolypDB addresses the critical need for a large, open, multi-center, multi-modality polyp dataset to improve AI-based detection and segmentation in colonoscopy. It delivers 3,934 annotated polyp images across five imaging modalities from three international centers, with pixel-accurate masks and bounding boxes and rigorous quality control. The work provides modality- and center-wise benchmarks for segmentation and detection, demonstrates federated learning as a privacy-preserving avenue, and analyzes adversarial robustness, highlighting both model strengths and vulnerabilities. The dataset's diversity supports better generalization and clinical relevance, and its public availability is poised to accelerate development of robust CAD systems for colonoscopy; future work will extend to dynamic video data.

Abstract

Colonoscopy is the primary method for examination, detection, and removal of polyps. However, challenges such as variation in endoscopists' skill, the quality of bowel preparation, and the complex nature of the large intestine contribute to a high polyp miss rate. Missed polyps can later develop into cancer, which underscores the importance of improving detection methods. To address the lack of publicly available, large, diverse, multi-center datasets for developing automatic polyp detection and segmentation methods, we introduce PolypDB, a large-scale, publicly available dataset that contains 3,934 still polyp images and their corresponding ground truth from real colonoscopy videos. PolypDB comprises images from five modalities: Blue Light Imaging (BLI), Flexible Imaging Color Enhancement (FICE), Linked Color Imaging (LCI), Narrow Band Imaging (NBI), and White Light Imaging (WLI), collected at three medical centers in Norway, Sweden, and Vietnam. We provide a benchmark on each modality and center, including federated learning settings using popular segmentation and detection benchmarks. PolypDB is public and can be downloaded at https://osf.io/pr7ms/. More information about the dataset, the segmentation, detection, and federated learning benchmarks, and the train-test split can be found at https://github.com/DebeshJha/PolypDB.
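The abstract mentions federated learning settings across centers without naming the aggregation scheme here. As a minimal sketch only, assuming a FedAvg-style weighted average (a common choice, not confirmed as the authors' method), the server-side step could look like the following. The function name `federated_average`, the toy parameters, and the per-center sample counts are all hypothetical.

```python
from typing import Dict, List, Sequence


def federated_average(
    client_weights: Sequence[Dict[str, List[float]]],
    client_sizes: Sequence[int],
) -> Dict[str, List[float]]:
    """Average per-client parameters, weighting each client by its sample count."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Dict[str, List[float]] = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Hypothetical example: three centers with made-up sample counts and toy
# two-element parameter vectors (not values from PolypDB).
centers = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}, {"w": [5.0, 6.0]}]
sizes = [100, 100, 200]
global_weights = federated_average(centers, sizes)
```

In this scheme only model parameters leave each center, never the images themselves, which is what makes the approach attractive for multi-center clinical data.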

Paper Structure (26 sections, 3 figures, 6 tables)

Figures (3)

  • Figure 1: Examples of polyps in BLI, FICE, LCI, NBI, and WLI modalities from the PolypDB dataset, showcasing variations in shape, size, color, and appearance. Each image includes polyp bounding boxes and color-coded segmentation masks to show polyp ground truth.
  • Figure 2: Qualitative results for the detection task on the different modalities in the PolypDB dataset.
  • Figure 3: Qualitative results for the different methods across various modalities in the PolypDB dataset.
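Figure 1 pairs each segmentation mask with a bounding box. How the boxes in PolypDB were produced is not stated here; purely as an illustration, the sketch below derives a tight box from a binary mask. The helper `mask_to_bbox` and the toy mask are hypothetical, and the function treats all foreground pixels as one region, so an image with several polyps would need connected-component labeling first.

```python
from typing import List, Optional, Tuple


def mask_to_bbox(mask: List[List[int]]) -> Optional[Tuple[int, int, int, int]]:
    """Return (x_min, y_min, x_max, y_max) of the nonzero pixels, or None if empty."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, value in enumerate(row) if value]
    if not rows:
        return None
    return (min(cols), min(rows), max(cols), max(rows))


# Toy 4x5 mask with a single foreground blob (made-up data, not from PolypDB).
toy_mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
bbox = mask_to_bbox(toy_mask)
```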