EndoSight AI: Deep Learning-Driven Real-Time Gastrointestinal Polyp Detection and Segmentation for Enhanced Endoscopic Diagnostics

Daniel Cavadia

TL;DR

EndoSight AI tackles real-time GI polyp detection and segmentation by pairing a fast YOLOv8 detector with a dedicated U-Net segmentation model. Trained and evaluated on the Hyper-Kvasir dataset, the system reaches mAP@0.5 = $88.3\%$ for detection and Dice = $0.69$ for segmentation while running at more than $35$ FPS on GPU hardware. A key contribution is a thermal-aware training protocol that combines real-time GPU monitoring, adaptive cooling, and chunked epochs to enable robust training on consumer hardware. The work demonstrates practical deployment potential in endoscopy workflows through a live demo evaluation and quantitative metrics, and it provides open-source access to models and demonstrations to promote reproducibility and broader adoption in GI diagnostics.
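The thermal-aware idea (monitor the GPU, train in chunks, pause to cool) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the chunk size, and the 80 °C / 70 °C thresholds are all assumptions chosen for illustration, and the temperature reader is passed in as a callable so the control flow can be shown without depending on any particular GPU tooling.

```python
import time
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], chunk_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items`, each at most `chunk_size` long."""
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


def wait_until_cool(
    read_temp_c: Callable[[], float],
    resume_below_c: float,
    poll_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Block until the reported temperature drops below `resume_below_c`.

    Returns the number of polling sleeps that were needed.
    """
    polls = 0
    while read_temp_c() >= resume_below_c:
        sleep(poll_seconds)
        polls += 1
    return polls


def train_epoch_thermal_aware(
    batches: Sequence[T],
    train_step: Callable[[T], None],
    read_temp_c: Callable[[], float],  # hypothetical: returns GPU temperature in Celsius
    chunk_size: int = 50,              # assumed value, not from the paper
    pause_above_c: float = 80.0,       # assumed value, not from the paper
    resume_below_c: float = 70.0,      # assumed value, not from the paper
    poll_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run one epoch in chunks, pausing between chunks when the GPU is hot.

    Returns how many cooling pauses occurred during the epoch.
    """
    pauses = 0
    for chunk in chunked(batches, chunk_size):
        for batch in chunk:
            train_step(batch)
        # Temperature is checked only at chunk boundaries, so a training
        # step is never interrupted halfway through.
        if read_temp_c() >= pause_above_c:
            wait_until_cool(read_temp_c, resume_below_c, poll_seconds, sleep)
            pauses += 1
    return pauses
```

Using separate pause and resume thresholds gives the loop hysteresis, so training does not flap on and off around a single temperature. In practice `read_temp_c` would wrap whatever GPU monitoring interface is available on the training machine.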

Abstract

Precise, real-time detection of gastrointestinal polyps during endoscopic procedures is crucial for early diagnosis and prevention of colorectal cancer. This work presents EndoSight AI, a deep learning architecture developed and evaluated independently to enable accurate polyp localization and detailed boundary delineation. Leveraging the publicly available Hyper-Kvasir dataset, the system achieves a mean Average Precision (mAP) of 88.3% at an IoU threshold of 0.5 for polyp detection and a Dice coefficient of up to 0.69 (69%) for segmentation, alongside real-time inference speeds exceeding 35 frames per second on GPU hardware. The training incorporates clinically relevant performance metrics and a novel thermal-aware procedure to ensure model robustness and efficiency. This integrated AI solution is designed for seamless deployment in endoscopy workflows, promising to advance diagnostic accuracy and clinical decision-making in gastrointestinal healthcare.
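For readers unfamiliar with the segmentation metrics, the Dice coefficient and IoU of two binary masks can be computed as in the short Python sketch below. It uses the standard definitions, Dice = 2|A∩B| / (|A| + |B|) and IoU = |A∩B| / |A∪B|; the function name, the nested-list mask format, and the smoothing constant `eps` are assumptions for illustration and do not reflect the paper's evaluation code.

```python
from typing import Sequence, Tuple


def dice_and_iou(
    pred: Sequence[Sequence[int]],
    truth: Sequence[Sequence[int]],
    eps: float = 1e-7,  # assumed smoothing term; avoids division by zero
) -> Tuple[float, float]:
    """Return (Dice, IoU) for two binary masks given as 2D nested sequences."""
    flat_pred = [bool(v) for row in pred for v in row]
    flat_truth = [bool(v) for row in truth for v in row]
    if len(flat_pred) != len(flat_truth):
        raise ValueError("masks must contain the same number of pixels")

    # Count pixels that are foreground in both masks, and in each mask alone.
    intersection = sum(p and t for p, t in zip(flat_pred, flat_truth))
    pred_area = sum(flat_pred)
    truth_area = sum(flat_truth)
    union = pred_area + truth_area - intersection

    dice = (2 * intersection + eps) / (pred_area + truth_area + eps)
    iou = (intersection + eps) / (union + eps)
    return dice, iou
```

With a predicted mask `[[1, 1, 0], [0, 1, 0]]` and a ground-truth mask `[[1, 0, 0], [0, 1, 1]]`, the intersection is 2 pixels and each mask covers 3, giving a Dice of about 0.667 and an IoU of 0.5. Dice is never lower than IoU for the same pair of masks, which is worth keeping in mind when comparing the two reported scores.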

Paper Structure

This paper contains 20 sections, 6 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Representative polyp images from the Hyper-Kvasir dataset demonstrating morphological and size variability.
  • Figure 2: Custom U-Net architecture for segmentation, with a five-stage encoder-decoder and skip connections.
  • Figure 3: YOLOv8n architecture for polyp detection. Backbone for feature extraction, neck for fusion, and detection head for bounding box and class prediction.
  • Figure 4: Dice and IoU distribution analysis for U-Net segmentation.
  • Figure 5: Qualitative performance grid of polyp segmentation: original images, ground truth masks, predicted probabilities, and overlay comparisons with Dice/IoU metrics.
  • ...and 3 more figures