Enabling Real-Time Colonoscopic Polyp Segmentation on Commodity CPUs via Ultra-Lightweight Architecture

Weihao Gao, Zhuo Deng, Zheng Gong, Lan Ma

TL;DR

This paper tackles real-time colonoscopic polyp segmentation on commodity CPUs by introducing UltraSeg, a family of ultra-lightweight networks under a strict parameter budget (<0.3M). Starting from a dermoscopy-validated lightweight backbone, UltraSeg combines architectural refinements, Enhanced Dilated Blocks, and cross-layer lightweight fusion to reach real-time performance on a single CPU core (≈90 FPS) with competitive Dice scores across seven public datasets, including cross-center and multi-modal settings. A key finding is that, even at extreme compression, careful architectural design retains more than 94% of the Dice score of a 31M-parameter U-Net while using only 0.108M–0.13M parameters (about 0.4% of the U-Net's), outperforming existing lightweight baselines. The study also finds that traditional knowledge distillation has limited efficacy in this regime and outlines directions for future work, such as controlled parameter scaling and ultra-light pre-training strategies, to further close the gap to heavyweight models. Overall, UltraSeg offers a practical, deployable solution for real-time, CPU-based colonoscopy in resource-constrained environments and a blueprint for ultra-light medical vision in other minimally invasive domains.
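
For readers unfamiliar with the metric, the Dice score cited above measures the overlap between a predicted mask P and a ground-truth mask G as 2|P∩G| / (|P| + |G|). The following is a minimal sketch in plain Python, not the authors' evaluation code; the function name `dice_score`, the flat 0/1 mask representation, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as flat sequences of 0/1.

    Hypothetical helper for illustration; not taken from the paper's code.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels that are foreground in both masks.
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy 4-pixel example: one shared foreground pixel, 2 predicted, 1 true.
pred = [1, 1, 0, 0]
target = [1, 0, 0, 0]
print(dice_score(pred, target))  # 2 * 1 / (2 + 1)
```

A "retains more than 94%" claim is then a ratio of two such scores, the lightweight model's mean Dice divided by the reference U-Net's mean Dice on the same data.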

Abstract

Early detection of colorectal cancer hinges on real-time, accurate polyp identification and resection. Yet current high-precision segmentation models rely on GPUs, making them impractical to deploy in primary hospitals, mobile endoscopy units, or capsule robots. To bridge this gap, we present the UltraSeg family, operating in an extreme-compression regime (<0.3 M parameters). UltraSeg-108K (0.108 M parameters) is optimized for single-center data, while UltraSeg-130K (0.13 M parameters) generalizes to multi-center, multi-modal images. By jointly optimizing encoder-decoder widths, incorporating constrained dilated convolutions to enlarge receptive fields, and integrating a cross-layer lightweight fusion module, the models achieve 90 FPS on a single CPU core without sacrificing accuracy. Evaluated on seven public datasets, UltraSeg retains >94% of the Dice score of a 31 M-parameter U-Net while utilizing only 0.4% of its parameters, establishing a strong, clinically viable baseline for the extreme-compression domain and offering an immediately deployable solution for resource-constrained settings. This work provides not only a CPU-native solution for colonoscopy but also a reproducible blueprint for broader minimally invasive surgical vision applications. Source code is publicly available to ensure reproducibility and facilitate future benchmarking.
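
The parameter arithmetic behind these claims can be checked with a short sketch. It assumes a standard (non-separable) 2D convolution with a square kernel and a bias term; the helper names and the example channel width of 16 are hypothetical and do not come from the paper. The sketch illustrates two points: dilation widens the effective kernel to k + (k - 1)(d - 1) without adding weights, and 0.108 M and 0.13 M parameters are roughly 0.35% and 0.42% of a 31 M-parameter U-Net, consistent with the rounded 0.4% figure.

```python
def conv2d_params(in_ch, out_ch, kernel, bias=True):
    """Weight count of a standard 2D convolution; independent of dilation."""
    return kernel * kernel * in_ch * out_ch + (out_ch if bias else 0)


def effective_kernel(kernel, dilation):
    """Spatial extent covered by a dilated kernel along one axis."""
    return kernel + (kernel - 1) * (dilation - 1)


def param_fraction(model_params, reference_params):
    """Share of the reference model's parameters used by the smaller model."""
    return model_params / reference_params


# Hypothetical 16-channel 3x3 layer: same weights, growing spatial coverage.
for dilation in (1, 2, 4):
    print(dilation, conv2d_params(16, 16, 3), effective_kernel(3, dilation))

# Parameter budgets from the abstract relative to a 31 M-parameter U-Net.
for name, params in (("UltraSeg-108K", 108_000), ("UltraSeg-130K", 130_000)):
    print(name, f"{100 * param_fraction(params, 31_000_000):.2f}%")
```

This is why dilated convolutions suit a fixed parameter budget: the receptive field can grow while the weight count stays constant.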

Paper Structure (22 sections, 9 equations, 5 figures, 7 tables)

Figures (5)

  • Figure 1: The architecture of our proposed methods.
  • Figure 2: Left: Params-Dice trade-off; Right: Single-core FPS-Dice trade-off. Our UltraSeg-Family (red star) simultaneously achieves the best accuracy and superior efficiency.
  • Figure 3: Qualitative comparisons on lightweight models.
  • Figure 4: Per-sample mean Dice and center- or modality-level mean Dice for different models on the PolypDB and PolypGen datasets.
  • Figure 5: Typical cases of significant prediction errors in UltraSeg-130K.