Enabling Real-Time Colonoscopic Polyp Segmentation on Commodity CPUs via Ultra-Lightweight Architecture
Weihao Gao, Zhuo Deng, Zheng Gong, Lan Ma
TL;DR
This paper tackles the challenge of real-time colonoscopic polyp segmentation on commodity CPUs by introducing UltraSeg, a family of ultra-lightweight networks under a strict parameter budget (<0.3M). Starting from a dermoscopy-validated lightweight backbone, UltraSeg employs architectural refinements, Enhanced Dilated Blocks, and cross-layer lightweight fusion to deliver real-time CPU performance (≈90 FPS on a single CPU core) with competitive Dice scores across multiple public datasets, including cross-center and multi-modal settings. A key finding is that, even under extreme compression, careful architectural design can retain over 94% of the Dice score of a 31M-parameter U-Net while using only ~0.1–0.13M parameters, significantly outperforming existing lightweight baselines. The study also reveals the limited efficacy of traditional knowledge distillation in this regime and outlines directions for future work, such as controlled parameter scaling and ultra-light pre-training strategies, to further close the gap to heavyweight models. Overall, UltraSeg offers a practical, deployable solution for real-time, CPU-based colonoscopy in resource-constrained environments and provides a blueprint for ultra-light medical vision in other minimally invasive domains.
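The performance comparisons in this summary are stated in terms of the Dice score. The following is a minimal NumPy sketch, not the authors' evaluation code: it assumes binary masks and treats "fraction of U-Net performance" as a plain ratio of two Dice scores. The function names, the `eps` smoothing term, and the toy masks are hypothetical and chosen only for illustration.

```python
import numpy as np


def dice_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice = 2|P ∩ T| / (|P| + |T|) for binary masks.

    `eps` is an assumed smoothing term: it makes two empty masks score 1.0
    instead of dividing by zero.
    """
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    return float((2.0 * intersection + eps) / (total + eps))


def relative_retention(light_dice: float, reference_dice: float) -> float:
    """Share of a reference model's Dice kept by a lightweight model (assumed definition)."""
    return light_dice / reference_dice


# Toy 4x4 masks: the target "polyp" covers 4 pixels; the prediction covers
# those 4 pixels plus 2 false-positive pixels.
target = np.zeros((4, 4), dtype=np.uint8)
target[1:3, 1:3] = 1
pred = np.zeros((4, 4), dtype=np.uint8)
pred[1:3, 1:4] = 1

toy_dice = dice_score(pred, target)  # 2*4 / (6 + 4) = 0.8
```

Dice is often averaged per image over a test set; whether the reported figures are per-image means or pooled over all pixels is not stated in this section.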
Abstract
Early detection of colorectal cancer hinges on real-time, accurate polyp identification and resection. Yet current high-precision segmentation models rely on GPUs, making them impractical to deploy in primary hospitals, mobile endoscopy units, or capsule robots. To bridge this gap, we present the UltraSeg family, which operates in an extreme-compression regime (<0.3M parameters). UltraSeg-108K (0.108M parameters) is optimized for single-center data, while UltraSeg-130K (0.13M parameters) generalizes to multi-center, multi-modal images. By jointly optimizing encoder-decoder widths, incorporating constrained dilated convolutions to enlarge receptive fields, and integrating a cross-layer lightweight fusion module, the models achieve 90 FPS on a single CPU core while maintaining competitive accuracy. Evaluated on seven public datasets, UltraSeg retains >94% of the Dice score of a 31M-parameter U-Net while using only about 0.4% of its parameters, establishing a strong, clinically viable baseline for the extreme-compression domain and offering an immediately deployable solution for resource-constrained settings. This work provides not only a CPU-native solution for colonoscopy but also a reproducible blueprint for broader minimally invasive surgical vision applications. Source code is publicly available to ensure reproducibility and facilitate future benchmarking.
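Two quantitative claims in the abstract can be checked with simple arithmetic. First, dilation enlarges a convolution's receptive field without adding parameters. Second, 0.108M to 0.13M parameters amount to roughly 0.4% of a 31M-parameter U-Net. The pure-Python sketch below illustrates both under stated assumptions: stride-1, standard (non-grouped) convolutions, plus hypothetical channel widths and dilation rates. These are not UltraSeg's actual configuration, which this section does not specify, and all helper names are illustrative.

```python
def conv_params(k: int, c_in: int, c_out: int, bias: bool = True) -> int:
    """Weights of a standard k x k convolution. Dilation spreads the same
    k*k taps further apart, so it never enters this formula."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def receptive_field(kernel_sizes: list[int], dilations: list[int]) -> int:
    """Receptive field of a stack of stride-1 convolutions: 1 + sum((k - 1) * d)."""
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d
    return rf


def param_fraction(params: float, reference_params: float) -> float:
    """Share of a reference model's parameter count used by a smaller model."""
    return params / reference_params


def frame_budget_ms(fps: float) -> float:
    """Per-frame latency budget in milliseconds implied by a throughput target."""
    return 1000.0 / fps


# Hypothetical three-layer stack of 3x3 convolutions with 16 channels.
kernels = [3, 3, 3]
plain_rf = receptive_field(kernels, [1, 1, 1])    # 7 pixels
dilated_rf = receptive_field(kernels, [1, 2, 4])  # 15 pixels
stack_params = 3 * conv_params(3, 16, 16)         # identical for both stacks

# Parameter counts as stated in the abstract (the U-Net figure is approximate).
UNET_PARAMS = 31e6
for name, p in [("UltraSeg-108K", 0.108e6), ("UltraSeg-130K", 0.13e6)]:
    print(f"{name}: {100 * param_fraction(p, UNET_PARAMS):.2f}% of U-Net parameters")

print(f"90 FPS -> {frame_budget_ms(90):.1f} ms per frame")
```

The two model sizes work out to about 0.35% and 0.42% of the U-Net's parameters, bracketing the abstract's "0.4%", and 90 FPS corresponds to roughly 11 ms per frame.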
