HQColon: A Hybrid Interactive Machine Learning Pipeline for High Quality Colon Labeling and Segmentation
Martina Finocchiaro, Ronja Stern, Abraham George Smith, Jens Petersen, Kenny Erleben, Melanie Ganz
TL;DR
HQColon is a fully automatic, high-resolution colon segmentation method for CT colonography, intended to support digital twins and AI-driven diagnostics. It combines a semi-automatic, expert-validated labeling pipeline with an interactive ML step for fluid pockets, and trains four 3D nnU-Net models on raw and masked inputs to segment both the air-filled and the full colon. Compared with the open-source TotalSegmentator, HQColon achieves substantially lower boundary error (HD95) and average symmetric surface distance (ASSD), and captures challenging features such as fluid pockets and haustral folds, with a typical inference time of around 69 seconds on a high-end GPU. The work provides open-source code and a large, publicly available annotated dataset, reducing labeling effort and enabling broad adoption in research and clinical workflows.
Abstract
High-resolution colon segmentation is crucial for clinical and research applications, such as digital twins and personalized medicine. However, the leading open-source abdominal segmentation tool, TotalSegmentator, segments the colon with limited accuracy: the colon has a complex and variable shape, and labeling it is time-intensive. Here, we present the first fully automatic high-resolution colon segmentation method. To develop it, we first created a high-resolution colon dataset using a pipeline that combines region growing with interactive machine learning to efficiently and accurately label the colon on CT colonography (CTC) images. Based on the generated dataset, consisting of 435 labeled CTC images, we trained an nnU-Net model for fully automatic colon segmentation. Our fully automatic model achieved an average symmetric surface distance of 0.2 mm (vs. 4.0 mm for TotalSegmentator) and a 95th percentile Hausdorff distance of 1.0 mm (vs. 18 mm for TotalSegmentator), substantially surpassing TotalSegmentator on both metrics. We share our trained model and pipeline code, providing the first and only open-source tool for high-resolution colon segmentation. Additionally, we created a large-scale dataset of publicly available high-resolution colon labels.
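
The two reported metrics can be made concrete with a small sketch. The code below is an illustration only, not the evaluation code used in this work: it assumes each segmentation surface is given as a list of 3D points in millimetres, uses a brute-force nearest-neighbour search, and adopts one common convention for the 95th percentile Hausdorff distance (the maximum of the two directed 95th percentiles). The function names `directed_distances`, `percentile`, `assd`, and `hd95` are hypothetical.

```python
import math


def directed_distances(src, dst):
    """For each point in src, the distance to its nearest point in dst (brute force)."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def percentile(values, pct):
    """Percentile with linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lo, hi = math.floor(rank), math.ceil(rank)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def assd(surf_a, surf_b):
    """Average symmetric surface distance: mean of all nearest-surface distances in both directions."""
    d_ab = directed_distances(surf_a, surf_b)
    d_ba = directed_distances(surf_b, surf_a)
    return (sum(d_ab) + sum(d_ba)) / (len(d_ab) + len(d_ba))


def hd95(surf_a, surf_b):
    """95th percentile Hausdorff distance (assumed convention: max of the two directed percentiles)."""
    d_ab = directed_distances(surf_a, surf_b)
    d_ba = directed_distances(surf_b, surf_a)
    return max(percentile(d_ab, 95), percentile(d_ba, 95))


if __name__ == "__main__":
    # Two toy "surfaces" offset by 1 mm along z; every nearest distance is 1 mm.
    prediction = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(assd(prediction, reference), hd95(prediction, reference))
```

Both metrics are distances, so lower is better. ASSD summarizes the typical surface error, while HD95 reflects near-worst-case boundary deviations without being dominated by a single outlier point. For full-resolution volumes, a distance transform or spatial index would replace the brute-force search.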
