ColonNeRF: High-Fidelity Neural Reconstruction of Long Colonoscopy
Yufei Shi, Beijia Lu, Jia-Wei Liu, Ming Li, Mike Zheng Shou
TL;DR
Accurate 3D reconstruction of long colonoscopy sequences can greatly aid colorectal cancer diagnosis, but it is hindered by segment dissimilarity, mixed geometry, and sparse camera views. The authors propose ColonNeRF, a NeRF-based reconstruction framework with three core components: a Region Division Module that divides the colon into overlapping blocks, a Multi-Level Fusion Module that progressively models geometry from coarse to fine, and a DensiNet-driven pose densification module guided by semantic consistency. A Region Integration Module then fuses the block reconstructions, filtering out unreliable blocks and ensuring smooth transitions. The framework optimizes a composite objective across depth, pose, and semantic alignment. Evaluations on the synthetic SimCol-to-3D 2022 dataset and the real C3VD Descending Colon dataset show substantial improvements in perceptual and depth metrics, with clearer textures and more accurate geometry than state-of-the-art NeRF methods. Overall, ColonNeRF demonstrates strong potential for clinically useful 3D colon reconstruction and planning.
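The idea of dividing a long sequence into overlapping blocks can be illustrated with a minimal sketch. It is not the authors' Region Division Module: the function name `divide_into_blocks` and the parameters `block_size` and `overlap` are hypothetical, and the sketch assumes blocks are contiguous frame-index ranges of fixed length, which the summary above does not specify.

```python
from typing import List, Tuple


def divide_into_blocks(num_frames: int, block_size: int, overlap: int) -> List[Tuple[int, int]]:
    """Split frame indices [0, num_frames) into overlapping half-open ranges.

    Hypothetical helper for illustration only; the fixed block length and
    overlap are assumptions, not values taken from the paper.
    """
    if block_size <= 0 or not 0 <= overlap < block_size:
        raise ValueError("require block_size > 0 and 0 <= overlap < block_size")
    if num_frames <= 0:
        return []

    stride = block_size - overlap  # how far each new block advances
    blocks: List[Tuple[int, int]] = []
    start = 0
    while True:
        end = min(start + block_size, num_frames)  # clip the last block
        blocks.append((start, end))
        if end >= num_frames:
            break
        start += stride
    return blocks


if __name__ == "__main__":
    # Ten frames, blocks of four frames, neighbours share one frame.
    print(divide_into_blocks(10, 4, 1))
```

Each block shares `overlap` frames with its neighbour, which is the property a later integration step would rely on to blend adjacent reconstructions smoothly.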
Abstract
Colonoscopy reconstruction is pivotal for diagnosing colorectal cancer. However, accurate long-sequence colonoscopy reconstruction faces three major challenges: (1) dissimilarity among segments of the colon due to its meandering and convoluted shape; (2) the co-existence of simple and intricately folded geometric structures; and (3) sparse viewpoints due to constrained camera trajectories. To tackle these challenges, we introduce ColonNeRF, a new reconstruction framework based on neural radiance fields (NeRF) that leverages neural rendering for novel view synthesis of long-sequence colonoscopy. Specifically, to reconstruct the entire colon in a piecewise manner, ColonNeRF introduces a region division and integration module, which reduces shape dissimilarity and ensures geometric consistency in each segment. To learn both simple and complex geometry in a unified framework, ColonNeRF incorporates a multi-level fusion module that progressively models the colon regions from easy to hard. Additionally, to overcome the challenges posed by sparse views, we devise a DensiNet module that densifies camera poses under the guidance of semantic consistency. We conduct extensive experiments on both synthetic and real-world datasets to evaluate ColonNeRF. Quantitatively, ColonNeRF exhibits a 67%-85% improvement in LPIPS-ALEX scores. Qualitatively, our reconstruction visualizations show much clearer textures and more accurate geometric details. These results demonstrate superior performance over state-of-the-art methods.
