Neural Radiance Fields for Novel View Synthesis in Monocular Gastroscopy
Zijie Jiang, Yusuke Monno, Masatoshi Okutomi, Sho Suzuki, Kenji Miki
TL;DR
This work targets free-viewpoint synthesis inside the stomach from monocular gastroscopic sequences. It applies neural radiance fields (NeRF) and augments training with a geometry-based loss that leverages a pre-reconstructed SfM point cloud. A key idea is to generate unobserved views by interpolating between observed viewpoints, which mitigates view sparsity, and to enforce geometric consistency on both observed and unobserved views through depth priors and smoothness terms. Experiments on two gastroscopy sequences show that the proposed method outperforms Zip-NeRF and DS-NeRF in both RGB rendering quality and depth geometry, and ablations confirm the benefit of each geometry-based loss term. The approach enables high-fidelity free-viewpoint visualization in gastroscopy, which can aid diagnosis and intervention. The authors acknowledge that the method relies on accurate SfM camera poses and suggest integrating pose refinement as future work. Results are available on a project page.
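As a rough illustration of generating an unobserved view by interpolating between two observed viewpoints, the sketch below blends two camera-to-world poses. The summary does not state which interpolation scheme the authors use, so this is an assumption: rotation is interpolated along the geodesic between the two orientations and the camera center is interpolated linearly. The function names (`rotation_log`, `rotation_exp`, `interpolate_pose`) and the 4x4 camera-to-world convention are hypothetical choices made for this example.

```python
import numpy as np


def rotation_log(R):
    """Axis-angle vector (unit axis times angle) of a 3x3 rotation matrix."""
    cos_angle = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    angle = np.arccos(cos_angle)
    if angle < 1e-8:
        return np.zeros(3)
    # This formula breaks down near angle == pi; neighbouring camera
    # poses in a video sequence are assumed to be far from that case.
    axis = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return axis / (2.0 * np.sin(angle)) * angle


def rotation_exp(w):
    """Rotation matrix for an axis-angle vector (Rodrigues' formula)."""
    angle = np.linalg.norm(w)
    if angle < 1e-8:
        return np.eye(3)
    k = w / angle
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)


def interpolate_pose(pose_a, pose_b, t):
    """Blend two 4x4 camera-to-world poses; t=0 gives pose_a, t=1 gives pose_b."""
    R_a, R_b = pose_a[:3, :3], pose_b[:3, :3]
    c_a, c_b = pose_a[:3, 3], pose_b[:3, 3]
    # Rotate from R_a towards R_b by a fraction t of the relative rotation.
    R_rel = R_a.T @ R_b
    R_t = R_a @ rotation_exp(t * rotation_log(R_rel))
    pose = np.eye(4)
    pose[:3, :3] = R_t
    pose[:3, 3] = (1.0 - t) * c_a + t * c_b  # linear blend of camera centers
    return pose


# Two observed poses: identity, and a 90-degree turn about z shifted along x.
pose_a = np.eye(4)
pose_b = np.eye(4)
pose_b[:3, :3] = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
pose_b[:3, 3] = np.array([2.0, 0.0, 0.0])

pose_mid = interpolate_pose(pose_a, pose_b, 0.5)
```

In this example `pose_mid` is a 45-degree rotation about the z axis with its camera center halfway between the two observed cameras. Such a pose has no captured image, so it can only be supervised through geometry terms, not through a photometric loss.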
Abstract
Synthesizing images from arbitrary novel viewpoints within a patient's stomach from pre-captured monocular gastroscopic images is a promising topic in stomach diagnosis. Typical methods for this objective integrate traditional 3D reconstruction techniques, including structure-from-motion (SfM) and Poisson surface reconstruction. These methods produce explicit 3D representations, such as point clouds and meshes, from which images can be rendered at novel viewpoints. However, low-texture and non-Lambertian regions within the stomach often lead to noisy and incomplete point clouds and meshes, which prevents high-quality image rendering. In this paper, we apply the emerging technique of neural radiance fields (NeRF) to monocular gastroscopic data to synthesize photo-realistic images at novel viewpoints. To address the performance degradation caused by view sparsity in local regions of monocular gastroscopy, we incorporate geometry priors from a pre-reconstructed point cloud into NeRF training by introducing a novel geometry-based loss that is applied to both pre-captured observed views and generated unobserved views. Compared with other recent NeRF methods, our approach produces high-fidelity image renderings from novel viewpoints within the stomach, as shown both qualitatively and quantitatively.
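To make the idea of a geometry-based loss more concrete, here is a minimal sketch of one plausible form: a sparse depth term plus a smoothness term on rendered depth. The abstract does not give the loss definition, so everything below is an assumption and not the authors' formulation. The function names, the L1 penalties, and `smooth_weight` are hypothetical, and `prior_depth` with `valid_mask` is assumed to come from projecting the pre-reconstructed point cloud into the view.

```python
import numpy as np


def sparse_depth_loss(rendered_depth, prior_depth, valid_mask):
    """Mean absolute depth error over pixels that have a prior depth value."""
    n_valid = int(valid_mask.sum())
    if n_valid == 0:
        return 0.0  # no projected points in this view or patch
    abs_err = np.abs(rendered_depth - prior_depth)
    return float(abs_err[valid_mask].sum() / n_valid)


def depth_smoothness_loss(rendered_depth):
    """Mean absolute difference between neighbouring depths in a 2D patch."""
    dx = np.abs(rendered_depth[:, 1:] - rendered_depth[:, :-1])
    dy = np.abs(rendered_depth[1:, :] - rendered_depth[:-1, :])
    return float(dx.mean() + dy.mean())


def geometry_loss(rendered_depth, prior_depth, valid_mask, smooth_weight=0.1):
    """Assumed combination: depth prior term plus weighted smoothness term."""
    return (sparse_depth_loss(rendered_depth, prior_depth, valid_mask)
            + smooth_weight * depth_smoothness_loss(rendered_depth))


# Toy 2x2 patch: two pixels carry a prior depth, two do not.
rendered = np.array([[1.0, 2.0], [3.0, 4.0]])
prior = np.array([[1.0, 0.0], [0.0, 5.0]])
mask = np.array([[True, False], [False, True]])

loss = geometry_loss(rendered, prior, mask)
```

Because neither term needs a ground-truth color image, the same function could in principle be evaluated on generated unobserved views as well as on observed ones, which is the property the abstract highlights.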
