Neural Radiance Fields for Novel View Synthesis in Monocular Gastroscopy

Zijie Jiang, Yusuke Monno, Masatoshi Okutomi, Sho Suzuki, Kenji Miki

TL;DR

This work targets free-viewpoint synthesis inside the stomach from monocular gastroscopic sequences. It applies neural radiance fields (NeRF) and augments training with a geometry-based loss that leverages a point cloud pre-reconstructed by SfM. To mitigate view sparsity, the method generates unobserved views by interpolating between observed viewpoints and enforces geometry consistency on both observed and unobserved views through depth priors and smoothness terms. Experiments on two gastroscopy sequences show that the proposed method outperforms Zip-NeRF and DS-NeRF in both RGB rendering quality and depth geometry, and ablations confirm the benefit of each geometry-based loss term. The approach enables high-fidelity free-viewpoint visualization in gastroscopy, which can aid diagnosis and intervention. The authors acknowledge that it relies on accurate SfM camera poses and suggest integrating pose refinement as future work; results are available on a project page.

Abstract

Enabling the synthesis of arbitrarily novel viewpoint images within a patient's stomach from pre-captured monocular gastroscopic images is a promising topic in stomach diagnosis. Typical methods to achieve this objective integrate traditional 3D reconstruction techniques, including structure-from-motion (SfM) and Poisson surface reconstruction. These methods produce explicit 3D representations, such as point clouds and meshes, thereby enabling the rendering of the images from novel viewpoints. However, the existence of low-texture and non-Lambertian regions within the stomach often results in noisy and incomplete reconstructions of point clouds and meshes, hindering the attainment of high-quality image rendering. In this paper, we apply the emerging technique of neural radiance fields (NeRF) to monocular gastroscopic data for synthesizing photo-realistic images for novel viewpoints. To address the performance degradation due to view sparsity in local regions of monocular gastroscopy, we incorporate geometry priors from a pre-reconstructed point cloud into the training of NeRF, which introduces a novel geometry-based loss to both pre-captured observed views and generated unobserved views. Compared to other recent NeRF methods, our approach showcases high-fidelity image renderings from novel viewpoints within the stomach both qualitatively and quantitatively.
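
The abstract mentions generated unobserved views, which the summary describes as interpolations between observed viewpoints. The exact interpolation scheme is not stated here, so the following is only a minimal sketch under assumptions: poses are stored as a unit quaternion `(w, x, y, z)` plus a camera position, rotations are blended with spherical linear interpolation, and positions are blended linearly. The function names and the `per_gap` parameter are hypothetical.

```python
import math

def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions (w, x, y, z)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    # q and -q encode the same rotation, so flip to take the shorter arc.
    if dot < 0.0:
        q1 = tuple(-c for c in q1)
        dot = -dot
    dot = min(1.0, dot)
    if dot > 0.9995:
        # Nearly identical rotations: linear blend, normalized below.
        q = tuple(a + t * (b - a) for a, b in zip(q0, q1))
    else:
        theta = math.acos(dot)
        s0 = math.sin((1.0 - t) * theta) / math.sin(theta)
        s1 = math.sin(t * theta) / math.sin(theta)
        q = tuple(s0 * a + s1 * b for a, b in zip(q0, q1))
    norm = math.sqrt(sum(c * c for c in q))
    return tuple(c / norm for c in q)

def interpolate_pose(pose_a, pose_b, t):
    """Blend two camera poses, each given as (quaternion, position)."""
    (qa, pa), (qb, pb) = pose_a, pose_b
    position = tuple(a + t * (b - a) for a, b in zip(pa, pb))
    return slerp(qa, qb, t), position

def unobserved_views(observed, per_gap=1):
    """Generate `per_gap` evenly spaced poses between each consecutive pair."""
    views = []
    for pose_a, pose_b in zip(observed, observed[1:]):
        for k in range(1, per_gap + 1):
            views.append(interpolate_pose(pose_a, pose_b, k / (per_gap + 1)))
    return views
```

In this sketch, the interpolated poses would serve only as extra viewpoints at which geometry constraints can be evaluated, since no captured image exists for them.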

Paper Structure (12 sections, 9 equations, 5 figures, 2 tables)

Figures (5)

  • Figure 1: The overall process flow. Using a real monocular gastroscopic image sequence, we first apply structure-from-motion (SfM) to obtain camera poses and a reconstructed point cloud. Then, we train neural radiance fields (NeRF) of the stomach, where we propose a novel geometry-based loss exploiting the point cloud from SfM. In the application phase, RGB images and depth maps of novel views can be synthesized through volume rendering of NeRF.
  • Figure 2: The overview of our proposed NeRF method. As in a standard NeRF method, we train a network $F_{\Theta}$ to estimate the color $\mathbf{c}$ and the density $\sigma$ given the 3D point coordinate $\mathbf{x}$ and the viewing direction $\mathbf{d}$ as inputs. The key idea of our method is twofold: 1) we generate unobserved views by interpolating consecutive observed views to address view sparsity, and 2) we apply a geometry-based loss to both observed and unobserved views to effectively constrain the learned geometry by using the point cloud reconstructed by SfM. The technical details are in the methodology section.
  • Figure 3: Rendering results for a novel camera trajectory. The camera trajectory in red represents a real gastroscope trajectory, which was used for training NeRF. The camera trajectory in blue represents a novel trajectory for the view synthesis application.
  • Figure 4: The qualitative comparisons of rendered RGB images. The first and second rows show the results for two different viewpoints in the testing images.
  • Figure 5: The qualitative comparisons of rendered depth maps. For each method, we present both the rendered depth map (first row) and the corresponding point cloud (second row) for better visualization.
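
The captions of Figures 1 and 2 refer to synthesizing RGB images and depth maps through volume rendering of the predicted color and density. As an illustration, here is a minimal sketch of the standard NeRF quadrature for a single ray. It is not the authors' implementation: the function name and argument layout are hypothetical, and taking depth as the weight-averaged sample distance is an assumption, since this summary does not state how the paper defines rendered depth.

```python
import math

def render_ray(sigmas, colors, t_vals):
    """Composite samples along one ray into an RGB color and an expected depth.

    sigmas: density at each sample.
    colors: (r, g, b) at each sample.
    t_vals: increasing sample distances along the ray, with one extra
            entry marking the far end of the last interval.
    """
    transmittance = 1.0  # fraction of light that reaches the current sample
    rgb = [0.0, 0.0, 0.0]
    depth = 0.0
    for i, (sigma, color) in enumerate(zip(sigmas, colors)):
        delta = t_vals[i + 1] - t_vals[i]
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this interval
        weight = transmittance * alpha
        for ch in range(3):
            rgb[ch] += weight * color[ch]
        depth += weight * t_vals[i]
        transmittance *= 1.0 - alpha
    return tuple(rgb), depth
```

Because the same weights produce both outputs, a depth prior such as one derived from an SfM point cloud can constrain the density field that also determines the rendered color.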