FLex: Joint Pose and Dynamic Radiance Fields Optimization for Stereo Endoscopic Videos

Florian Philipp Stilz, Mert Asim Karaoglu, Felix Tristram, Nassir Navab, Benjamin Busam, Alexander Ladikos

TL;DR

This work proposes an implicit separation of the scene into multiple overlapping 4D neural radiance fields (NeRFs), together with a progressive optimization scheme that jointly optimizes reconstruction and camera poses from scratch. This improves ease of use and scales reconstruction in time to surgical videos of 5,000 frames and more.

Abstract

Reconstruction of endoscopic scenes is an important asset for various medical applications, from post-surgery analysis to educational training. Neural rendering has recently shown promising results in endoscopic reconstruction with deforming tissue. However, prior setups have been restricted to a static endoscope or limited deformation, or have required an external tracking device to retrieve the pose of the endoscopic camera. With FLex, we address the challenging setup of a moving endoscope within a highly dynamic environment of deforming tissue. We propose an implicit scene separation into multiple overlapping 4D neural radiance fields (NeRFs) and a progressive optimization scheme that jointly optimizes reconstruction and camera poses from scratch. This improves ease of use and scales reconstruction in time to surgical videos of 5,000 frames and more, an improvement of more than ten times over the state of the art, while remaining agnostic to external tracking information. Extensive evaluations on the StereoMIS dataset show that FLex significantly improves the quality of novel view synthesis while maintaining competitive pose accuracy.
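
As a rough illustration of what "multiple overlapping" models along the time axis could look like, the sketch below splits a frame range into overlapping temporal windows and computes per-window blending weights for a given frame. It is not the authors' implementation. The function names, the window length, the overlap size, and the linear blending rule are all assumptions made for this example, and the abstract describes a separation in space and time, whereas this sketch covers the temporal axis only.

```python
from typing import List, Tuple


def overlapping_windows(num_frames: int, window: int, overlap: int) -> List[Tuple[int, int]]:
    """Split [0, num_frames) into half-open windows sharing `overlap` frames.

    Hypothetical helper: window and overlap sizes are illustrative only.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    if not 0 <= overlap < window:
        raise ValueError("need 0 <= overlap < window")
    stride = window - overlap
    windows = []
    start = 0
    while True:
        end = min(start + window, num_frames)
        windows.append((start, end))
        if end == num_frames:
            break
        start += stride
    return windows


def blend_weights(frame: int, windows: List[Tuple[int, int]]) -> List[float]:
    """Return one weight per window for `frame`; the weights sum to 1.

    Assumed rule: the weight grows linearly with the distance to the
    nearest window border, so a model fades out toward its edges.
    """
    raw = []
    for start, end in windows:
        if start <= frame < end:
            # +1 keeps border frames at a small non-zero weight
            raw.append(float(min(frame - start, end - 1 - frame) + 1))
        else:
            raw.append(0.0)
    total = sum(raw)
    if total == 0.0:
        raise ValueError("frame is not covered by any window")
    return [w / total for w in raw]


# Example: a 5,000-frame video, 1,000-frame windows, 200 shared frames.
windows = overlapping_windows(5000, window=1000, overlap=200)
weights = blend_weights(900, windows)  # frame 900 lies in the first two windows
```

With these illustrative settings, a frame inside an overlap region receives non-zero weight from exactly two neighbouring windows, while a frame outside any overlap is assigned entirely to one window.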

Paper Structure (13 sections, 4 equations, 3 figures, 5 tables)

Figures (3)

  • Figure 1: Joint pose and radiance fields optimization. $k$ indexes the frames along the temporal dimension. Orange and green arrows show the flow of inputs and outputs in the optimization processes of pose and radiance fields.
  • Figure 2: Joint progressive pose and local dynamic radiance fields optimization. Spatial extents clustered within the bounding boxes of different colors represent the spatio-temporal domain of the corresponding local radiance fields. The arrow on the camera trajectory shows the temporal direction.
  • Figure 3: Qualitative results on a 1,000-frame scene with breathing deformations and camera motion.