
Free-DyGS: Camera-Pose-Free Scene Reconstruction for Dynamic Surgical Videos with Gaussian Splatting

Qian Li, Shuojue Yang, Daiyun Shen, Jimmy Bok Yan So, Jing Qin, Yueming Jin

TL;DR

Free-DyGS addresses dynamic surgical scene reconstruction from endoscopic videos in which both the camera poses and the tissue deformations are unknown. It introduces a Gaussian Splatting (GS) pipeline that initializes the scene with a pre-trained Sparse Gaussian Regressor (SGR), then progressively expands it while jointly estimating deformations and 6D camera poses frame by frame, aided by a Retrospective Deformation Recapitulation (RDR) strategy. Core innovations are generalizable Gaussian parameterization via the SGR for fast initialization and scene expansion, a Partially Activated Flexible Deformation Model (PFDM) that reduces temporal coupling, and retrospective learning that preserves historical deformations across the entire clip. Experiments on StereoMIS and Hamlyn show higher rendering fidelity and shorter training times than state-of-the-art methods, indicating potential for intraoperative navigation and surgical education.

Abstract

High-fidelity reconstruction of surgical scenes is a crucial task that supports many applications, such as intra-operative navigation and surgical education. However, most existing methods assume idealized surgical scenarios: they either focus on dynamic reconstruction of deforming tissue while assuming a given, fixed camera pose, or allow endoscope movement while reconstructing only static scenes. In this paper, we target a more realistic yet challenging setup: camera-pose-free reconstruction with a moving camera for highly dynamic surgical scenes. We take the first step in introducing the Gaussian Splatting (GS) technique to this setting and propose a novel GS-based framework for fast reconstruction, termed Free-DyGS. Concretely, our model adopts a novel scene initialization in which a pre-trained Sparse Gaussian Regressor (SGR) efficiently parameterizes the initial Gaussian attributes. For each subsequent frame, we jointly optimize the deformation model and the 6D camera pose in a frame-by-frame manner, which eases training because deformation differences between consecutive frames are small. A Scene Expansion scheme then extends the GS model to cover the unseen regions introduced by the moving camera. Moreover, the framework is equipped with a novel Retrospective Deformation Recapitulation (RDR) strategy that preserves the deformations of the entire clip throughout the frame-by-frame training scheme. The efficacy of Free-DyGS is substantiated through extensive experiments on two datasets, StereoMIS and Hamlyn. The results show that Free-DyGS surpasses other advanced methods in both rendering accuracy and efficiency. Code will be available.
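
The scene initialization described above places Gaussians from a single frame; Figure 3 notes that the attributes are pixel-aligned and predicted from an RGB image $I$ and a depth map $D$, with a depth correction $\Delta D$. As a minimal sketch of what "pixel-aligned" could mean geometrically, the snippet below lifts each pixel to a 3D point with a standard pinhole model. The function name `backproject_depth`, the intrinsic matrix `K`, and the way the correction is added are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def backproject_depth(depth, K, delta_depth=None):
    """Lift every pixel of a depth map to a 3D point in the camera frame.

    depth:       (h, w) depth map D
    K:           (3, 3) pinhole intrinsics (assumed known; hypothetical here)
    delta_depth: optional (h, w) correction, standing in for the Delta D term
    """
    h, w = depth.shape
    if delta_depth is not None:
        depth = depth + delta_depth  # corrected depth, input left untouched
    # u indexes columns, v indexes rows; both arrays have shape (h, w).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)
    # Ray through each pixel: K^{-1} [u, v, 1]^T, applied to all pixels at once.
    rays = pixels @ np.linalg.inv(K).T
    # Scale each ray by its depth to obtain one candidate Gaussian centre per pixel.
    return rays * depth[..., None]
```

Each returned point would serve as the centre of one Gaussian; the remaining attributes ($\alpha$, $\mathbf{s}$, $\mathbf{r}$) would come from the regressor and are not modelled in this sketch.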

Paper Structure (22 sections, 17 equations, 6 figures, 4 tables, 1 algorithm)

This paper contains 22 sections, 17 equations, 6 figures, 4 tables, 1 algorithm.

Figures (6)

  • Figure 1: Point clouds (left) reconstructed by our approach from the P2_7 video of the StereoMIS dataset with camera trajectory estimation (green line) and rendered images. Images within the colored frames (top right) illustrate the renderings captured under various camera poses. Those within the gradient blue frames (bottom right) display the renderings of the dynamic scene at different times from a fixed camera.
  • Figure 2: Illustration of our Free-DyGS framework which contains four main phases: (a) Scene Initialization, (b) Joint Learning, (c) Scene Expansion, and (d) Retrospective Learning.
  • Figure 3: Generalizable Gaussian parameterization module. The SGR is designed to perform generalizable Gaussian parameterization, where pixel-aligned Gaussian attributes ($\alpha$, $\mathbf{s}$, $\mathbf{r}$) and correction terms ($\Delta D$, $\Delta C$) are predicted from the input frame consisting of an RGB image $I$ and a depth map $D$.
  • Figure 4: Illustration of the PFDM. Deformation functions $\Phi(t)$ are defined to represent the attributes deviation from the canonical values over time. Each one is articulated as an accumulation of Gaussian basis functions $\{\varphi_j(t)\}$. Only partial basis functions are activated and their parameters are optimized during training.
  • Figure 5: Qualitative comparisons of different methods on typical frames from both the StereoMIS and Hamlyn datasets. The rendering PSNR is shown in each image.
  • ...and 1 more figure
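
The Figure 4 caption describes each deformation function $\Phi(t)$ as an accumulation of Gaussian basis functions $\{\varphi_j(t)\}$, of which only some are activated during training. The sketch below illustrates that idea under stated assumptions: each basis is taken to be $w_j \exp\left(-(t-\mu_j)^2 / (2\sigma_j^2)\right)$, and a basis counts as "activated" when its centre lies within a window of the current time. The parameter names (`weights`, `centers`, `sigmas`, `window`) and the activation rule are hypothetical and chosen only for illustration.

```python
import numpy as np


def deformation(t, weights, centers, sigmas):
    """Phi(t) = sum_j w_j * exp(-(t - mu_j)^2 / (2 * sigma_j^2)) (assumed form)."""
    basis = np.exp(-((t - centers) ** 2) / (2.0 * sigmas ** 2))
    return float(np.sum(weights * basis))


def active_mask(t, centers, window):
    """Hypothetical activation rule: a basis is trainable if centred near time t."""
    return np.abs(centers - t) <= window


def masked_update(weights, grad, mask, lr):
    """Gradient step that changes only the activated basis weights."""
    return weights - lr * grad * mask


# Usage: three bases spread over a clip; only the one near t = 5 is updated.
centers = np.array([0.0, 5.0, 10.0])
sigmas = np.ones(3)
weights = np.array([1.0, 2.0, 3.0])
mask = active_mask(5.0, centers, window=2.0)
new_weights = masked_update(weights, np.ones(3), mask, lr=0.1)
```

Because the frozen bases keep their parameters, optimizing the current frame leaves the deformation at distant time steps almost unchanged, which is one way to read the reduced temporal coupling attributed to the PFDM.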