HFGS: 4D Gaussian Splatting with Emphasis on Spatial and Temporal High-Frequency Components for Endoscopic Scene Reconstruction

Haoyu Zhao, Xingyue Zhao, Lingting Zhu, Weixi Zheng, Yongchao Xu

TL;DR

This paper proposes HFGS, a novel approach for deformable endoscopic reconstruction that incorporates deformation fields to better handle dynamic scenes and introduces Spatial High-Frequency Emphasis Reconstruction (SHF) to minimize discrepancies in spatial frequency spectra between the rendered image and its ground truth.

Abstract

Robot-assisted minimally invasive surgery benefits from enhanced dynamic scene reconstruction, which improves surgical outcomes. While Neural Radiance Fields (NeRF) have been effective in scene reconstruction, their slow inference speeds and lengthy training durations limit their applicability. To overcome these limitations, 3D Gaussian Splatting (3D-GS) based methods have emerged as a recent trend, offering rapid inference capabilities and superior 3D quality. However, these methods still struggle with under-reconstruction in both static and dynamic scenes. In this paper, we propose HFGS, a novel approach for deformable endoscopic reconstruction that addresses these challenges from spatial and temporal frequency perspectives. Our approach incorporates deformation fields to better handle dynamic scenes and introduces Spatial High-Frequency Emphasis Reconstruction (SHF) to minimize discrepancies in spatial frequency spectra between the rendered image and its ground truth. Additionally, we introduce Temporal High-Frequency Emphasis Reconstruction (THF) to enhance dynamic awareness in neural rendering by leveraging flow priors, focusing optimization on motion-intensive parts. Extensive experiments on two widely used benchmarks demonstrate that HFGS achieves superior rendering quality.
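
The two ideas named in the abstract can be illustrated with a rough NumPy sketch. This is not the loss formulation from the paper, which this summary does not give. The function names `spatial_frequency_loss` and `flow_weighted_l1`, the `radius_frac` cutoff, and the `1 + normalized magnitude` weighting are all assumptions chosen for illustration. The first function compares the 2D spectra of a rendered image and its ground truth outside a low-frequency disc; the second weights a per-pixel L1 error by optical-flow magnitude so that motion-intensive pixels contribute more.

```python
import numpy as np


def spatial_frequency_loss(rendered, target, radius_frac=0.25):
    """Mean absolute spectral difference outside a low-frequency disc.

    rendered, target: float arrays of shape (H, W), e.g. grayscale images.
    radius_frac: hypothetical cutoff, as a fraction of min(H, W).
    """
    h, w = rendered.shape
    # Centered 2D spectra, so the DC term sits at (h // 2, w // 2).
    spec_r = np.fft.fftshift(np.fft.fft2(rendered))
    spec_t = np.fft.fftshift(np.fft.fft2(target))
    # Distance of every frequency bin from the DC term.
    yy, xx = np.mgrid[0:h, 0:w]
    dist = np.hypot(yy - h // 2, xx - w // 2)
    high = dist > radius_frac * min(h, w)
    return float(np.abs(spec_r - spec_t)[high].mean())


def flow_weighted_l1(rendered, target, flow, eps=1e-8):
    """L1 error where pixels with larger flow magnitude weigh more.

    flow: array of shape (H, W, 2), a per-pixel motion prior (assumed
    to come from any off-the-shelf optical flow estimator).
    """
    mag = np.linalg.norm(flow, axis=-1)
    # Weights range from 1 (static pixel) to about 2 (fastest pixel).
    weight = 1.0 + mag / (mag.max() + eps)
    return float((weight * np.abs(rendered - target)).sum() / weight.sum())
```

In this sketch, a rendering that differs from the ground truth only by a constant brightness offset has a near-zero spatial loss, because the offset changes only the DC term, while a rendering that has lost fine detail is penalized. Likewise, an error of the same size costs more under `flow_weighted_l1` when it falls on a moving region than on a static one.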

Paper Structure (16 sections, 9 equations, 4 figures, 2 tables)

This paper contains 16 sections, 9 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: For the sample image from ENDONERF [wang2022neural], (a-c) show the rendered image, the noise power spectrum (NPS), where bluer values indicate closer agreement with the ground truth (GT), and optical flow predictions based on adjacent frames. Our HFGS not only achieves the best results, reconstructing the most detail and exhibiting the bluest NPS, but also renders images whose optical flow is closer to that of the GT.
  • Figure 2: Pipeline of the proposed HFGS. We use monocular images, depths estimated by Depth-Anything [yang2024depth], and tool masks for training [huang2024endo]. A single MLP derives the deformation of the 3D Gaussians from features queried via voxel planes. We then address under-reconstruction by emphasizing spatial and temporal high-frequency components.
  • Figure 3: Reconstruction results of previous works and ours on the scenes "pulling soft tissues" and "cutting tissues twice" from ENDONERF [wang2022neural].
  • Figure 4: Ablation on SHF. We show rendered frames with and without SHF on the scene "pulling soft tissues" from ENDONERF [wang2022neural].