Superpoint Gaussian Splatting for Real-Time High-Fidelity Dynamic Scene Reconstruction

Diwen Wan, Ruijie Lu, Gang Zeng

TL;DR

This work introduces Superpoint Gaussian Splatting (SP-GS), a dynamic scene representation that extends 3D Gaussian Splatting by clustering neighboring Gaussians into learnable superpoints and predicting their time-varying rigid deformations with a lightweight network $\mathcal{F}$. By enforcing As-Rigid-As-Possible consistency via a property reconstruction loss and a learnable association between Gaussians and superpoints, SP-GS achieves real-time rendering speeds comparable to static 3D-GS while maintaining high visual fidelity on synthetic and real datasets. The framework also supports optional non-rigid refinement $\mathcal{G}$ and enables downstream tasks such as model distillation, pose estimation, and scene editing, enhancing practicality and versatility. Empirical results show SP-GS reaches up to $227$ FPS at $800\times800$ with competitive image quality on D-NeRF, and superior FPS with strong visual quality on HyperNeRF and NeRF-DS, demonstrating its impact for fast, interactive dynamic scene reconstruction and editing.

Abstract

Rendering novel-view images of dynamic scenes is a crucial yet challenging task. Most current methods represent the static scene with a NeRF and add a time-variant MLP to model scene deformations, which results in relatively low rendering quality and slow inference. To tackle these challenges, we propose a novel framework named Superpoint Gaussian Splatting (SP-GS). Specifically, our framework first employs explicit 3D Gaussians to reconstruct the scene and then clusters Gaussians with similar properties (e.g., rotation, translation, and location) into superpoints. With these superpoints, our method extends 3D Gaussian splatting to dynamic scenes with only a slight increase in computational expense. Besides achieving state-of-the-art visual quality and real-time rendering at high resolutions, the superpoint representation provides stronger manipulation capability. Extensive experiments demonstrate the practicality and effectiveness of our approach on both synthetic and real-world datasets. Please see our project page at https://dnvtmf.github.io/SP_GS.github.io.
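
To make the superpoint idea concrete, the following is a minimal NumPy sketch of how a per-superpoint rigid transform could be applied to the Gaussians associated with it. It is an illustration under stated assumptions, not the authors' implementation: the function names, the softmax-plus-argmax assignment, and the toy inputs are all hypothetical, and in SP-GS the rotations and translations would come from the deformation network at a given timestep rather than being written by hand.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def rotation_z(theta):
    """3x3 rotation matrix about the z-axis (helper for the toy example)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def deform_gaussian_centers(centers, assoc_logits, rotations, translations):
    """Move each Gaussian center with the rigid transform of its superpoint.

    centers:      (N, 3) canonical Gaussian centers
    assoc_logits: (N, M) association scores between Gaussians and superpoints
                  (assumed learnable; hypothetical parameterization)
    rotations:    (M, 3, 3) per-superpoint rotation at one timestep
    translations: (M, 3) per-superpoint translation at one timestep
    """
    weights = softmax(assoc_logits, axis=1)  # (N, M) soft association
    owner = weights.argmax(axis=1)           # (N,) hard assignment (assumption)
    R = rotations[owner]                     # (N, 3, 3) transform per Gaussian
    t = translations[owner]                  # (N, 3)
    moved = np.einsum("nij,nj->ni", R, centers) + t
    return moved, owner


# Toy example: three Gaussians, two superpoints.
centers = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [2.0, 0.0, 0.0]])
assoc_logits = np.array([[5.0, 0.0], [0.0, 5.0], [4.0, 1.0]])
rotations = np.stack([np.eye(3), rotation_z(np.pi / 2)])
translations = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 0.0]])

moved, owner = deform_gaussian_centers(centers, assoc_logits, rotations, translations)
print(owner)  # superpoint index chosen for each Gaussian
print(moved)  # deformed centers
```

Because every Gaussian in a superpoint shares one rotation and translation, distances between Gaussians inside the same superpoint are preserved, which is the rigid behaviour the superpoint representation is meant to encourage.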

Paper Structure (34 sections, 16 equations, 7 figures, 19 tables)


Figures (7)

  • Figure 1: Overview of our pipeline. We initialize the 3D Gaussians with point clouds reconstructed from SfM. Then we aggregate the 3D Gaussians into superpoints, and predict the deformation for every 3D Gaussian at a given timestep. The image is rendered using the differentiable Gaussian rasterization on the deformed 3D Gaussians. Additionally, an optional non-rigid deformation network can be used to further improve the performance.
  • Figure 2: Qualitative comparisons of baselines and our method on the D-NeRF dataset.
  • Figure 3: Qualitative comparisons of baselines and our method on the NeRF-DS dataset.
  • Figure 4: Qualitative comparisons of baselines and our method on the HyperNeRF dataset.
  • Figure 5: We visualize 3D Gaussians and superpoints by coloring the corresponding points. (a) The rendered image. (b) 3D Gaussians with their original colors. (c) 3D Gaussians colored by superpoint, so that all Gaussians in one superpoint share the same color. (d) The superpoints.
  • ...and 2 more figures
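
The per-superpoint coloring described in the Figure 5 caption can be sketched in a few lines. This is only an illustration under assumptions: the function name, the `owner` array of superpoint indices, and the random palette are hypothetical and are not taken from the paper's code.

```python
import numpy as np


def colors_from_superpoints(owner, num_superpoints, seed=0):
    """Return one RGB color per Gaussian, shared within each superpoint.

    owner:           (N,) integer superpoint index for each Gaussian (hypothetical)
    num_superpoints: number of superpoints M
    seed:            seed for the random palette, for reproducible colors
    """
    rng = np.random.default_rng(seed)
    palette = rng.random((num_superpoints, 3))  # (M, 3) RGB values in [0, 1)
    return palette[owner]                       # (N, 3) color lookup per Gaussian


owner = np.array([0, 1, 0, 2])
colors = colors_from_superpoints(owner, num_superpoints=3)
print(colors.shape)
```

Gaussians 0 and 2 in this toy input belong to the same superpoint and therefore receive identical colors, which is the effect panel (c) of the figure is described as showing.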