4D Gaussian Splatting with Scale-aware Residual Field and Adaptive Optimization for Real-time Rendering of Temporally Complex Dynamic Scenes

Jinbo Yan, Rui Peng, Luyang Tang, Ronggang Wang

TL;DR

SaRO-GS introduces a real-time capable dynamic scene representation by uniting 4D Gaussian primitives with a Scale-aware Residual Field and an Adaptive Optimization schedule. By projecting 4D Gaussians into 3D with time-aware attributes and leveraging a scale-aware residual encoding via hex-planes and MipMap levels, the method captures temporally complex phenomena such as appearance/disappearance while maintaining interactive frame rates. The approach demonstrates state-of-the-art rendering quality and speed on monocular and multi-view datasets, achieving up to 182 FPS at 400×400 resolution and superior reconstruction metrics. This work advances dynamic scene rendering by enabling high-fidelity, temporally rich reconstructions suitable for real-time immersive applications, with practical impact in VR/AR pipelines and live synthesis.

Abstract

Reconstructing dynamic scenes from video sequences is a highly promising task in the multimedia domain. While previous methods have made progress, they often struggle with slow rendering and managing temporal complexities such as significant motion and object appearance/disappearance. In this paper, we propose SaRO-GS as a novel dynamic scene representation capable of achieving real-time rendering while effectively handling temporal complexities in dynamic scenes. To address the issue of slow rendering speed, we adopt a Gaussian primitive-based representation and optimize the Gaussians in 4D space, which facilitates real-time rendering with the assistance of 3D Gaussian Splatting. Additionally, to handle temporally complex dynamic scenes, we introduce a Scale-aware Residual Field. This field considers the size information of each Gaussian primitive while encoding its residual feature and aligns with the self-splitting behavior of Gaussian primitives. Furthermore, we propose an Adaptive Optimization Schedule, which assigns different optimization strategies to Gaussian primitives based on their distinct temporal properties, thereby expediting the reconstruction of dynamic regions. Through evaluations on monocular and multi-view datasets, our method has demonstrated state-of-the-art performance. Please see our project page at https://yjb6.github.io/SaRO-GS.github.io.

Paper Structure

This paper contains 24 sections, 30 equations, 8 figures, 6 tables.

Figures (8)

  • Figure 1: The overall pipeline of SaRO-GS. (a) In 4D space, we simultaneously optimize a set of 4D Gaussians and a scale-aware Residual Field $\mathcal{M}$. When combined with $\mathcal{M}$, each Gaussian generates a residual feature and a lifespan $\sigma$. Both represent the temporal characteristics of the Gaussian primitive. (b) Given a sampling time $t_0$, we can compute the survival status $\gamma(t_0)$ of the Gaussian and decode the residual feature of the Gaussian at time $t_0$ using an MLP, yielding attribute residuals. Finally, we combine these residuals with the initial attributes of the Gaussian in 4D space to get the 3D Gaussian representation. (c) Once we obtain the 3D Gaussian representation, we can generate rendered images using Gaussian Splatting.
  • Figure 2: The impact of scale is not taken into account in Gaussian self-splitting. (a) When size information is considered, the features of the split Gaussian remain similar to those of its parent Gaussian. (b) Otherwise, the split Gaussian will have features different from those of its parent Gaussian.
  • Figure 3: Qualitative result on the D-NeRF dataset.
  • Figure 4: Qualitative results on coffee martinis and cut roasted beef from the Plenoptic Video dataset.
  • Figure 5: Qualitative results of the ablation study.
  • ...and 3 more figures
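
The projection step described in the Figure 1(b) caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The caption only says that each Gaussian has a lifespan $\sigma$ and a survival status $\gamma(t_0)$, and that decoded residuals are combined with the initial attributes. Everything else here is hypothetical: the Gaussian-shaped falloff used for $\gamma$, the `t_center` attribute, the linear stand-in for the MLP decoder, the use of $\gamma$ to scale opacity, and the names `Gaussian4D`, `survival_status`, `decode_residual`, and `project_to_3d`.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian4D:
    position: tuple   # initial 3D mean (x, y, z)
    opacity: float    # initial opacity
    t_center: float   # temporal centre (hypothetical attribute)
    lifespan: float   # sigma in the Figure 1 caption


def survival_status(g: Gaussian4D, t0: float) -> float:
    """gamma(t0). ASSUMPTION: an unnormalised Gaussian falloff in time."""
    return math.exp(-0.5 * ((t0 - g.t_center) / g.lifespan) ** 2)


def decode_residual(feature: tuple, t0: float) -> tuple:
    """Stand-in for the MLP decoder. ASSUMPTION: residual is linear in t0."""
    return tuple(f * t0 for f in feature)


def project_to_3d(g: Gaussian4D, feature: tuple, t0: float):
    """Combine initial attributes with time-dependent residuals."""
    gamma = survival_status(g, t0)
    delta = decode_residual(feature, t0)
    position = tuple(p + d for p, d in zip(g.position, delta))
    # ASSUMPTION: survival status attenuates opacity outside the lifespan.
    return position, g.opacity * gamma


g = Gaussian4D(position=(0.0, 0.0, 0.0), opacity=0.8, t_center=0.5, lifespan=0.1)
print(project_to_3d(g, (1.0, 0.0, 0.0), 0.5))
```

In this sketch a primitive sampled far from its temporal centre contributes almost nothing to the rendered frame, which is one plausible way to model the appearance and disappearance of objects mentioned in the abstract.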