
SplineGS: Robust Motion-Adaptive Spline for Real-Time Dynamic 3D Gaussians from Monocular Video

Jongmin Park, Minh-Quan Viet Bui, Juan Luis Gonzalez Bello, Jaeho Moon, Jihyong Oh, Munchurl Kim

TL;DR

SplineGS tackles real-time dynamic 3D view synthesis from monocular video without COLMAP. It introduces Motion-Adaptive Spline (MAS) trajectories for dynamic 3D Gaussians and Motion-Adaptive Control points Pruning (MACP), which adapts the number of spline control points to each Gaussian's motion. It jointly optimizes camera parameters and 3D Gaussian attributes in a two-stage process (warm-up and main training stages) guided by photometric and geometric consistency, so no precomputed camera parameters are required. This COLMAP-free setup achieves state-of-the-art rendering quality while rendering thousands of times faster than prior methods. The framework performs strongly on challenging in-the-wild sequences, outperforming both COLMAP-based and COLMAP-free baselines in novel view synthesis (NVS) quality while rendering in real time or near real time. Together, MAS, MACP, and joint optimization provide a robust, efficient pathway for dynamic NVS without external camera parameter priors, with practical impact for AR/VR and content creation in unconstrained environments.
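To make the MAS idea concrete, here is a minimal sketch of evaluating one dynamic Gaussian's 3D position from a handful of control points with a cubic Hermite spline, the spline family the abstract names. The uniform spacing of control points over normalized time t in [0, 1] and the Catmull-Rom style finite-difference tangents are assumptions for illustration; the paper may define tangents and timing differently, and `hermite_basis`, `catmull_rom_tangents`, and `eval_trajectory` are hypothetical names, not the authors' code.

```python
import numpy as np


def hermite_basis(s):
    """Cubic Hermite basis functions h00, h10, h01, h11 at local parameter s in [0, 1]."""
    s2, s3 = s * s, s * s * s
    h00 = 2 * s3 - 3 * s2 + 1  # weight of the segment's start point
    h10 = s3 - 2 * s2 + s      # weight of the start tangent
    h01 = -2 * s3 + 3 * s2     # weight of the segment's end point
    h11 = s3 - s2              # weight of the end tangent
    return h00, h10, h01, h11


def catmull_rom_tangents(ctrl):
    """Finite-difference tangent per control point (assumption: Catmull-Rom style).

    ctrl: (N, 3) control-point positions, N >= 2, uniformly spaced in time.
    Tangents are expressed per segment, matching the local parameter s.
    """
    tangents = np.empty_like(ctrl)
    tangents[1:-1] = 0.5 * (ctrl[2:] - ctrl[:-2])  # central differences inside
    tangents[0] = ctrl[1] - ctrl[0]                # one-sided at the ends
    tangents[-1] = ctrl[-1] - ctrl[-2]
    return tangents


def eval_trajectory(ctrl, t):
    """Position(s) of one dynamic Gaussian at normalized time(s) t in [0, 1]."""
    ctrl = np.asarray(ctrl, dtype=float)
    t = np.atleast_1d(np.asarray(t, dtype=float))
    n_seg = len(ctrl) - 1
    x = np.clip(t, 0.0, 1.0) * n_seg                     # continuous segment coordinate
    k = np.minimum(np.floor(x).astype(int), n_seg - 1)  # segment index (t = 1 uses the last segment)
    s = (x - k)[:, None]                                 # local parameter, shape (T, 1)
    m = catmull_rom_tangents(ctrl)
    h00, h10, h01, h11 = hermite_basis(s)
    return h00 * ctrl[k] + h10 * m[k] + h01 * ctrl[k + 1] + h11 * m[k + 1]


# Usage: four control points describe a smooth path sampled at five time steps.
ctrl = np.array([[0.0, 0.0, 0.0], [1.0, 2.0, 0.0], [3.0, 1.0, 1.0], [4.0, 0.0, 2.0]])
positions = eval_trajectory(ctrl, np.linspace(0.0, 1.0, 5))  # (5, 3) array
```

The curve passes exactly through every control point and is continuous in both position and velocity, so a few learnable points can describe a smooth trajectory that can be queried at any time, not only at training frames.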

Abstract

Synthesizing novel views from in-the-wild monocular videos is challenging due to scene dynamics and the lack of multi-view cues. To address this, we propose SplineGS, a COLMAP-free dynamic 3D Gaussian Splatting (3DGS) framework for high-quality reconstruction and fast rendering from monocular videos. At its core is a novel Motion-Adaptive Spline (MAS) method, which represents continuous dynamic 3D Gaussian trajectories using cubic Hermite splines with a small number of control points. For MAS, we introduce a Motion-Adaptive Control points Pruning (MACP) method to model the deformation of each dynamic 3D Gaussian across varying motions, progressively pruning control points while maintaining dynamic modeling integrity. Additionally, we present a joint optimization strategy for camera parameter estimation and 3D Gaussian attributes, leveraging photometric and geometric consistency. This eliminates the need for Structure-from-Motion preprocessing and enhances SplineGS's robustness in real-world conditions. Experiments show that SplineGS significantly outperforms state-of-the-art methods in novel view synthesis quality for dynamic scenes from monocular videos while rendering thousands of times faster.
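The exact MACP pruning rule is not given in this summary. The sketch below shows one plausible form under stated assumptions: each dynamic Gaussian's trajectory is a uniformly spaced cubic Hermite spline with Catmull-Rom style tangents, and one control point is dropped only if a least-squares refit with one fewer point stays within a tolerance `tol` of the original trajectory. `basis_matrix`, `prune_control_points`, `tol`, `n_samples`, and `min_ctrl` are hypothetical names and parameters, not the authors' code.

```python
import numpy as np


def basis_matrix(n_ctrl, t):
    """(T, n_ctrl) matrix B such that trajectory samples = B @ ctrl.

    Assumes a uniformly spaced cubic Hermite spline with Catmull-Rom tangents.
    The spline is linear in its control points, so column j is the curve
    obtained from a unit impulse at control point j.
    """
    t = np.clip(np.asarray(t, dtype=float), 0.0, 1.0)
    n_seg = n_ctrl - 1
    x = t * n_seg
    k = np.minimum(np.floor(x).astype(int), n_seg - 1)  # segment index per sample
    s = x - k                                            # local parameter in [0, 1]
    h00, h10 = 2 * s**3 - 3 * s**2 + 1, s**3 - 2 * s**2 + s
    h01, h11 = -2 * s**3 + 3 * s**2, s**3 - s**2
    B = np.zeros((len(t), n_ctrl))
    for j in range(n_ctrl):
        e = np.zeros(n_ctrl)
        e[j] = 1.0                                       # unit impulse at control point j
        m = np.empty(n_ctrl)
        m[1:-1] = 0.5 * (e[2:] - e[:-2])                 # Catmull-Rom tangents of the impulse
        m[0], m[-1] = e[1] - e[0], e[-1] - e[-2]
        B[:, j] = h00 * e[k] + h10 * m[k] + h01 * e[k + 1] + h11 * m[k + 1]
    return B


def prune_control_points(ctrl, n_samples=64, tol=1e-2, min_ctrl=2):
    """Hypothetical MACP-style loop: keep removing one control point while a
    least-squares refit reproduces the original trajectory within tol."""
    ctrl = np.asarray(ctrl, dtype=float)
    t = np.linspace(0.0, 1.0, n_samples)
    target = basis_matrix(len(ctrl), t) @ ctrl          # densely sampled reference trajectory
    while len(ctrl) > min_ctrl:
        B = basis_matrix(len(ctrl) - 1, t)
        cand, *_ = np.linalg.lstsq(B, target, rcond=None)
        err = np.max(np.linalg.norm(B @ cand - target, axis=1))  # worst-case 3D deviation
        if err > tol:
            break                                       # fewer points would distort the motion
        ctrl = cand
    return ctrl
```

Because the spline is linear in its control points, each refit is a single linear least-squares solve, and comparing against the original densely sampled trajectory (rather than the previous refit) keeps errors from accumulating across pruning steps. Under this rule, a Gaussian moving in a straight line collapses to two control points, while one following a zigzag keeps all of its points.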

Paper Structure

This paper contains 22 sections, 22 equations, 20 figures, 7 tables.

Figures (20)

  • Figure 1: Our SplineGS achieves state-of-the-art rendering quality with fast rendering speed for novel spatio-temporal view synthesis from monocular videos without relying on pre-computed camera parameters. (a) We use our predicted camera parameters for yang2023deformable3dgs and Li_STG_2024_CVPR because COLMAP schonberger2016structure cannot provide reasonable camera parameters for most scenes in the DAVIS dataset ponttuset20182017davischallengevideo. (b) SplineGS achieves 1.1 dB higher PSNR and 8,000× faster rendering than the second-best method on the NVIDIA dataset yoon2020dynamic.
  • Figure 2: Overview of SplineGS. Our SplineGS leverages spline-based functions to model the deformation of dynamic 3D Gaussians with a novel Motion-Adaptive Spline (MAS) architecture. It is composed of sets of learnable control points based on a cubic Hermite spline function ahlberg2016theory, de1978practical to accurately model the trajectory of each dynamic 3D Gaussian and to achieve faster rendering speed. To avoid any preprocessing of camera parameters, i.e., to be COLMAP-free, we adopt a two-stage optimization: warm-up and main training stages.
  • Figure 3: Visual comparisons for novel view synthesis on the NVIDIA dataset.
  • Figure 4: Visual comparisons for novel view synthesis on the DAVIS dataset.
  • Figure 5: Visual comparisons for novel view and time synthesis on the NVIDIA dataset.
  • ...and 15 more figures