Deblur4DGS: 4D Gaussian Splatting from Blurry Monocular Video

Renlong Wu, Zhilu Zhang, Mingyang Chen, Zifei Yan, Wangmeng Zuo

TL;DR

This work tackles 4D reconstruction from blurry monocular video by taking 3D Gaussian Splatting (3DGS) as the scene representation and reframing the estimation of continuous dynamic representations within the exposure time as exposure-time estimation. It introduces Deblur4DGS, which combines blur-aware variable canonical Gaussians, continuous camera pose and dynamic Gaussian estimation, and a set of regularization terms that avoid trivial solutions. The approach improves novel-view synthesis and also supports deblurring, frame interpolation, and video stabilization on synthetic and real data, outperforming state-of-the-art 4D reconstruction methods while maintaining real-time rendering. The method reconstructs 4D scenes robustly in the presence of motion blur and low frame rates, with practical implications for AR/VR pipelines and for enhancing monocular footage.

Abstract

Recent 4D reconstruction methods have yielded impressive results but rely on sharp videos as supervision. However, motion blur often occurs in videos due to camera shake and object movement, and existing methods render blurry results when such videos are used to reconstruct 4D models. Although a few approaches have attempted to address the problem, they struggle to produce high-quality results because they estimate continuous dynamic representations within the exposure time inaccurately. Encouraged by recent work on 3D motion trajectory modeling with 3D Gaussian Splatting (3DGS), we adopt 3DGS as the scene representation and propose Deblur4DGS to reconstruct a high-quality 4D model from blurry monocular video. Specifically, we transform the estimation of continuous dynamic representations within an exposure time into the estimation of the exposure time itself. Moreover, we introduce an exposure regularization term, together with multi-frame and multi-resolution consistency regularization terms, to avoid trivial solutions. Furthermore, to better represent objects with large motion, we propose blur-aware variable canonical Gaussians. Beyond novel-view synthesis, Deblur4DGS can be applied to improve blurry video from multiple perspectives, including deblurring, frame interpolation, and video stabilization. Extensive experiments on both synthetic and real-world data across these four tasks show that Deblur4DGS outperforms state-of-the-art 4D reconstruction methods. The code is available at https://github.com/ZcsrenlongZ/Deblur4DGS.

Paper Structure

This paper contains 24 sections, 20 equations, 10 figures, 17 tables.

Figures (10)

  • Figure 1: (a) Training of Deblur4DGS. When processing the $t$-th frame, we first discretize its exposure time into $N$ timestamps. Then, we estimate continuous camera poses $\{\mathbf{P}_{t,i}\}_{i=1}^{N}$ and dynamic Gaussians $\{\mathbf{D}_{t,i}\}_{i=1}^{N}$ within the exposure time. Next, we render each latent sharp image $\hat{\mathbf{I}}_{t,i}$ with the camera pose $\mathbf{P}_{t,i}$, dynamic Gaussians $\mathbf{D}_{t,i}$ and static Gaussians $\mathbf{S}$. Finally, $\{\hat{\mathbf{I}}_{t,i}\}_{i=1}^{N}$ are averaged to obtain the synthetic blurry image $\hat{\mathbf{B}}_{t}$, which is used to calculate the reconstruction loss $\mathcal{L}_{rec}$ against the given blurry frame $\mathbf{B}_{t}$. To regularize the under-constrained optimization, we introduce exposure regularization $\mathcal{L}_{e}$, multi-frame consistency regularization $\mathcal{L}_{mfc}$ and multi-resolution consistency regularization $\mathcal{L}_{mrc}$. (b) Rendering of Deblur4DGS. Deblur4DGS produces the sharp image for a user-provided timestamp $t$ and camera pose $\mathbf{P}_{t}$.
  • Figure 2: Visual comparisons of novel-view synthesis on real-world videos. Our method produces more photo-realistic details and fewer visual artifacts in both static and dynamic areas, as marked with yellow and red boxes, respectively.
  • Figure 3: Effect of continuous camera pose (ECP) and dynamic Gaussian (EDG) estimation.
  • Figure 4: Structure of camera motion predictor.
  • Figure 5: Effect of regularization terms $\mathcal{L}_{reg}$. $\mathcal{L}_{reg}$ includes exposure regularization $\mathcal{L}_{e}$, multi-frame consistency regularization $\mathcal{L}_{mfc}$, and multi-resolution consistency regularization $\mathcal{L}_{mrc}$.
  • ...and 5 more figures
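
The blur formation step described in the Figure 1 caption can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. `toy_render` is a hypothetical stand-in for rendering with $\mathbf{P}_{t,i}$, $\mathbf{D}_{t,i}$ and $\mathbf{S}$. The exposure window is assumed to be centred on the frame timestamp. The L1 form of $\mathcal{L}_{rec}$ is also an assumption, since the caption does not state the loss form. Only the averaging of the $N$ latent sharp images, $\hat{\mathbf{B}}_{t} = \frac{1}{N}\sum_{i=1}^{N}\hat{\mathbf{I}}_{t,i}$, is taken from the caption.

```python
import numpy as np


def exposure_timestamps(t_mid, exposure, n):
    """Discretize an exposure window into n evenly spaced timestamps.

    Assumption: the window is centred on the frame timestamp t_mid.
    """
    half = exposure / 2.0
    return np.linspace(t_mid - half, t_mid + half, n)


def toy_render(timestamp, width=8):
    """Hypothetical stand-in for the Gaussian rasterizer.

    Returns a 1 x width x 1 image with a single bright pixel whose
    x position follows the timestamp (an object moving to the right).
    """
    image = np.zeros((1, width, 1))
    x = int(np.floor(timestamp)) % width
    image[0, x, 0] = 1.0
    return image


def synthesize_blur(latent_sharp):
    """Average N latent sharp images of shape (N, H, W, C) into one blurry image."""
    return np.mean(latent_sharp, axis=0)


def reconstruction_loss(blur_pred, blur_obs):
    """Mean absolute error between synthetic and observed blurry frames.

    Assumption: L1 is used here only for illustration.
    """
    return float(np.mean(np.abs(blur_pred - blur_obs)))


# Usage: one frame at t = 3 with an exposure of 2 time units and N = 5.
timestamps = exposure_timestamps(3.0, 2.0, 5)
latent = np.stack([toy_render(t) for t in timestamps])
blurry = synthesize_blur(latent)          # the moving dot is smeared over x = 2..4
loss = reconstruction_loss(blurry, blurry)  # zero when prediction matches observation
```

In this toy setting a longer exposure spreads the dot over more pixels, which illustrates why the exposure time controls how much blur the averaged image exhibits.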