Dynamic Gaussian Splatting from Defocused and Motion-blurred Monocular Videos

Xuankai Zhang, Junjin Xiao, Qing Zhang

TL;DR

This work tackles high-quality novel-view synthesis for dynamic scenes captured by monocular videos that suffer from both defocus and motion blur. It introduces a unified framework that models blur via per-pixel kernels and jointly optimizes a sharp dynamic Gaussian scene representation, incorporating learnable SE(3) motion bases and a dynamic Gaussian densification strategy. A Blur Prediction Network (BP-Net) predicts per-pixel blur kernels and intensities, while a blur-aware sparsity constraint and unseen-view supervision guide stable optimization. Empirical results demonstrate state-of-the-art photorealistic novel-view synthesis on defocused and motion-blurred footage, with efficient rendering and public code release, underscoring the practical impact for 3D content creation from blurry monocular videos.

Abstract

This paper presents a unified framework for high-quality dynamic Gaussian Splatting from both defocused and motion-blurred monocular videos. Because defocus blur and motion blur form through significantly different processes, existing methods are tailored to one or the other and cannot handle both simultaneously. Although the two can be jointly modeled as blur kernel-based convolution, the inherent difficulty of estimating accurate blur kernels has greatly limited progress in this direction. In this work, we take a further step in this direction. In particular, we propose to estimate reliable per-pixel blur kernels using a blur prediction network that exploits blur-related scene and camera information and is subject to a blur-aware sparsity constraint. In addition, we introduce a dynamic Gaussian densification strategy to mitigate the lack of Gaussians in incomplete regions, and we improve novel view synthesis by incorporating unseen-view information to constrain scene optimization. Extensive experiments show that our method outperforms state-of-the-art methods in photorealistic novel view synthesis from defocused and motion-blurred monocular videos. Our code is available at https://github.com/hhhddddddd/dydeblur.
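
To make the "blur kernel-based convolution" idea concrete, the following is a minimal NumPy sketch of applying a different blur kernel at every pixel and then blending the result with the sharp image through a per-pixel intensity map. It is an illustration under stated assumptions, not the authors' implementation: the function name `apply_per_pixel_blur`, the array layouts, and the linear blending rule are all hypothetical, and in the actual method the kernels and intensities would be predicted by the blur prediction network rather than supplied by hand.

```python
import numpy as np

def apply_per_pixel_blur(sharp, kernels, intensity):
    """Blur `sharp` with a separate kernel at each pixel, then blend.

    sharp:     (H, W, C) sharp rendered image.
    kernels:   (H, W, k, k) per-pixel kernels, assumed non-negative and
               summing to 1 over the last two axes (k odd).
    intensity: (H, W) values in [0, 1]; 0 keeps the sharp pixel,
               1 uses the fully blurred pixel (assumed blending rule).
    """
    H, W, _ = sharp.shape
    k = kernels.shape[-1]
    r = k // 2
    # Replicate border pixels so every kernel tap has a source pixel.
    padded = np.pad(sharp, ((r, r), (r, r), (0, 0)), mode="edge")
    blurred = np.zeros_like(sharp, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            # Image shifted by this tap's offset, weighted by the tap
            # value of each pixel's own kernel (unflipped, i.e. correlation).
            shifted = padded[dy:dy + H, dx:dx + W, :]
            blurred += kernels[:, :, dy, dx, None] * shifted
    m = intensity[..., None]
    return m * blurred + (1.0 - m) * sharp

# Usage: a delta kernel leaves the image unchanged, as expected.
rng = np.random.default_rng(0)
image = rng.random((6, 8, 3))
delta = np.zeros((6, 8, 3, 3))
delta[:, :, 1, 1] = 1.0
out = apply_per_pixel_blur(image, delta, np.ones((6, 8)))
```

Because the blurred image is a differentiable function of the sharp image, a reconstruction loss against the blurry input can, in principle, push gradients back to the sharp scene representation; a framework with automatic differentiation would be needed for that step.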

Paper Structure

This paper contains 20 sections, 13 equations, 15 figures, 11 tables.

Figures (15)

  • Figure 1: Performance of our method. Our method synthesizes high-quality sharp novel views for videos with defocus blur (top) and motion blur (bottom). As shown on the right, our method not only obtains significantly better results than existing methods, e.g., D3DGS [3dgs], SoM [som], D2RF [d2rf], DyBluRF [dyblurf], and De4DGS [deblur4dgs], but also runs at 65.143 FPS at a resolution of $512 \times 288$ on an NVIDIA RTX 3090 GPU.
  • Figure 2: Overview of our method. We initialize static Gaussians via depth reprojection and dynamic Gaussians from tracking points, modeling their transformations with learnable motion bases. After stable training, we densify dynamic Gaussians using foreground remapping. For blur modeling, a network predicts per-pixel blur kernels and intensity, enabling blur synthesis through convolution and blending operations. The reconstruction loss between synthesized and input blurry images optimizes the Gaussians for sharper results.
  • Figure 3: Visual comparison of novel view synthesis on the D2RF defocus blur dataset [d2rf].
  • Figure 4: Visual comparison of novel view synthesis on the D2RF defocus blur dataset [d2rf]. Here, we also compare with methods fed with deblurred images produced by a state-of-the-art video deblurring method [bsstnet] to demonstrate the effectiveness of our method.
  • Figure 5: Visual comparison of novel view synthesis on the DyBluRF motion blur dataset [dyblurf].
  • ...and 10 more figures
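
The Figure 2 caption mentions initializing static Gaussians via depth reprojection. As a rough sketch of what such a step can look like, the snippet below back-projects a depth map into world-space 3D points that could serve as initial Gaussian centers. It assumes a pinhole camera, z-depth values, and pixel coordinates at integer positions; the function name `unproject_depth` and its arguments are hypothetical, and how the paper obtains depth or sets the remaining Gaussian attributes is not covered here.

```python
import numpy as np

def unproject_depth(depth, K, cam_to_world):
    """Back-project a depth map to world-space points.

    depth:        (H, W) z-depth along the camera's optical axis.
    K:            (3, 3) pinhole intrinsics.
    cam_to_world: (4, 4) camera-to-world rigid transform.
    Returns an (H*W, 3) array of points in row-major pixel order.
    """
    H, W = depth.shape
    # u is the column index, v the row index, both of shape (H, W).
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)
    pix = pix.reshape(-1, 3).astype(np.float64)
    # Rays with unit z in camera space: K^{-1} [u, v, 1]^T per pixel.
    rays = pix @ np.linalg.inv(K).T
    pts_cam = rays * depth.reshape(-1, 1)
    # Homogeneous coordinates, then move into the world frame.
    ones = np.ones((pts_cam.shape[0], 1))
    pts_h = np.concatenate([pts_cam, ones], axis=1)
    return (pts_h @ cam_to_world.T)[:, :3]

# Usage with a toy 2x3 depth map and an identity camera pose.
K = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [0.0, 0.0, 1.0]])
points = unproject_depth(np.full((2, 3), 4.0), K, np.eye(4))
```

Points are returned in row-major pixel order, so the point for pixel (u, v) sits at index `v * W + u`, which makes it straightforward to copy per-pixel colors onto the initialized Gaussians.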