BAGS: Blur Agnostic Gaussian Splatting through Multi-Scale Kernel Modeling

Cheng Peng, Yutao Tang, Yifan Zhou, Nengyu Wang, Xijun Liu, Deming Li, Rama Chellappa

TL;DR

This work tackles the fragility of 3D Gaussian Splatting under real-world blur by introducing Blur Agnostic Gaussian Splatting (BAGS). BAGS adds a Blur Proposal Network to estimate per-pixel blur kernels and a per-pixel mask, and employs a coarse-to-fine optimization across scales to stabilize joint 3D scene optimization with 2D degradation modeling. The method leverages RGBD-aware, multi-modal features to disentangle blur from geometry, achieving state-of-the-art photorealistic renderings across camera motion, defocus, and mixed-resolution scenarios, including unbounded 360 drone data. The approach yields interpretable blur kernels and region masks, enabling both robust reconstruction in degraded imagery and practical analysis of blur patterns.

Abstract

Recent efforts in using 3D Gaussians for scene reconstruction and novel view synthesis can achieve impressive results on curated benchmarks; however, images captured in real life are often blurry. In this work, we analyze the robustness of Gaussian-Splatting-based methods against various types of image blur, such as motion blur, defocus blur, and downscaling blur. Under these degradations, Gaussian-Splatting-based methods tend to overfit and produce worse results than Neural-Radiance-Field-based methods. To address this issue, we propose Blur Agnostic Gaussian Splatting (BAGS). BAGS introduces additional 2D modeling capacities such that a 3D-consistent and high-quality scene can be reconstructed despite image-wise blur. Specifically, we model blur by estimating per-pixel convolution kernels from a Blur Proposal Network (BPN). BPN is designed to consider spatial, color, and depth variations of the scene to maximize modeling capacity. BPN also proposes a quality-assessing mask, which indicates the regions where blur occurs. Finally, we introduce a coarse-to-fine kernel optimization scheme; this scheme is fast and avoids the sub-optimal solutions caused by a sparse point cloud initialization, which often occurs when we apply Structure-from-Motion to blurry images. We demonstrate that BAGS achieves photorealistic renderings under various challenging blur conditions and imaging geometries, while significantly improving upon existing approaches.
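To make the idea of per-pixel convolution kernels concrete, the following is a minimal, dependency-free sketch rather than the authors' implementation. It assumes a single-channel image stored as nested lists, odd-sized kernels with a positive weight sum, and clamped borders; the function name `apply_per_pixel_kernels` and all of these choices are hypothetical. In BAGS the kernels would come from the BPN, whereas here they are simply passed in.

```python
def apply_per_pixel_kernels(image, kernels):
    """Blur a grayscale image using a different K x K kernel at every pixel.

    image:   H x W nested list of floats (stands in for the sharp rendering).
    kernels: H x W nested list of K x K weight grids, K odd, weights summing
             to a positive value; each grid is normalised to sum to 1 here.
    Out-of-range neighbours are clamped to the border (an assumption).
    """
    height, width = len(image), len(image[0])
    blurred = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            kernel = kernels[y][x]
            radius = len(kernel) // 2
            total = sum(sum(row) for row in kernel)
            acc = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    sy = min(max(y + dy, 0), height - 1)
                    sx = min(max(x + dx, 0), width - 1)
                    acc += kernel[dy + radius][dx + radius] * image[sy][sx]
            blurred[y][x] = acc / total
    return blurred


# Usage: a delta kernel leaves a pixel sharp, a box kernel averages its patch.
image = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]
delta = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
kernels = [[delta for _ in range(3)] for _ in range(3)]
kernels[1][1] = box  # only the centre pixel is modelled as blurred
result = apply_per_pixel_kernels(image, kernels)
```

Because each pixel carries its own kernel, spatially varying degradations such as defocus in one depth range and sharpness in another can be represented within a single image.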

Paper Structure (9 sections, 8 equations, 7 figures, 2 tables)

Figures (7)

  • Figure 1: We introduce BAGS, which can reconstruct high quality scenes even from blurry training images. Moreover, BAGS can provide kernels and masks that indicate the types and regions of the blur, as shown by the highlighted regions under Mask.
  • Figure 2: BAGS is optimized by training a Blur Proposal Network on top of the scene $\mathcal{G}(\textbf{v})$ over multiple scales. Top: We extract $f^s_\textrm{RGBD}$ from color and depth, which are concatenated with the position and view embedding $p(x), l(i)$ to form the Multi-Modal Feature (MMF). The kernel MLP then estimates the per-pixel kernel $h^s$ and mask $m^s$. We use $h^s$ to model the blur image $\tilde{C}^s$ and employ $m^s$ to blend the rendered image $C^s$ and the blur-modeled image $\tilde{C}^s$, yielding ${C}_{\textrm{out}}^s$. Bottom: After $N_s$ steps, we upscale image resolution and modify the kernel MLP to produce $h^{s-1}$ with a larger kernel size.
  • Figure 3: Gaussians may get stuck at a local minimum without proper initialization. As indicated in \ref{wo_c2f}, optimizing $h$ directly can lead to noisy surfaces. Naively densifying the scene before adding $h$ can also lead to noisy Gaussians; as shown in \ref{naive}, the noisy Gaussians are not well removed even after adding $h$. By using a coarse-to-fine training schedule, we achieve better results in \ref{rob-gs-a} and \ref{rob-gs-b}.
  • Figure 4: Ablation study of BAGS with different sub-modules. In \ref{img:rgbd_deform} and \ref{img:defocus}, introducing RGBD features and coarse-to-fine optimization improves novel view synthesis quality, while the alternative sparse deformable kernel performs worse; \ref{img:speed} demonstrates our speed improvement over $\textrm{BAGS}^{\textrm{noC2F}}$. In \ref{img:kernel_size}, a larger kernel generally leads to better performance, except in the mixed-resolution scenario.
  • Figure 5: Visualizations of test views on the camera motion and defocus blur datasets. Mip-Sp and Db-NeRF are short for Mip-Splatting [mipsplatting] and Deblur-NeRF [ma2022deblur].
  • ...and 2 more figures
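
The Figure 2 caption describes two mechanisms: blending the rendered image $C^s$ with the blur-modeled image $\tilde{C}^s$ through the mask $m^s$, and moving to a finer scale with a larger kernel after $N_s$ steps. The sketch below illustrates both under stated assumptions. The caption does not give the blend formula, so a convex combination in which $m = 1$ selects the blur-modeled value is assumed. The halving of resolution per scale, the kernel growth of 2 per stage, and the names `blend_with_mask` and `coarse_to_fine_schedule` are likewise hypothetical and are not taken from the paper.

```python
def blend_with_mask(rendered, blur_modeled, mask):
    """Assumed blend: out = (1 - m) * C + m * C_tilde, applied per pixel."""
    return [
        [(1.0 - m) * c + m * c_tilde for c, c_tilde, m in zip(c_row, t_row, m_row)]
        for c_row, t_row, m_row in zip(rendered, blur_modeled, mask)
    ]


def coarse_to_fine_schedule(num_scales, steps_per_scale, full_size, base_kernel=3):
    """List the training stages, coarsest scale first.

    Scale s uses the full image size divided by 2**s (assumed), and the kernel
    side length grows by 2 at every stage (assumed), so it stays odd.
    """
    height, width = full_size
    stages = []
    for stage, s in enumerate(range(num_scales - 1, -1, -1)):
        stages.append({
            "start_step": stage * steps_per_scale,
            "scale": s,
            "image_size": (max(1, height >> s), max(1, width >> s)),
            "kernel_size": base_kernel + 2 * stage,
        })
    return stages


# Usage: three scales of 1000 steps each on a 400 x 600 image.
for stage in coarse_to_fine_schedule(3, 1000, (400, 600)):
    print(stage)
```

Starting at a coarse scale with a small kernel keeps the early optimization cheap, and each later stage enlarges the image and the kernel together so the kernel can still cover the same physical extent of blur.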