Splatography: Sparse multi-view dynamic Gaussian Splatting for filmmaking challenges

Adrian Azzarelli; Nantheera Anantrasirichai; David R Bull

TL;DR

Sparse camera configurations in filmmaking hamper robust dynamic 3-D reconstruction, especially for reflective, transparent, and dynamic (RTD) textures. The paper introduces a foreground-background disentangled dynamic Gaussian Splatting framework that splits the canonical Gaussians into $G_f$ and $G_b$ using a sparse set of masks at $t=0$, learns separate hex-plane deformation fields $\Lambda_f$ and $\Lambda_b$, and employs a modified opacity model to capture dynamic textures alongside a reference-free densification strategy. Key contributions include mask-based canonical initialization; dual deformation fields aligned with filmmaking practices (the background learns only displacement, while the foreground learns full motion and color changes); an opacity-based mechanism for RTD textures; and a densification scheme that reduces background bias and preserves foreground fidelity. Experiments on sparse-view 3-D and 2.5-D entertainment datasets show SotA qualitative and quantitative results, with up to 3 dB higher PSNR at about half the model size on 3-D scenes, and enable clean foreground segmentation, including transparent textures, for post-production workflows.
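To make the asymmetry between the two deformation fields concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each field outputs additive residuals per Gaussian and that rotations are stored as quaternions that are renormalized after the update; the summary above does not specify either detail. The names `FOREGROUND_PARAMS`, `BACKGROUND_PARAMS`, and `deform` are hypothetical, and the hex-plane query that would produce `deltas` is not shown.

```python
import numpy as np

# Hypothetical parameter sets: the foreground field updates position,
# rotation, and color; the background field updates position only.
FOREGROUND_PARAMS = ("position", "rotation", "color")
BACKGROUND_PARAMS = ("position",)


def deform(canonical, deltas, learned):
    """Apply additive residuals to the listed parameters only.

    canonical, deltas: dicts of (N, D) arrays keyed by parameter name.
    learned: names of the parameters this deformation field may change.
    Additive residuals and quaternion renormalization are assumptions.
    """
    out = {name: value.copy() for name, value in canonical.items()}
    for name in learned:
        out[name] = out[name] + deltas[name]
    if "rotation" in learned:
        # Keep quaternions at unit length after the additive update.
        norms = np.linalg.norm(out["rotation"], axis=1, keepdims=True)
        out["rotation"] = out["rotation"] / norms
    return out


canonical = {
    "position": np.zeros((2, 3)),
    "rotation": np.tile([1.0, 0.0, 0.0, 0.0], (2, 1)),
    "color": np.full((2, 3), 0.5),
}
deltas = {
    "position": np.ones((2, 3)),
    "rotation": np.tile([0.0, 1.0, 0.0, 0.0], (2, 1)),
    "color": np.full((2, 3), 0.1),
}

foreground_t = deform(canonical, deltas, FOREGROUND_PARAMS)
background_t = deform(canonical, deltas, BACKGROUND_PARAMS)
```

With the same residuals, `background_t` differs from the canonical state only in `position`, while `foreground_t` changes in all three parameters.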

Abstract

Deformable Gaussian Splatting (GS) accomplishes photorealistic dynamic 3-D reconstruction from dense multi-view video (MVV) by learning to deform a canonical GS representation. However, in filmmaking, tight budgets can result in sparse camera configurations, which limit state-of-the-art (SotA) methods when capturing complex dynamic features. To address this issue, we introduce an approach that splits the canonical Gaussians and deformation field into foreground and background components using a sparse set of masks for frames at $t=0$. Each representation is trained separately with different loss functions during canonical pre-training. Then, during dynamic training, different parameters are modeled for each deformation field following common filmmaking practices. The foreground stage contains diverse dynamic features, so changes in color, position and rotation are learned. The background, which contains film crew and equipment, is typically dimmer and less dynamic, so only changes in point position are learned. Experiments on 3-D and 2.5-D entertainment datasets show that our method produces SotA qualitative and quantitative results: up to 3 dB higher PSNR with half the model size on 3-D scenes. Unlike the SotA and without the need for dense mask supervision, our method also produces segmented dynamic reconstructions including transparent and dynamic textures. Code and video comparisons are available online: https://interims-git.github.io/
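As an illustration of how a sparse set of masks at $t=0$ could partition canonical Gaussians, here is a minimal sketch under stated assumptions. It projects Gaussian centers into each masked view with a pinhole camera and labels a point as foreground if it lands inside the mask in at least `min_views` views. This voting rule, the camera convention (world-to-camera `R`, `t`), and the function name `split_by_masks` are all assumptions made for illustration; the abstract does not describe the actual assignment rule.

```python
import numpy as np


def split_by_masks(points, cameras, masks, min_views=1):
    """Split 3-D points into foreground and background using 2-D masks.

    points:  (N, 3) Gaussian centers in world coordinates.
    cameras: list of (K, R, t) with K (3, 3) intrinsics and a
             world-to-camera rotation R (3, 3) and translation t (3,).
    masks:   list of (H, W) boolean foreground masks, one per camera.
    A point is foreground if it projects inside the mask in at least
    `min_views` views (an assumed rule, chosen for illustration).
    """
    votes = np.zeros(len(points), dtype=int)
    for (K, R, t), mask in zip(cameras, masks):
        cam = points @ R.T + t                 # world -> camera frame
        z = cam[:, 2]
        in_front = z > 1e-6                    # ignore points behind the camera
        uvw = cam @ K.T                        # homogeneous pixel coordinates
        safe_z = np.where(in_front, z, 1.0)    # avoid division by zero
        u = np.round(uvw[:, 0] / safe_z).astype(int)
        v = np.round(uvw[:, 1] / safe_z).astype(int)
        h, w = mask.shape
        visible = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        hit = np.zeros(len(points), dtype=bool)
        hit[visible] = mask[v[visible], u[visible]]
        votes += hit
    is_foreground = votes >= min_views
    return points[is_foreground], points[~is_foreground]


# Toy setup: one camera at the origin looking along +z, with a square
# foreground mask in the middle of a 64x64 image.
K = np.array([[100.0, 0.0, 32.0], [0.0, 100.0, 32.0], [0.0, 0.0, 1.0]])
mask = np.zeros((64, 64), dtype=bool)
mask[24:40, 24:40] = True
points = np.array([[0.0, 0.0, 2.0], [0.5, 0.0, 2.0], [0.0, 0.0, -1.0]])

g_f, g_b = split_by_masks(points, [(K, np.eye(3), np.zeros(3))], [mask])
```

In this toy setup the first point projects to the image center and is labeled foreground; the second falls outside the mask and the third lies behind the camera, so both are labeled background.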
