CoGS: Controllable Gaussian Splatting

Heng Yu, Joel Julin, Zoltán Á. Milacski, Koichiro Niinuma, László A. Jeni

TL;DR

The paper tackles dynamic 3D scene capture from monocular input and enables explicit, controllable manipulation of dynamic scenes. It introduces CoGS, a framework that combines Dynamic Gaussian Splatting with Controllable Gaussian Splatting, using a differentiable Gaussian rasterizer, per-parameter deformation networks, learnable 3D masks, and unsupervised control-signal extraction. Through static preconditioning with SfM, deformation regularization losses, and a four-step controllability pipeline, CoGS achieves state-of-the-art fidelity on synthetic and real dynamic scenes while enabling manipulation without pre-computed controls. The work paves the way for real-time, editable 3D content creation on commodity hardware, while noting limitations with complex lighting, large non-rigid motions, and boundary artifacts at control-region edges.

Abstract

Capturing and re-animating the 3D structure of articulated objects present significant barriers. On the one hand, methods requiring extensively calibrated multi-view setups are prohibitively complex and resource-intensive, limiting their practical applicability. On the other hand, while single-camera Neural Radiance Fields (NeRFs) offer a more streamlined approach, they have excessive training and rendering costs. 3D Gaussian Splatting would be a suitable alternative but for two reasons: first, existing methods for 3D dynamic Gaussians require synchronized multi-view cameras; second, they lack controllability in dynamic scenarios. We present CoGS, a method for Controllable Gaussian Splatting that enables the direct manipulation of scene elements, offering real-time control of dynamic scenes without the prerequisite of pre-computing control signals. We evaluated CoGS on both synthetic and real-world datasets containing dynamic objects of varying difficulty. In our evaluations, CoGS consistently outperformed existing dynamic and controllable neural representations in terms of visual fidelity.

Paper Structure (21 sections, 17 equations, 12 figures, 4 tables)

Figures (12)

  • Figure 1: From a set of monocular images capturing a moving scene, a dynamic 3D representation is learned using time-varying Gaussians (a). Then the articulated parts (depicted with the trajectories of the motion) are identified using masking (b). This allows for learning a fine-scale, per-Gaussian level of control (c). The approach is capable of synthesizing novel configurations not present in the original sequence, for example, independently opening the hood, trunk, and doors of the toy car.
  • Figure 2: CoGS Overview. CoGS consists of two parts: Dynamic GS and Controllable GS. For Dynamic GS, an offset is learned for $(\mu, C, R, S)$ by separate MLPs (only one shown in figure). To extend to controllable scenarios, signals extracted from the dynamic model are used to obtain attribute offsets, which are then masked to affect the desired control region.
  • Figure 3: Lego synthetic scene visualized as a point cloud of colored Gaussian centers. Smaller and fewer colored lines indicate less change in position over time. Adding $\mathcal{L}^{\text{norm}}$ stabilizes the static Gaussians' positions. (a) Without $\mathcal{L}^{\text{norm}}$. (b) With $\mathcal{L}^{\text{norm}}$.
  • Figure 4: Jumping Jack synthetic scene visualized as a point cloud of colored Gaussian centers. Smaller and fewer colored lines indicate less change in position over time. Adding $\mathcal{L}^{\text{diff}}$ stabilizes the 3D Gaussians' trajectories. (a) Without $\mathcal{L}^{\text{diff}}$. (b) With $\mathcal{L}^{\text{diff}}$.
  • Figure 5: Controlling the blue ball and the green ball separately.
  • ...and 7 more figures
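
The Figure 2 caption describes two ideas: separate MLPs that each predict an offset for one Gaussian attribute in $(\mu, C, R, S)$, and a mask that restricts control-driven offsets to the desired region. The following is a minimal NumPy sketch of that data flow only, not the authors' implementation. The network inputs (Gaussian center plus a scalar time or control signal), the attribute dimensions, the layer sizes, and all function names (`make_mlp`, `dynamic_offsets`, `controlled_offsets`) are assumptions made for illustration, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed attribute sizes: center (3), RGB color (3), quaternion (4), scale (3).
ATTR_DIMS = {"mu": 3, "C": 3, "R": 4, "S": 3}


def make_mlp(in_dim, out_dim, hidden=16):
    """Random, untrained two-layer MLP weights (hypothetical sizes)."""
    return {
        "W1": rng.normal(scale=0.1, size=(in_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(scale=0.1, size=(hidden, out_dim)),
        "b2": np.zeros(out_dim),
    }


def mlp(params, x):
    """Forward pass: linear -> ReLU -> linear."""
    h = np.maximum(x @ params["W1"] + params["b1"], 0.0)
    return h @ params["W2"] + params["b2"]


def dynamic_offsets(nets, mu, t):
    """One offset per attribute, each from its own MLP (assumed input: mu and t)."""
    x = np.concatenate([mu, np.full((mu.shape[0], 1), t)], axis=1)
    return {name: mlp(net, x) for name, net in nets.items()}


def controlled_offsets(nets, mu, signal, mask):
    """Offsets driven by a control signal, zeroed outside the masked region."""
    x = np.concatenate([mu, np.full((mu.shape[0], 1), signal)], axis=1)
    return {name: mask[:, None] * mlp(net, x) for name, net in nets.items()}


# Five Gaussians; only the first two belong to the controlled region.
mu = rng.normal(size=(5, 3))
mask = np.array([1.0, 1.0, 0.0, 0.0, 0.0])
nets = {name: make_mlp(4, dim) for name, dim in ATTR_DIMS.items()}

dyn = dynamic_offsets(nets, mu, t=0.5)
ctl = controlled_offsets(nets, mu, signal=0.3, mask=mask)
mu_controlled = mu + ctl["mu"]  # Gaussians outside the mask keep their centers
```

In this sketch the mask is a fixed per-Gaussian weight; the TL;DR above refers to learnable 3D masks, so a trained system would presumably optimize these values rather than hard-code them.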