CoGS: Controllable Gaussian Splatting
Heng Yu, Joel Julin, Zoltán Á. Milacski, Koichiro Niinuma, László A. Jeni
TL;DR
The paper tackles dynamic 3D scene capture from monocular input and enables explicit, controllable manipulation of dynamic scenes. It introduces CoGS, a framework that combines Dynamic Gaussian Splatting with Controllable Gaussian Splatting, using a differentiable Gaussian rasterizer, per-parameter deformation networks, learnable 3D masks, and unsupervised control-signal extraction. Through static preconditioning with SfM, deformation regularization losses, and a four-step controllability pipeline, CoGS achieves state-of-the-art fidelity on synthetic and real dynamic scenes while enabling manipulation without pre-computed controls. The work paves the way for real-time, editable 3D content creation on commodity hardware, while noting limitations with complex lighting, large non-rigid motions, and boundary artifacts at control-region edges.
Abstract
Capturing and re-animating the 3D structure of articulated objects presents significant barriers. On one hand, methods requiring extensively calibrated multi-view setups are prohibitively complex and resource-intensive, limiting their practical applicability. On the other hand, while single-camera Neural Radiance Fields (NeRFs) offer a more streamlined approach, they have excessive training and rendering costs. 3D Gaussian Splatting would be a suitable alternative but for two reasons. First, existing methods for 3D dynamic Gaussians require synchronized multi-view cameras; second, they lack controllability in dynamic scenarios. We present CoGS, a method for Controllable Gaussian Splatting that enables the direct manipulation of scene elements, offering real-time control of dynamic scenes without the prerequisite of pre-computing control signals. We evaluated CoGS on both synthetic and real-world datasets containing dynamic objects of varying difficulty. In our evaluations, CoGS consistently outperformed existing dynamic and controllable neural representations in terms of visual fidelity.
