4D Primitive-Mâché: Glueing Primitives for Persistent 4D Scene Reconstruction
Kirill Mazur, Marwan Taher, Andrew J. Davison
TL;DR
4D Primitive-Mâché addresses persistent 4D reconstruction from casual monocular video by decomposing scenes into moving rigid primitives and gluing their trajectories over time via dense 2D-3D correspondences. The method introduces a primitive-based motion parameterization using SE(3) poses, a front-end for geometry, segmentation, and correspondences, and a back-end Gauss-Newton optimization with time remapping to produce complete 4D reconstructions. It outperforms existing methods in accuracy and completeness on object-scanning and multi-object datasets, and it demonstrates object permanence by inferring the motion of occluded elements. Representing the scene with rigid primitives reduces the dimensionality of dynamic reconstruction while enabling replayable, temporally consistent 4D scenes suitable for robotics and AR applications.
Abstract
We present a dynamic reconstruction system that receives a casual monocular RGB video as input and outputs a complete and persistent reconstruction of the scene. In other words, we reconstruct not only the currently visible parts of the scene but also all previously viewed parts, which enables replaying the complete reconstruction across all timesteps. Our method decomposes the scene into a set of rigid 3D primitives, which are assumed to be moving throughout the scene. Using estimated dense 2D correspondences, we jointly infer the rigid motion of these primitives through an optimisation pipeline, yielding a 4D reconstruction of the scene, i.e. 3D geometry moving dynamically through time. To achieve this, we also introduce a mechanism to extrapolate motion for objects that become invisible, employing motion-grouping techniques to maintain continuity. The resulting system enables 4D spatio-temporal awareness, offering capabilities such as replayable 3D reconstructions of articulated objects through time, multi-object scanning, and object permanence. On object-scanning and multi-object datasets, our system significantly outperforms existing methods both quantitatively and qualitatively.
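To make the representation described above concrete, the following is a minimal NumPy sketch of a scene stored as rigid primitives with one SE(3) pose per timestep, and of replaying it at an arbitrary time. It is an illustration under stated assumptions, not the authors' implementation: the names `se3`, `transform_points`, `extrapolate_pose`, and `replay`, the dictionary layout, and in particular the rule used to carry an unobserved primitive along with its motion group are all hypothetical.

```python
import numpy as np


def se3(rotation_z_rad, translation):
    """Build a 4x4 rigid transform from a rotation about z and a translation."""
    c, s = np.cos(rotation_z_rad), np.sin(rotation_z_rad)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = translation
    return T


def transform_points(T, points):
    """Apply a 4x4 rigid transform to an (N, 3) array of points."""
    return points @ T[:3, :3].T + T[:3, 3]


def extrapolate_pose(T_prim_last, T_group_last, T_group_now):
    """Assumed rule: an unobserved primitive keeps moving rigidly with its
    motion group, so the group's relative motion since the primitive was
    last seen is applied to the primitive's last known pose."""
    return T_group_now @ np.linalg.inv(T_group_last) @ T_prim_last


def replay(primitives, t):
    """Return the world-frame points of every primitive at timestep t,
    whether or not the primitive is visible in the frame at time t."""
    return np.vstack(
        [transform_points(p["poses"][t], p["points"]) for p in primitives]
    )


# Hypothetical scene: one primitive holding a single point, two timesteps.
primitive = {
    "points": np.array([[1.0, 0.0, 0.0]]),  # geometry in the primitive frame
    "poses": [se3(0.0, [0.0, 0.0, 0.0]), se3(np.pi / 2, [0.0, 0.0, 1.0])],
}
world_points_t1 = replay([primitive], t=1)
```

Because the geometry of each primitive is fixed in its own frame, only one 6-DoF pose per primitive per timestep has to be estimated, which is where the reduction in dimensionality comes from.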
