
DRoPS: Dynamic 3D Reconstruction of Pre-Scanned Objects

Narek Tumanyan, Samuel Rota Bulò, Denis Rozumny, Lorenzo Porzi, Adam Harley, Tali Dekel, Peter Kontschieder, Jonathon Luiten

Abstract

Dynamic scene reconstruction from casual videos has seen remarkable recent progress. Numerous approaches have attempted to overcome the ill-posedness of the task by distilling priors from 2D foundation models and by imposing hand-crafted regularization on the optimized motion. However, these methods struggle to reconstruct scenes from extreme novel viewpoints, especially when highly articulated motions are present. In this paper, we present DRoPS, a novel approach that leverages a static pre-scan of the dynamic object as an explicit geometric and appearance prior. While existing state-of-the-art methods fail to fully exploit the pre-scan, DRoPS leverages our novel setup to effectively constrain the solution space and ensure geometric consistency throughout the sequence. Our core novelty is twofold: first, we establish a grid-structured, surface-aligned model by organizing Gaussian primitives into pixel grids anchored to the object surface. Second, by leveraging the grid structure of our primitives, we parameterize motion with a CNN conditioned on those grids, injecting strong implicit regularization and correlating the motion of nearby points. Extensive experiments demonstrate that our method significantly outperforms the current state of the art in rendering quality and 3D tracking accuracy.
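The motion parameterization described above can be sketched in a few lines: canonical Gaussian centers arranged in a pixel grid, concatenated with a sinusoidal timestep encoding $\gamma(t)$, are passed through a small CNN that emits a per-pixel rotation quaternion and translation. This is a minimal NumPy illustration of the idea, not the paper's implementation; the layer sizes, encoding frequencies, and random (untrained) weights are all assumptions for the sketch.

```python
import numpy as np

def fourier_time_encoding(t, n_freqs=4):
    # gamma(t): sinusoidal encoding of the normalized timestep.
    freqs = 2.0 ** np.arange(n_freqs) * np.pi
    return np.concatenate([np.sin(freqs * t), np.cos(freqs * t)])  # (2*n_freqs,)

def conv2d_same(x, w, b):
    # Naive 'same'-padded 2D convolution: x is (H, W, Cin), w is (k, k, Cin, Cout).
    k = w.shape[0]
    p = k // 2
    H, W, _ = x.shape
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    out = np.empty((H, W, w.shape[-1]))
    for i in range(H):
        for j in range(W):
            patch = xp[i:i + k, j:j + k, :]
            out[i, j] = np.tensordot(patch, w, axes=([0, 1, 2], [0, 1, 2])) + b
    return out

def deep_motion_prior(mu_grid, t, rng):
    # mu_grid: (H, W, 3) canonical Gaussian centers laid out as a pixel grid.
    # Broadcast the time encoding to every pixel and concatenate with positions.
    H, W, _ = mu_grid.shape
    gamma_t = fourier_time_encoding(t)
    t_grid = np.broadcast_to(gamma_t, (H, W, gamma_t.size))
    feat = np.concatenate([mu_grid, t_grid], axis=-1)

    # Two toy conv layers with random weights (a stand-in for the trained CNN):
    # the shared convolution correlates the motion of neighboring grid points.
    w1 = rng.normal(0, 0.1, (3, 3, feat.shape[-1], 16)); b1 = np.zeros(16)
    w2 = rng.normal(0, 0.1, (3, 3, 16, 7)); b2 = np.zeros(7)
    h = np.maximum(conv2d_same(feat, w1, b1), 0.0)  # ReLU
    out = conv2d_same(h, w2, b2)                    # (H, W, 7)

    # Split into a unit quaternion Q_j (rotation) and a translation Delta_j.
    q = out[..., :4] + np.array([1.0, 0.0, 0.0, 0.0])  # bias toward identity
    q = q / np.linalg.norm(q, axis=-1, keepdims=True)
    delta = out[..., 4:]
    return q, delta

rng = np.random.default_rng(0)
mu = rng.normal(size=(8, 8, 3))       # an 8x8 grid of canonical centers
Q, Delta = deep_motion_prior(mu, t=0.5, rng=rng)
print(Q.shape, Delta.shape)           # (8, 8, 4) (8, 8, 3)
```

Because every pixel's output is computed by the same convolutional weights over a local neighborhood, nearby Gaussians receive correlated 6-DOF motion, which is the implicit regularization the abstract refers to.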


Paper Structure

This paper contains 24 sections, 12 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: Given a static pre-scan and a monocular video of a dynamic object, DRoPS reconstructs a complete dynamic 3D representation, enabling high-quality novel view synthesis. The pre-scan is either captured (left) or is generated from the first monocular video frame (right). The figure shows the dynamic objects rendered at multiple timesteps from a fixed novel view. The colored lines visualize the 3D point trajectories that emerge from our dynamic model.
  • Figure 2: DRoPS Overview. (a) We organize our canonical Gaussians into structured pixel grids that reside on virtual cameras surrounding the object, each pixel encoding the parameters of its back-projected 3D Gaussian (see the section on structured Gaussians). (b) To reconstruct the dynamic sequence, we model the object deformation with the Deep Motion Prior $\Phi_{\theta}$ -- a CNN that maps canonical positions $\bm{\mu}_j$ and timestep encodings $\gamma(t)$ to 6-DOF motion parameters $[\mathbf{Q}_j, \bm{\Delta}_j]$ (see the Deep Motion Prior section). Timesteps $t_1, t_2, t_3$ are color-coded in the figure.
  • Figure 3: Surface-aligned Gaussians. We visualize canonical Gaussians of the upper body. Unlike standard 3DGS (left), where Gaussians are unordered and lack surface representation, our structured Gaussians align with the object's surface, providing a more robust and generalizable representation for dynamic-time reconstruction.
  • Figure 4: Qualitative results. The first column depicts the training view from the monocular sequence; the second column depicts the ground-truth testing view at the same timestep. Novel views are selected at extreme angles to evaluate the completeness of dynamic 3D reconstructions. Our method drastically outperforms the baselines in maintaining a consistent object geometry and sharp appearance, while accurately modeling the scene dynamics. See our website for video results.
  • Figure 5: DRoPS achieves high-quality dynamic 3D reconstruction on in-the-wild monocular videos by generating the pre-scan with an image-to-3D model [xiang2025trellis2]. Our novel views are rendered from viewpoints that differ significantly from those in the training set. See our website for full video and additional results.
  • ...and 4 more figures