Neural Parametric Gaussians for Monocular Non-Rigid Object Reconstruction

Devikalyan Das, Christopher Wewer, Raza Yunus, Eddy Ilg, Jan Eric Lenssen

TL;DR

This work tackles monocular non-rigid object reconstruction by introducing Neural Parametric Gaussians (NPGs), a two-stage framework that first learns a temporally coherent coarse deformation model and then optimizes 3D Gaussians within local volumes guided by that template. The coarse stage provides strong regularization and correspondences across time, while the Gaussian-based stage captures fine-scale geometry and appearance, enabling high-quality radiance-field renderings for dynamic objects. Across synthetic and real monocular datasets with sparse multi-view cues, NPGs achieve state-of-the-art or competitive results and offer fast rendering thanks to a per-scene Gaussian splatting approach. The method demonstrates robust performance without heavy priors, highlighting the power of neural parametric regularization for high-fidelity novel-view synthesis in challenging dynamic scenarios.

Abstract

Reconstructing dynamic objects from monocular videos is a severely underconstrained and challenging problem, and recent work has approached it in various directions. However, owing to the ill-posed nature of this problem, there has been no solution that can provide consistent, high-quality novel views from camera positions that are significantly different from the training views. In this work, we introduce Neural Parametric Gaussians (NPGs) to take on this challenge by imposing a two-stage approach: first, we fit a low-rank neural deformation model, which then is used as regularization for non-rigid reconstruction in the second stage. The first stage learns the object's deformations such that it preserves consistency in novel views. The second stage obtains high reconstruction quality by optimizing 3D Gaussians that are driven by the coarse model. To this end, we introduce a local 3D Gaussian representation, where temporally shared Gaussians are anchored in and deformed by local oriented volumes. The resulting combined model can be rendered as radiance fields, resulting in high-quality photo-realistic reconstructions of the non-rigidly deforming objects. We demonstrate that NPGs achieve superior results compared to previous works, especially in challenging scenarios with few multi-view cues.
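The first stage described above can be illustrated with a small sketch. It is not the authors' implementation: it assumes that the coarse point set of each frame is a shared mean shape plus a linear combination of a few shared basis shapes, with per-frame coefficients predicted from time by a tiny MLP. The names `lowrank_points` and `tiny_mlp`, the basis formulation, the random (untrained) weights, and all sizes are hypothetical and chosen only for illustration.

```python
import numpy as np

def lowrank_points(mean_shape, basis, coeffs):
    """Coarse points per frame: mean shape plus a low-rank combination of basis shapes.

    mean_shape: (P, 3), basis: (K, P, 3), coeffs: (T, K)  ->  (T, P, 3)
    This linear form is an assumption made for illustration.
    """
    return mean_shape[None] + np.einsum("tk,kpd->tpd", coeffs, basis)

def tiny_mlp(t, w1, b1, w2, b2):
    """Hypothetical two-layer MLP mapping a time stamp to K coefficients."""
    h = np.tanh(t[:, None] * w1 + b1)  # (T, H) hidden activations
    return h @ w2 + b2                 # (T, K) low-rank coefficients

rng = np.random.default_rng(0)
P, K, H, T = 100, 4, 16, 10            # points, rank, hidden width, frames

# Quantities shared over time (random here; they would be optimized in practice).
mean_shape = rng.normal(size=(P, 3))
basis = rng.normal(size=(K, P, 3))
w1, b1 = rng.normal(size=(H,)), rng.normal(size=(H,))
w2, b2 = rng.normal(size=(H, K)), np.zeros(K)

# Only the coefficients change from frame to frame.
times = np.linspace(0.0, 1.0, T)
alpha = tiny_mlp(times, w1, b1, w2, b2)            # (T, K)
points = lowrank_points(mean_shape, basis, alpha)  # (T, P, 3)

# All per-frame displacements lie in a subspace of dimension at most K.
displacements = (points - mean_shape[None]).reshape(T, -1)
rank = np.linalg.matrix_rank(displacements, tol=1e-8)
```

Because every frame is built from the same `K` basis shapes, the deformation over the whole sequence has rank at most `K`. This restriction is what makes such a model a strong regularizer when each time step is observed from only one view.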

Paper Structure (49 sections, 8 equations, 16 figures, 8 tables)

Figures (16)

  • Figure 1: We present Neural Parametric Gaussians (NPGs), a method for monocular non-rigid reconstruction of objects. a) Our method produces high-quality reconstructions in easy settings such as the object-level D-NeRF scenes, while b) also handling challenging monocular settings much better than previous work through strong parametric low-rank regularization.
  • Figure 2: Overview of our method. We present a two-stage method. In stage 1 (left) we learn a coarse point model, which is parameterized through low-rank coefficients from an MLP. In stage 2 (right), we optimize 3D Gaussians in local volumes, defined by the point sets. The figure distinguishes between parts that are shared over time ($\blacksquare$), individual for each time step ($\blacksquare$), and fixed-function ($\blacksquare$). MLP weights $\theta$, Gaussian interpolation weights $\mathbf{w}$, scales $\mathbf{S}$, rotations $\mathbf{R}$ and harmonic coefficients $\mathbf{h}$ are shared over time and the deformation is purely modeled by the low-rank coefficients $\alpha_i$, leading to a different coarse point model for each frame.
  • Figure 3: Qualitative comparison on novel views of the D-NeRF dataset. Our method produces more detailed reconstructions than previous work. Even with the multi-view cues in D-NeRF, previous methods do not always keep correct correspondences, as seen around the feet in the second example. In contrast, our NPGs keep the shape coherent at all times and capture high-frequency details under deformation.
  • Figure 4: Point trajectory visualization. Our coarse parametric model automatically provides point trajectories, which in turn demonstrate the quality and smoothness of our optimized templates. Top row: Synthetic Cactus, Real Cactus and Synthetic Human sequences from the Unbiased4D dataset. Bottom row: Jumping Jack, Stand Up and Hook sequences from the D-NeRF dataset. Note that the human on the top right slides at constant speed in this sequence, which is visible in the trajectories.
  • Figure 5: Rendered depth from optimized NPGs. We render depth maps from optimized models, showing consistent geometry.
  • ...and 11 more figures
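
The stage-2 idea from the Figure 2 caption, Gaussians that are shared over time and moved by local volumes, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's formulation: each local volume is reduced to one center and one orientation matrix per frame, and the interpolation weights $\mathbf{w}$ from the caption are left out. The helper names `rot_z` and `gaussian_to_world` are hypothetical.

```python
import numpy as np

def rot_z(angle):
    """Rotation matrix about the z-axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def gaussian_to_world(center, frame, mu_local, R_local, scales):
    """Place a time-shared local Gaussian into the world for one frame.

    center: (3,) position of the local volume at this time step (assumed)
    frame: (3, 3) orientation of the local volume at this time step (assumed)
    mu_local, R_local, scales: Gaussian parameters shared over time
    """
    mu_world = center + frame @ mu_local
    A = frame @ R_local @ np.diag(scales)
    cov_world = A @ A.T  # symmetric positive semi-definite by construction
    return mu_world, cov_world

# One Gaussian defined once in local coordinates.
mu_local = np.array([1.0, 0.0, 0.0])
R_local = np.eye(3)
scales = np.array([0.3, 0.1, 0.05])

# Frame 0: volume at the origin, unrotated.
mu0, cov0 = gaussian_to_world(np.zeros(3), rot_z(0.0), mu_local, R_local, scales)
# Frame 1: volume moved up by one unit and rotated 90 degrees about z.
mu1, cov1 = gaussian_to_world(np.array([0.0, 0.0, 1.0]), rot_z(np.pi / 2),
                              mu_local, R_local, scales)
```

In this sketch only the volume's center and orientation vary between frames. The Gaussian keeps its local offset, rotation and scales, so the eigenvalues of its covariance stay the same while its principal axes follow the volume.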