Thin-Shell-SfT: Fine-Grained Monocular Non-rigid 3D Surface Tracking with Neural Deformation Fields

Navami Kairanda, Marc Habermann, Shanthika Naik, Christian Theobalt, Vladislav Golyanik

TL;DR

This work tackles monocular non-rigid 3D surface tracking of highly deformable objects (e.g., cloth). It introduces Thin-Shell-SfT, which replaces discrete meshes with a continuous adaptive neural surface and couples a Kirchhoff-Love thin-shell prior with differentiable 3D Gaussian Splatting to enforce photometric consistency via analysis-by-synthesis. A dual-network deformation representation is learned: a Neural Reference Field $\bar{x}(\xi;\Upsilon)$ for the template and a Neural Deformation Field $u(\xi,t;\Theta)$ for time-varying motion, enabling high-frequency wrinkle capture. On the $\boldsymbol{\phi}$-SfT dataset, the method outperforms prior SfT/NRSfM and physics-based approaches in geometry accuracy and temporal coherence, while remaining computationally feasible for monocular video sequences.
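
The decomposition above, a reference field $\bar{x}(\xi;\Upsilon)$ plus a time-dependent displacement $u(\xi,t;\Theta)$, can be illustrated with a minimal sketch. The two functions below are toy analytic stand-ins chosen purely for illustration (a flat sheet and a sine-shaped wrinkle); in the method itself both fields are learned neural networks, and all names here are hypothetical.

```python
import math

def reference_field(xi):
    """Toy stand-in for the Neural Reference Field: a flat sheet in z = 0.
    (Assumption: the real field is a learned network with parameters Upsilon.)"""
    u, v = xi
    return (u, v, 0.0)

def deformation_field(xi, t):
    """Toy stand-in for the Neural Deformation Field: a wrinkle that grows with t.
    (Assumption: the real field is a learned network with parameters Theta.)"""
    u, _ = xi
    return (0.0, 0.0, 0.05 * t * math.sin(4.0 * math.pi * u))

def surface_point(xi, t):
    """Deformed 3D position: reference position plus displacement."""
    x_bar = reference_field(xi)
    disp = deformation_field(xi, t)
    return tuple(a + b for a, b in zip(x_bar, disp))

# Query the continuous surface at an arbitrary parametric point and time.
point = surface_point((0.125, 0.5), 1.0)
```

Because the surface is a function of $\xi$ rather than a fixed vertex list, it can be sampled at any resolution, which is what the continuous representation is meant to provide.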

Abstract

3D reconstruction of highly deformable surfaces (e.g. cloths) from monocular RGB videos is a challenging problem, and no solution provides a consistent and accurate recovery of fine-grained surface details. To account for the ill-posed nature of the setting, existing methods use deformation models with statistical, neural, or physical priors. They also predominantly rely on non-adaptive discrete surface representations (e.g. polygonal meshes), perform frame-by-frame optimisation leading to error propagation, and suffer from poor gradients of the mesh-based differentiable renderers. Consequently, fine surface details such as cloth wrinkles are often not recovered with the desired accuracy. In response to these limitations, we propose Thin-Shell-SfT, a new method for non-rigid 3D tracking that represents a surface as an implicit and continuous spatiotemporal neural field. We incorporate a continuous thin-shell physics prior based on the Kirchhoff-Love model for spatial regularisation, which starkly contrasts with the discretised alternatives of earlier works. Lastly, we leverage 3D Gaussian splatting to differentiably render the surface into image space and optimise the deformations based on analysis-by-synthesis principles. Our Thin-Shell-SfT outperforms prior works qualitatively and quantitatively thanks to our continuous surface formulation in conjunction with a specially tailored simulation prior and surface-induced 3D Gaussians. See our project page at https://4dqv.mpi-inf.mpg.de/ThinShellSfT.
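
To make the idea of a continuous thin-shell prior more concrete, the sketch below evaluates the standard Kirchhoff-Love membrane strain, $\varepsilon_{\alpha\beta} = \tfrac{1}{2}(a_{\alpha\beta} - \bar{a}_{\alpha\beta})$, directly on a parametric surface, where $a_{\alpha\beta}$ and $\bar{a}_{\alpha\beta}$ are the metric tensors of the deformed and reference surfaces. It covers only the membrane term; the bending term and the material model are omitted. The use of finite differences, the toy surfaces and all function names are assumptions made for illustration and are not taken from the paper.

```python
def tangents(surface, xi, h=1e-5):
    """Central-difference tangent vectors a_1, a_2 of a surface at xi."""
    u, v = xi
    pu1, pu0 = surface((u + h, v)), surface((u - h, v))
    pv1, pv0 = surface((u, v + h)), surface((u, v - h))
    a1 = tuple((p - q) / (2.0 * h) for p, q in zip(pu1, pu0))
    a2 = tuple((p - q) / (2.0 * h) for p, q in zip(pv1, pv0))
    return a1, a2

def metric(a1, a2):
    """2x2 metric tensor (first fundamental form) from the tangents."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    return ((dot(a1, a1), dot(a1, a2)),
            (dot(a2, a1), dot(a2, a2)))

def membrane_strain(reference, deformed, xi):
    """Membrane strain: half the difference of deformed and reference metrics."""
    a_ref = metric(*tangents(reference, xi))
    a_def = metric(*tangents(deformed, xi))
    return tuple(tuple(0.5 * (a_def[i][j] - a_ref[i][j]) for j in range(2))
                 for i in range(2))

# Hypothetical test surfaces: a flat template, a 10% stretch, a rigid rotation.
def flat(xi):
    return (xi[0], xi[1], 0.0)

def stretched(xi):
    return (1.1 * xi[0], xi[1], 0.0)

def rotated(xi):
    return (0.0, xi[1], -xi[0])

strain_stretch = membrane_strain(flat, stretched, (0.3, 0.4))
strain_rigid = membrane_strain(flat, rotated, (0.3, 0.4))
```

For the 10% stretch the first diagonal entry is $\tfrac{1}{2}(1.1^2 - 1) = 0.105$, while the rigid rotation yields zero strain, which is the behaviour a spatial regulariser of this kind relies on: rigid motion is free, stretching is penalised.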

Paper Structure

This paper contains 25 sections, 17 equations, 12 figures, 3 tables.

Figures (12)

  • Figure 1: Our Thin-Shell-SfT approach reconstructs high-fidelity deformable 3D surface geometry with fine-grained wrinkles from a monocular video, while the previous best method, $\boldsymbol{\phi}$-SfT [kairanda2022f], struggles. The coloured tracks ("Ours, Input View") visualise the 3D Gaussian trajectories over time and across all input video frames.
  • Figure 2: Overview of Thin-Shell-SfT. Our deformation model encodes the surface and its dynamics as neural fields. Given the template $\mathbf{S}_1$, we first fit a reference field (NRF) from 2D parametric points $\boldsymbol{\xi}$ to the initial 3D positions $\mathbf{\bar{x}}$. In the main stage, we optimise the deformation field (NDF) $\mathbf{u}(\boldsymbol{\xi},t)$ by relating estimated surface states $\mathbf{S}_t/\mathcal{G}_t$ to the input monocular views. We induce the dynamically tracked Gaussians to the surface by: (1) Computing their positions $\mathbf{x}$ as the sum of the initial position $\mathbf{\bar{x}}$ and NDF output $\mathbf{u}$, (2) Setting their rotations $\mathbf{\bar{a}}_i$ as the template's local coordinate system, and (3) Fixing the normal scale $\epsilon$, and optimising the colour, opacity and tangential scales $(s_1,s_2)$ using only the template texture. For physical plausibility, we impose continuous Kirchhoff-Love physics constraints.
  • Figure 3: The thin-shell physics prior is a spatial regulariser that minimises the hyperelastic strain energy density due to deformation w.r.t. the known template.
  • Figure 4: Comparison of the reconstruction normal maps for our method and $\boldsymbol{\phi}$-SfT. We show the cosine normal consistency to ground truth on the right and the normal metrics in Table \ref{tab:runtime_psnr_normal_p2s} (Appendix).
  • Figure 5: Exemplary 3D reconstructions. (Left) Comparisons focusing on high-frequency wrinkles. Thin-Shell-SfT captures the wrinkles best among all compared methods in one of the most challenging examples, outperforming $\boldsymbol{\phi}$-SfT [kairanda2022f], Stotko et al. [stotko2023physics] and Diff-NRSfM [Parashar_2020_CVPR]. (Right) Our results on the extended $\boldsymbol{\phi}$-SfT dataset highlight the excellent tracking in the occluded regions.
  • ...and 7 more figures
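
The three steps in the Figure 2 caption for inducing Gaussians to the surface (position from reference plus displacement, rotation from the template's local coordinate system, fixed normal scale $\epsilon$) can be sketched as below. This is a minimal illustration under stated assumptions: the local frame is built from finite-difference tangents and orthonormalised, the template is a toy curved sheet, the value of `eps` is arbitrary, and every function name is hypothetical rather than taken from the authors' code.

```python
import math

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def scale(a, s):
    return tuple(x * s for x in a)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(a):
    return scale(a, 1.0 / math.sqrt(dot(a, a)))

def reference_field(xi):
    """Hypothetical template surface: a gently curved sheet."""
    u, v = xi
    return (u, v, 0.1 * u * u)

def local_frame(field, xi, h=1e-5):
    """Orthonormal frame (tangent, bitangent, normal) of the template at xi."""
    u, v = xi
    a1 = scale(sub(field((u + h, v)), field((u - h, v))), 1.0 / (2.0 * h))
    a2 = scale(sub(field((u, v + h)), field((u, v - h))), 1.0 / (2.0 * h))
    n = normalize(cross(a1, a2))
    e1 = normalize(a1)
    e2 = cross(n, e1)  # completes a right-handed orthonormal frame
    return e1, e2, n

def induce_gaussian(field, xi, displacement, s1, s2, eps=1e-4):
    """One surface-induced Gaussian: position, rotation axes and scales."""
    position = add(field(xi), displacement)   # step (1): x = x_bar + u
    rotation = local_frame(field, xi)          # step (2): template local frame
    scales = (s1, s2, eps)                     # step (3): fixed normal scale
    return {"position": position, "rotation": rotation, "scale": scales}

gaussian = induce_gaussian(reference_field, (0.0, 0.5), (0.0, 0.0, 0.2), 0.01, 0.02)
```

Keeping the normal scale small and fixed makes each Gaussian a thin disc lying in the tangent plane, so only the two tangential scales, colour and opacity remain as appearance parameters, as the caption states.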