Physics-guided Shape-from-Template: Monocular Video Perception through Neural Surrogate Models

David Stotko, Nils Wandel, Reinhard Klein

TL;DR

This work tackles monocular cloth reconstruction by enforcing physical plausibility through a neural surrogate cloth model and differentiable rendering within a Shape-from-Template (SfT) framework. The cloth dynamics are governed by $M \vec{a} = \vec{F}_\mathrm{int} + \vec{F}_\mathrm{ext}$, where $\vec{F}_\mathrm{int}$ is derived from an internal energy $E_\mathrm{int}=E_Y+E_S+E_B$ with stretching, shearing, and bending terms. A CNN-based surrogate learns these dynamics and advances the state with $\vec{a}_{n+1}$. The pipeline jointly optimizes the cloth shape, material parameters $(Y,S,B)$, external forces, and texture coordinates by backpropagating pixel-level losses, achieving a 400–500× speedup over $\phi$-SfT, a prior physics-based SfT approach, while maintaining comparable accuracy. The method is stable and produces strong qualitative reconstructions on challenging monocular videos. Ablations are reported, and the identified limitations include fine-wrinkle representation and UV-mapping artifacts. The result is practical, fast, physics-guided SfT for dynamic cloth from a single camera, with potential for broader real-time or near-real-time applications.
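To make the relation between the energy and the equation of motion concrete, here is a minimal sketch on a toy 1D chain of point masses. It is an illustration only, not the paper's cloth model: it keeps just a stretching term $E_Y$ (no $E_S$ or $E_B$), computes $\vec{F}_\mathrm{int}$ as the negative energy gradient by hand instead of with a neural surrogate, and assumes a semi-implicit Euler update. The function names, the uniform scalar mass, and the quadratic spring energy are all assumptions made for this example.

```python
# Toy 1D chain of point masses joined by springs (NOT the paper's cloth model).
# Only a stretching energy E_Y is modelled; E_S and E_B are omitted.

def stretch_energy(x, Y, rest):
    """E_Y = 0.5 * Y * sum_i (x[i+1] - x[i] - rest)^2."""
    return 0.5 * Y * sum((x[i + 1] - x[i] - rest) ** 2 for i in range(len(x) - 1))

def internal_forces(x, Y, rest):
    """F_int = -dE_Y/dx, written out analytically for each node."""
    f = [0.0] * len(x)
    for i in range(len(x) - 1):
        tension = Y * (x[i + 1] - x[i] - rest)
        f[i] += tension       # a stretched spring pulls node i towards node i+1
        f[i + 1] -= tension   # and pulls node i+1 back towards node i
    return f

def step(x, v, Y, rest, f_ext, mass, dt):
    """One semi-implicit Euler step of M a = F_int + F_ext (assumed integrator)."""
    f_int = internal_forces(x, Y, rest)
    a = [(fi + fe) / mass for fi, fe in zip(f_int, f_ext)]
    v_new = [vi + dt * ai for vi, ai in zip(v, a)]
    x_new = [xi + dt * vi for xi, vi in zip(x, v_new)]
    return x_new, v_new, a

# Usage: three nodes, the last spring stretched by 0.5, no external force.
x, v = [0.0, 1.0, 2.5], [0.0, 0.0, 0.0]
x, v, a = step(x, v, Y=2.0, rest=1.0, f_ext=[0.0, 0.0, 0.0], mass=1.0, dt=0.01)
```

In this toy setting a larger $Y$ makes the chain resist stretching more strongly, which is the kind of dependence a parameter-estimation loop can exploit.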

Abstract

3D reconstruction of dynamic scenes is a long-standing problem in computer graphics and becomes increasingly difficult the less information is available. Shape-from-Template (SfT) methods aim to reconstruct a template-based geometry from RGB images or video sequences, often using just a single monocular camera without depth information, such as regular smartphone recordings. Unfortunately, existing reconstruction methods are either unphysical and noisy or slow in optimization. To solve this problem, we propose a novel SfT reconstruction algorithm for cloth using a pre-trained neural surrogate model that is fast to evaluate, stable, and produces smooth reconstructions due to a regularizing physics simulation. Differentiable rendering of the simulated mesh enables pixel-wise comparisons between the reconstruction and a target video sequence, which can be used in a gradient-based optimization procedure to extract not only shape information but also physical parameters such as the stretching, shearing, or bending stiffness of the cloth. This makes it possible to retain a precise, stable, and smooth reconstructed geometry while reducing the runtime by a factor of 400-500 compared to $\phi$-SfT, a state-of-the-art physics-based SfT approach.
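The simulate, compare, and update idea behind the gradient-based optimization can be sketched on a deliberately tiny problem. The example below is a toy stand-in and not the authors' pipeline: a single unit mass on a spring replaces the cloth surrogate, a mean squared error on positions replaces the pixel-wise rendering losses, and central finite differences replace backpropagation. All names, the step size, the learning rate, and the simple step-halving rule are assumptions chosen for this illustration.

```python
def simulate(Y, f_ext=0.0, x0=1.0, v0=0.0, dt=0.01, steps=50):
    """Roll out a unit mass on a spring of stiffness Y; return its positions."""
    x, v, traj = x0, v0, []
    for _ in range(steps):
        a = -Y * x + f_ext   # M a = F_int + F_ext with M = 1
        v += dt * a
        x += dt * v
        traj.append(x)
    return traj

def loss(Y, target):
    """Mean squared error between the simulated and the target trajectory."""
    pred = simulate(Y)
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

def fit_stiffness(target, Y_init=1.0, lr=100.0, iters=100, eps=1e-4):
    """Gradient descent on Y using a finite-difference gradient of the loss."""
    Y = Y_init
    current = loss(Y, target)
    for _ in range(iters):
        grad = (loss(Y + eps, target) - loss(Y - eps, target)) / (2 * eps)
        candidate = Y - lr * grad
        cand_loss = loss(candidate, target)
        if cand_loss < current:
            Y, current = candidate, cand_loss   # accept the improving step
        else:
            lr *= 0.5                           # otherwise shrink the step size
    return Y, current

# Usage: the "observed" sequence comes from a hidden stiffness the fit should recover.
Y_TRUE = 4.0
target = simulate(Y_TRUE)
Y_est, final_loss = fit_stiffness(target)
```

The short rollout keeps the loss well behaved in $Y$ for this toy case. Longer horizons or more parameters generally make such objectives harder to optimize.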

Paper Structure (31 sections, 17 equations, 10 figures, 3 tables)

Figures (10)

  • Figure 1: Behavior of stretchable objects like cloth when two anchor points are pulled apart from each other. Neither distances nor angles need to remain constant under these deformations.
  • Figure 2: Overview of the optimization loop. A given initial mesh is physically simulated for several time steps by a neural network using physical parameters for stretching $Y$, shearing $S$, bending $B$ and external forces $\vec{F}_\mathrm{ext}$. The resulting meshes are converted into RGB images and masks by a differentiable renderer together with the known camera intrinsics, texture and an optimizable $uv$-map. In the end, the renderings are compared to the target video sequence by computing pixel-wise loss functions. Gradients of these losses with respect to the optimizable parameters lead to a successively refined physical simulation and reconstruction.
  • Figure 3: Architecture of the neural cloth model. Detailed explanations are provided in the section on the network architecture.
  • Figure 4: Angles for shearing and bending energies. Straight and corner connections are treated independently to separate in-plane and out-of-plane deformations.
  • Figure 5: Training cycle of our neural cloth model. Details are provided in the section on the training cycle.
  • ...and 5 more figures