PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations

Yang Zheng; Qingqing Zhao; Guandao Yang; Wang Yifan; Donglai Xiang; Florian Dubost; Dmitry Lagun; Thabo Beeler; Federico Tombari; Leonidas Guibas; Gordon Wetzstein

TL;DR

PhysAvatar tackles the challenge of reconstructing realistic clothed 3D avatars from multi-view video by bridging inverse rendering with inverse physics. It introduces a three-part pipeline: mesh tracking using mesh-aligned 4D Gaussians, physics-based garment parameter estimation with the C-IPC simulator and finite-difference gradients, and appearance refinement via a physically based differentiable renderer (Mitsuba3). The method yields accurate garment geometry and compelling appearance under novel motions and lighting, outperforming state-of-the-art baselines in geometry and achieving competitive appearance metrics. This framework enables realistic novel-view rendering, relighting, and redressing in a standard CG workflow, marking a significant step toward physically grounded digital humans.

Abstract

Modeling and rendering photorealistic avatars is of crucial importance in many applications. Existing methods that build a 3D avatar from visual observations, however, struggle to reconstruct clothed humans. We introduce PhysAvatar, a novel framework that combines inverse rendering with inverse physics to automatically estimate the shape and appearance of a human from multi-view video data along with the physical parameters of the fabric of their clothes. For this purpose, we adopt a mesh-aligned 4D Gaussian technique for spatio-temporal mesh tracking as well as a physically based inverse renderer to estimate the intrinsic material properties. PhysAvatar integrates a physics simulator to estimate the physical parameters of the garments using gradient-based optimization in a principled manner. These novel capabilities enable PhysAvatar to create high-quality novel-view renderings of avatars dressed in loose-fitting clothes under motions and lighting conditions not seen in the training data. This marks a significant advancement towards modeling photorealistic digital humans using physically based inverse rendering with physics in the loop. Our project website is at: https://qingqing-zhao.github.io/PhysAvatar
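The TL;DR notes that garment parameters are estimated with finite-difference gradients through the simulator. The sketch below illustrates only that general idea, on a toy problem. It is not the authors' implementation and does not use C-IPC: the damped-spring `simulate` function, the parameter names `stiffness` and `damping`, the mean-squared trajectory loss, and the backtracking step-size rule are all assumptions made for illustration.

```python
# Toy illustration: fit the parameters of a black-box simulator using
# central finite-difference gradients. Everything here is a hypothetical
# stand-in; a real cloth simulator would replace `simulate`.

def simulate(params, steps=200, dt=0.01):
    """Damped unit mass on a spring, treated as a black box."""
    stiffness, damping = params
    x, v = 1.0, 0.0
    trajectory = []
    for _ in range(steps):
        a = -stiffness * x - damping * v
        v += dt * a  # semi-implicit Euler update
        x += dt * v
        trajectory.append(x)
    return trajectory


def loss(params, observed):
    """Mean squared error between simulated and observed trajectories."""
    simulated = simulate(params)
    total = 0.0
    for s, o in zip(simulated, observed):
        d = s - o
        total += d * d
    return total / len(observed)


def finite_difference_grad(f, params, eps=1e-4):
    """Central differences: two evaluations of f per parameter."""
    grad = []
    for i in range(len(params)):
        hi = list(params)
        lo = list(params)
        hi[i] += eps
        lo[i] -= eps
        grad.append((f(hi) - f(lo)) / (2.0 * eps))
    return grad


def fit(observed, init, iters=50):
    """Gradient descent with a simple backtracking step size (an assumption)."""
    params = list(init)
    f = lambda p: loss(p, observed)
    current = f(params)
    for _ in range(iters):
        grad = finite_difference_grad(f, params)
        step = 1.0
        while step > 1e-8:
            trial = [p - step * g for p, g in zip(params, grad)]
            trial_loss = f(trial)
            if trial_loss < current:  # accept only steps that reduce the loss
                params, current = trial, trial_loss
                break
            step *= 0.5
    return params, current


if __name__ == "__main__":
    target = simulate([20.0, 0.5])  # synthetic "observation"
    fitted, final_loss = fit(target, [10.0, 1.0])
    print(fitted, final_loss)
```

Each gradient costs two simulator runs per parameter, so this approach is only practical when the number of physical parameters is small, as it is for a handful of fabric properties.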

Paper Structure (31 sections, 10 equations, 9 figures, 2 tables, 1 algorithm)

Introduction
Related Work
Scene Reconstruction from Visual Observations
Animatable Avatars
Physics-Based Simulation
Method
Mesh Tracking
Physics Based Dynamic Modeling
Physics Based Appearance Modeling
Experiments
Experimental setup
Comparison
Ablation
Application
Limitations and Future Work
...and 16 more sections

Figures (9)

Figure 2: Method Overview: (a) PhysAvatar takes multi-view videos and an initial mesh as input. We first perform (b) dynamic mesh tracking (Sec. \ref{sec:method-mesh}). The tracked mesh sequences are then used for (c) garment physics estimation with a physics simulator combined with gradient-based optimization (Sec. \ref{sec:method-phy}), and (d) appearance estimation through physics-based differentiable rendering (Sec. \ref{sec:method-render}). At test time, (e) given a sequence of body poses (f), we simulate garment dynamics with the learned physics parameters and employ physics-based rendering to produce the final images.
Figure 3: Our method can robustly track a dynamic mesh from input images, providing accurate long-term correspondences. Here we show the rendered images overlaid with the Gaussian trajectories from the previous 12 frames and the optimized meshes.
Figure 4: Ablation study on appearance estimation: (a) Initial texture map ${\mathbf{T}}$ extracted from Gaussian splatting (left) has baked-in shadows highlighted in the red boxes; post-optimization (right), the baked-in shadows are substantially removed. (b) Rendering comparisons demonstrate that our method with the optimized texture map more closely aligns with the ground truth.
Figure 5: Qualitative results on test poses from the ActorHQ \cite{jiang2023hifi4g} dataset. Our method, PhysAvatar, achieves state-of-the-art performance in terms of geometry detail and appearance modeling.
Figure 6: Animation results of current state-of-the-art methods, including ARAH \cite{ARAH:2022:ECCV}, TAVA \cite{li2022tava}, and GS-Avatar \cite{hu2023gaussianavatar}, alongside our results on test motions from the AMASS \cite{mahmood2019amass} dataset. Images are rendered from novel views. Please zoom in for details.
...and 4 more figures
