
PICA: Physics-Integrated Clothed Avatar

Bo Peng, Yunfan Tao, Haoyu Zhan, Yudong Guo, Juyong Zhang

TL;DR

PICA tackles the challenge of realistically animating clothed humans with loose garments by separating body and clothing into a double-layer 3D Gaussian Splatting representation and coupling it with a physics-based driving module. The method combines mesh-aligned Gaussians anchored to body and clothing templates with a GNN-based hierarchical dynamics simulator to produce physically plausible garment motion while enabling high-fidelity novel-view synthesis. Key contributions include the double-layer representation, pose-aware appearance, segmentation and geometry losses to enforce garment separation, and a HOOD-inspired physics prior that generalizes to new garments and poses. The approach supports virtual try-on and achieves efficient inference, offering improved fidelity and dynamics over prior single-layer 3DGS avatars in complex driving poses.
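The TL;DR says the Gaussians are mesh-aligned and anchored to the body and clothing templates. This sketch assumes one common way to realize such anchoring; the paper's exact parameterization is not confirmed here. Each Gaussian stores a triangle index, barycentric coordinates, and a signed offset along the face normal, so its center can be recomputed from the current vertex positions. The function `gaussian_center` and its parameters are hypothetical illustrations.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def _normalize(a: Vec3) -> Vec3:
    length = (a[0] ** 2 + a[1] ** 2 + a[2] ** 2) ** 0.5
    return (a[0] / length, a[1] / length, a[2] / length)


def gaussian_center(
    vertices: Sequence[Vec3],
    face: Tuple[int, int, int],
    bary: Vec3,
    offset: float,
) -> Vec3:
    """Center of a Gaussian anchored to one triangle of a template mesh.

    The center is the barycentric interpolation of the triangle corners,
    displaced by `offset` along the unit face normal. Only `face`, `bary`,
    and `offset` are stored per Gaussian (an assumption of this sketch),
    so the center follows the mesh whenever `vertices` are deformed.
    """
    v0, v1, v2 = (vertices[i] for i in face)
    w0, w1, w2 = bary
    n = _normalize(_cross(_sub(v1, v0), _sub(v2, v0)))
    return tuple(
        w0 * a + w1 * b + w2 * c + offset * nk
        for a, b, c, nk in zip(v0, v1, v2, n)
    )
```

For a triangle with corners (0, 0, 0), (1, 0, 0), and (0, 1, 0), equal barycentric weights, and an offset of 0.1, the center is (1/3, 1/3, 0.1). Translating the triangle moves the Gaussian with it. The same lookup works whether the vertices come from the body template or the clothing template.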

Abstract

We introduce PICA, a novel representation for high-fidelity animatable clothed human avatars with physics-accurate dynamics, even for loose clothing. Previous neural rendering-based representations of animatable clothed humans typically employ a single model to represent both the clothing and the underlying body. While efficient, these approaches often fail to capture complex garment dynamics, leading to incorrect deformations and noticeable rendering artifacts, especially for sliding or loose garments. Furthermore, previous works model garment dynamics as pose-dependent deformations and drive novel-pose animation in a data-driven manner. This often produces motion that does not faithfully follow the underlying mechanics and is prone to artifacts in out-of-distribution poses. To address these issues, we adopt two separate 3D Gaussian Splatting (3DGS) models with different deformation characteristics to model the human body and the clothing. This separation lets each layer's motion characteristics be handled appropriately. On top of this representation, we integrate a graph neural network (GNN)-based clothed-body physics simulation module to represent clothing dynamics accurately. As a result, our method achieves high-fidelity rendering of clothed human bodies in complex and novel driving poses, significantly outperforming previous methods under the same settings.
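The abstract's GNN-based simulation module rests on message passing over a mesh graph. The sketch below shows one generic round of message passing, assuming a directed edge list and sum aggregation. It is not PICA's architecture. In a learned simulator such as HOOD, `edge_fn` and `node_fn` would be trained networks, and messages would also flow through coarser graph levels. Here they are hand-written stand-ins, and all names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Vec = List[float]


def message_passing_step(
    node_feats: Sequence[Vec],
    edges: Sequence[Tuple[int, int]],
    edge_fn: Callable[[Vec, Vec], Vec],
    node_fn: Callable[[Vec, Vec], Vec],
    msg_dim: int,
) -> List[Vec]:
    """One round of message passing with sum aggregation.

    For each directed edge (sender, receiver), edge_fn maps the two
    endpoint features to a message of length msg_dim. Messages are summed
    per receiver, and node_fn combines each node's feature with its sum.
    """
    # Nodes with no incoming edges receive an all-zero message.
    agg = [[0.0] * msg_dim for _ in node_feats]
    for s, r in edges:
        msg = edge_fn(node_feats[s], node_feats[r])
        for k in range(msg_dim):
            agg[r][k] += msg[k]
    return [node_fn(x, a) for x, a in zip(node_feats, agg)]


if __name__ == "__main__":
    # Three nodes on a line, connected as a chain in both directions.
    feats = [[0.0], [1.0], [4.0]]
    chain = [(0, 1), (1, 0), (1, 2), (2, 1)]
    out = message_passing_step(
        feats,
        chain,
        edge_fn=lambda snd, rcv: [snd[0] - rcv[0]],  # relative displacement
        node_fn=lambda x, m: [x[0] + 0.5 * m[0]],  # step toward neighbors
        msg_dim=1,
    )
    print(out)  # [[0.5], [2.0], [2.5]]
```

With these stand-in functions, one step acts like Laplacian smoothing: each node moves toward its neighbors. A learned simulator replaces the two lambdas with networks trained so that the node update predicts physically plausible garment motion.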

Paper Structure

This paper contains 29 sections, 18 equations, 7 figures, and 2 tables.

Figures (7)

  • Figure 1: Given multi-view RGB inputs (25 views in this example), PICA reconstructs the clothed avatar as a double-layer representation and performs novel-pose animation and virtual try-on with a physics-integrated driving module.
  • Figure 2: Overview. PICA represents clothed human avatars as two separate template meshes and corresponding mesh-aligned Gaussians. The avatar in canonical space is first deformed to observed space by non-rigid deformation and LBS, and then rasterized to the image space of the given camera with a pose-dependent color MLP. After reconstructing the avatar with appearance loss and geometry loss, PICA uses a hierarchical graph-based neural dynamics simulator to generate the simulated geometry sequence, which is then rendered with the trained appearance model to produce the final animation.
  • Figure 3: Qualitative results of novel view synthesis (left) on training frames and novel pose animation (right).
  • Figure 4: Ablation study on the number of training views. (a) Novel view synthesis results on the training frame. (b) Novel pose animation results on the training view.
  • Figure 5: (a) Ablation study on the pose-dependent color. Compared to spherical harmonics (SH), our appearance model better captures self-shadows and wrinkles in garments. (b) Ablation study on the two-layer representation. The single-layer representation struggles to model the loose or sliding dynamics of clothing.
  • ...and 2 more figures
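Figure 2's caption states that the canonical avatar is deformed to observed space by non-rigid deformation followed by LBS (linear blend skinning). Standard LBS maps each point with a weight-blended joint transform, v' = (Σ_j w_j T_j) v in homogeneous coordinates. The sketch below implements only that standard step. It assumes the non-rigid offsets have already been added to the canonical points. How PICA obtains those offsets, the skinning weights, and the joint transforms is not shown, and `linear_blend_skinning` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat4 = Sequence[Sequence[float]]


def linear_blend_skinning(
    vertices: Sequence[Vec3],
    weights: Sequence[Sequence[float]],
    transforms: Sequence[Mat4],
) -> List[Vec3]:
    """Pose canonical-space points with linear blend skinning.

    vertices:   V points, assumed to already include any non-rigid offsets
    weights:    V rows of J skinning weights (each row expected to sum to 1)
    transforms: J rigid 4x4 joint transforms, canonical -> posed space
    """
    num_joints = len(transforms)
    posed: List[Vec3] = []
    for v, w in zip(vertices, weights):
        # Blend the joint matrices with this vertex's weights: sum_j w_j T_j.
        blended = [
            [sum(w[j] * transforms[j][r][c] for j in range(num_joints)) for c in range(4)]
            for r in range(4)
        ]
        h = (v[0], v[1], v[2], 1.0)  # homogeneous coordinates
        x, y, z = (sum(blended[r][c] * h[c] for c in range(4)) for r in range(3))
        posed.append((x, y, z))
    return posed


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
    lift = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 2.0], [0.0, 0.0, 0.0, 1.0]]  # translate z by 2
    pts = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    w = [[0.5, 0.5], [0.0, 1.0]]
    print(linear_blend_skinning(pts, w, [identity, lift]))  # [(1.0, 0.0, 1.0), (0.0, 1.0, 2.0)]
```

A point weighted equally between a static joint and a joint lifted by 2 moves up by 1. This blending is smooth but tied to the skeleton, which is one reason loose garments are poorly served by skinning alone and benefit from a separate, physics-driven clothing layer.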