SimAvatar: Simulation-Ready Avatars with Layered Hair and Clothing

Xueting Li, Ye Yuan, Shalini De Mello, Gilles Daviet, Jonathan Leaf, Miles Macklin, Jan Kautz, Umar Iqbal

TL;DR

SimAvatar addresses the challenge of generating fully simulation-ready 3D avatars from text prompts by decomposing the avatar into layered body, garment, and hair representations, each coupled with 3D Gaussians for high-fidelity appearance. It combines text-conditioned diffusion models for garment and hair with SMPL-based body geometry, and uses SDS-based optimization to learn realistic textures that animate under physics-based simulations (HOOD for garments and strand/hair simulators). The framework enables pose-driven dynamics with realistic wrinkles and flowing hair, transferring motion from simulators to the Gaussian appearance fields and applying a Phong shading model for plausible rendering. Quantitative results on VQAScore and user studies demonstrate improved alignment with prompts and user preference over state-of-the-art baselines, making it suitable for virtual try-on, film, gaming, and AR/VR applications.

Abstract

We introduce SimAvatar, a framework designed to generate simulation-ready clothed 3D human avatars from a text prompt. Current text-driven human avatar generation methods either model hair, clothing, and the human body using a unified geometry or produce hair and garments that are not easily adaptable for simulation within existing simulation pipelines. The primary challenge lies in representing the hair and garment geometry in a way that allows leveraging established prior knowledge from foundational image diffusion models (e.g., Stable Diffusion) while being simulation-ready using either physics or neural simulators. To address this task, we propose a two-stage framework that combines the flexibility of 3D Gaussians with simulation-ready hair strands and garment meshes. Specifically, we first employ three text-conditioned 3D generative models to generate garment mesh, body shape and hair strands from the given text prompt. To leverage prior knowledge from foundational diffusion models, we attach 3D Gaussians to the body mesh, garment mesh, as well as hair strands and learn the avatar appearance through optimization. To drive the avatar given a pose sequence, we first apply physics simulators onto the garment meshes and hair strands. We then transfer the motion onto 3D Gaussians through carefully designed mechanisms for each body part. As a result, our synthesized avatars have vivid texture and realistic dynamic motion. To the best of our knowledge, our method is the first to produce highly realistic, fully simulation-ready 3D avatars, surpassing the capabilities of current approaches.
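The motion transfer mentioned in the abstract, where Gaussians follow the simulated garment mesh, can be illustrated with a minimal NumPy sketch. It assumes each Gaussian stores its position in the local coordinate system of one triangle (as the Figure 2 caption describes for meshes), so that recomputing the face frame after simulation moves the Gaussian with it. The function names `face_frames` and `gaussians_to_world`, and the choice of frame (centroid origin, first edge as tangent), are illustrative assumptions and not taken from the paper.

```python
import numpy as np


def face_frames(vertices, faces):
    """Return per-face origins (F, 3) and rotations (F, 3, 3).

    Hypothetical frame convention: origin at the triangle centroid,
    rotation columns are tangent (first edge), bitangent, and normal.
    """
    vertices = np.asarray(vertices, dtype=float)
    faces = np.asarray(faces, dtype=int)
    v0, v1, v2 = (vertices[faces[:, i]] for i in range(3))
    origin = (v0 + v1 + v2) / 3.0
    tangent = v1 - v0
    tangent = tangent / np.linalg.norm(tangent, axis=1, keepdims=True)
    normal = np.cross(v1 - v0, v2 - v0)
    normal = normal / np.linalg.norm(normal, axis=1, keepdims=True)
    bitangent = np.cross(normal, tangent)
    rotation = np.stack([tangent, bitangent, normal], axis=2)
    return origin, rotation


def gaussians_to_world(local_pos, face_ids, origin, rotation):
    """Map Gaussian centers from face-local to world coordinates."""
    local_pos = np.asarray(local_pos, dtype=float)
    rotated = np.einsum("nij,nj->ni", rotation[face_ids], local_pos)
    return origin[face_ids] + rotated


# One triangle in the xy-plane with a Gaussian hovering above its centroid.
faces = np.array([[0, 1, 2]])
rest = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
local_pos = np.array([[0.0, 0.0, 0.1]])
face_ids = np.array([0])

rest_world = gaussians_to_world(local_pos, face_ids, *face_frames(rest, faces))

# "Simulated" frame: the triangle rotated 90 degrees about the x-axis.
posed = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
posed_world = gaussians_to_world(local_pos, face_ids, *face_frames(posed, faces))
```

Because the local coordinates stay fixed, only the face frames need to be recomputed per simulated frame. A Gaussian's orientation could be carried along the same way by left-multiplying its local rotation with the face rotation.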

Paper Structure

This paper contains 36 sections, 5 equations, 10 figures, 2 tables.

Figures (10)

  • Figure 1: (a) SimAvatar synthesizes simulation-ready 3D avatars with layered hair, body and clothing. (b) By leveraging physics-based simulators, SimAvatar produces realistic pose-dependent motion effects such as flowing hair strands and garment wrinkles. (c) SimAvatar produces simulation-ready 3D avatars with diverse identities, garments, and hairstyles.
  • Figure 2: Overview. Given a text prompt, SimAvatar first generates hair strands, a body mesh, and a garment using their respective text-conditioned generative models. We then bring them together using physics simulation and attach 3D Gaussians to learn their appearance and adapt them to the text prompt. We assign one 3D Gaussian to each line segment of the hair strands, with a length significantly larger than its diameter (green circle), whereas the Gaussians for meshes are defined within the local coordinate system of each face (orange circle). We optimize the properties of all 3D Gaussians through the Score Distillation Sampling (SDS) loss using image-based diffusion models, together with a novel hair regularization $L_{hair}$ that ensures plausible hair structure. The generated 3D avatar can be driven by any pose sequence, showing realistic dynamic motion effects such as flowing hair and garment wrinkles.
  • Figure 3: Text-based garment diffusion model. See Sec. \ref{sec:diffusion} for details.
  • Figure 4: Qualitative comparison with the state-of-the-art methods. See the supplementary video for more comparisons.
  • Figure 5: Qualitative comparison of animated avatars. See the supplementary video for more animation comparisons.
  • ...and 5 more figures
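
The hair attachment described in the Figure 2 caption (one 3D Gaussian per strand line segment, much longer than it is wide) can be sketched as follows. This is only an illustration under stated assumptions: the function name `segment_gaussians`, the use of half the segment length as the scale along the strand, and the fixed `radius` for the two thin axes are hypothetical choices, not values from the paper.

```python
import numpy as np


def segment_gaussians(strand, radius=1e-3):
    """Place one anisotropic Gaussian on each segment of a strand polyline.

    strand: (P, 3) array of points along one hair strand.
    Returns means (P-1, 3), rotations (P-1, 3, 3), and scales (P-1, 3).
    The first rotation column is the segment direction.
    """
    strand = np.asarray(strand, dtype=float)
    start, end = strand[:-1], strand[1:]
    means = 0.5 * (start + end)
    segment = end - start
    length = np.linalg.norm(segment, axis=1)
    direction = segment / length[:, None]

    # Pick a helper axis that is not parallel to the segment direction.
    use_x = np.abs(direction[:, [0]]) < 0.9
    helper = np.where(use_x, [1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
    side = np.cross(direction, helper)
    side = side / np.linalg.norm(side, axis=1, keepdims=True)
    up = np.cross(direction, side)
    rotations = np.stack([direction, side, up], axis=2)

    # Assumed scales: long along the strand, thin across it.
    thin = np.full_like(length, radius)
    scales = np.stack([0.5 * length, thin, thin], axis=1)
    return means, rotations, scales


# A straight strand along the z-axis with two segments of length 1 and 2.
strand = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 3.0]])
means, rotations, scales = segment_gaussians(strand)
```

Since every quantity is derived from the segment endpoints, re-running the function on simulated strand points moves and reorients the Gaussians with the hair.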