SimAvatar: Simulation-Ready Avatars with Layered Hair and Clothing
Xueting Li, Ye Yuan, Shalini De Mello, Gilles Daviet, Jonathan Leaf, Miles Macklin, Jan Kautz, Umar Iqbal
TL;DR
SimAvatar generates fully simulation-ready 3D avatars from text prompts by decomposing the avatar into layered body, garment, and hair representations, each coupled with 3D Gaussians for high-fidelity appearance. It combines text-conditioned diffusion models for garments and hair with SMPL-based body geometry, and uses optimization based on Score Distillation Sampling (SDS) to learn realistic textures that can be animated under simulation (HOOD for garments and a strand-based simulator for hair). The framework enables pose-driven dynamics with realistic wrinkles and flowing hair by transferring motion from the simulators to the Gaussian appearance fields and applying a Phong shading model for plausible rendering. VQAScore results and user studies show better prompt alignment and higher user preference than state-of-the-art baselines, which makes the method suitable for virtual try-on, film, gaming, and AR/VR applications.
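As a reminder of what a Phong shading model computes, the following is a minimal sketch of the textbook formula (ambient + diffuse + specular) for a single surface point and a single light. It is not taken from the SimAvatar implementation: the function name `phong_intensity`, the coefficients `ka`, `kd`, `ks`, and the `shininess` default are illustrative assumptions, and how the paper applies shading to 3D Gaussians is not specified in this summary.

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def phong_intensity(normal, light_dir, view_dir,
                    ka=0.1, kd=0.7, ks=0.2, shininess=16.0):
    """Scalar Phong intensity for one point and one light.

    All direction vectors point away from the surface point.
    The coefficient values are illustrative assumptions.
    """
    n = normalize(normal)
    l = normalize(light_dir)
    v = normalize(view_dir)

    n_dot_l = dot(n, l)
    diffuse = max(n_dot_l, 0.0)

    # Mirror reflection of the light direction about the normal:
    # r = 2 (n . l) n - l
    r = tuple(2.0 * n_dot_l * nc - lc for nc, lc in zip(n, l))

    # No specular highlight when the light is behind the surface.
    specular = max(dot(r, v), 0.0) ** shininess if n_dot_l > 0.0 else 0.0

    return ka + kd * diffuse + ks * specular


# Light and camera both directly above an upward-facing surface.
front_lit = phong_intensity((0, 0, 1), (0, 0, 1), (0, 0, 1))
# Light behind the surface: only the ambient term remains.
back_lit = phong_intensity((0, 0, 1), (0, 0, -1), (0, 0, 1))
```

With these assumed coefficients the front-lit case sums all three terms, while the back-lit case falls back to the ambient term alone.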
Abstract
We introduce SimAvatar, a framework designed to generate simulation-ready clothed 3D human avatars from a text prompt. Current text-driven human avatar generation methods either model hair, clothing, and the human body using a unified geometry or produce hair and garments that are not easily adaptable for simulation within existing simulation pipelines. The primary challenge lies in representing the hair and garment geometry in a way that allows leveraging established prior knowledge from foundational image diffusion models (e.g., Stable Diffusion) while remaining simulation-ready with either physics or neural simulators. To address this challenge, we propose a two-stage framework that combines the flexibility of 3D Gaussians with simulation-ready hair strands and garment meshes. Specifically, we first employ three text-conditioned 3D generative models to generate a garment mesh, a body shape, and hair strands from the given text prompt. To leverage prior knowledge from foundational diffusion models, we attach 3D Gaussians to the body mesh, garment mesh, and hair strands and learn the avatar appearance through optimization. To drive the avatar given a pose sequence, we first apply physics simulators to the garment meshes and hair strands. We then transfer the motion onto the 3D Gaussians through carefully designed mechanisms for each body part. As a result, our synthesized avatars have vivid texture and realistic dynamic motion. To the best of our knowledge, our method is the first to produce highly realistic, fully simulation-ready 3D avatars, surpassing the capabilities of current approaches.
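The abstract does not say how motion is transferred from a simulated mesh to the attached 3D Gaussians. One common way to do this kind of transfer, shown here purely as an illustrative assumption and not as the paper's mechanism, is to bind each Gaussian center to a mesh triangle using barycentric weights plus an offset along the triangle normal, then re-evaluate that binding on the deformed triangle. The helper names `bind` and `transfer` are hypothetical.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def scale(a, s):
    return tuple(x * s for x in a)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit_normal(tri):
    """Unit normal of a triangle given as three 3D vertices."""
    a, b, c = tri
    n = cross(sub(b, a), sub(c, a))
    return scale(n, 1.0 / math.sqrt(dot(n, n)))


def bind(point, tri):
    """Express a point as (barycentric weights, normal offset) w.r.t. tri."""
    a, b, c = tri
    n = unit_normal(tri)
    offset = dot(sub(point, a), n)        # signed distance to the plane
    p = sub(point, scale(n, offset))      # projection onto the plane
    v0, v1, v2 = sub(b, a), sub(c, a), sub(p, a)
    d00, d01, d11 = dot(v0, v0), dot(v0, v1), dot(v1, v1)
    d20, d21 = dot(v2, v0), dot(v2, v1)
    denom = d00 * d11 - d01 * d01
    w1 = (d11 * d20 - d01 * d21) / denom
    w2 = (d00 * d21 - d01 * d20) / denom
    return (1.0 - w1 - w2, w1, w2), offset


def transfer(binding, deformed_tri):
    """Re-evaluate a stored binding on the deformed triangle."""
    weights, offset = binding
    a, b, c = deformed_tri
    base = tuple(weights[0] * a[i] + weights[1] * b[i] + weights[2] * c[i]
                 for i in range(3))
    return add(base, scale(unit_normal(deformed_tri), offset))


# Rest-pose triangle and a Gaussian center hovering slightly above it.
rest_tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
binding = bind((0.25, 0.25, 0.1), rest_tri)

# The same triangle after a simulated rotation of 90 degrees about the x-axis.
rotated_tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0))
moved_center = transfer(binding, rotated_tri)
```

A binding like this only moves the Gaussian center. A full pipeline would also need to update each Gaussian's rotation and scale, and hair strands would need a different parameterization (for example, along strand segments); those details are outside what the abstract states.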
