Barbie: Text to Barbie-Style 3D Avatars

Xiaokun Sun, Zhenyu Zhang, Ying Tai, Hao Tang, Zili Yi, Jian Yang

TL;DR

Barbie is a novel text-driven framework for generating animatable 3D avatars with separable shoes, accessories, and simulation-ready garments, capturing the iconic "Barbie doll" aesthetic.

Abstract

To integrate digital humans into everyday life, there is strong demand for high-quality, finely disentangled 3D avatars that support expressive animation and physical simulation, ideally generated from low-cost textual inputs. Although text-driven 3D avatar generation has made significant progress by leveraging 2D generative priors, existing methods still struggle to meet all of these requirements simultaneously. To address this challenge, we propose Barbie, a novel text-driven framework for generating animatable 3D avatars with separable shoes, accessories, and simulation-ready garments, capturing the iconic "Barbie doll" aesthetic. The core of our framework is an expressive 3D representation combined with appropriate modeling constraints. Unlike previous methods, we employ G-Shell to model, in a unified way, both watertight components (e.g., bodies, shoes, and accessories) and non-watertight garments that are compatible with simulation. Furthermore, we introduce a dedicated initialization scheme and a hole regularization loss to ensure clean open-surface modeling. These disentangled 3D representations are then optimized by specialized expert diffusion models tailored to each domain to obtain high-fidelity outputs. To mitigate geometric artifacts and texture conflicts that arise when combining different expert models, we further propose several geometric losses and strategies. Extensive experiments demonstrate that Barbie outperforms existing methods in both dressed human generation and outfit generation. Our framework further enables diverse applications, including apparel combination, editing, expressive animation, and physical simulation. Our project page is: https://xiaokunsun.github.io/Barbie.github.io

Paper Structure (17 sections, 18 equations, 10 figures, 2 tables)

Figures (10)

  • Figure 1: Our method generates Barbie-style 3D avatars from textual input. "Barbie-style" refers to the following key characteristics: (1) High-Quality geometry and realistic appearance, ensuring visually lifelike avatars; (2) Fine-Grained Decoupling, separating body, clothing, shoes, and accessories to enable flexible apparel combination and editing; (3) Expressive Animation, supporting a wide range of body movements, facial expressions, and hand gestures; (4) Simulation Compatibility, enabling modeling of non-watertight garments and seamless integration into existing physical simulation pipelines.
  • Figure 2: Generating a basic human model involves two steps: (a) employing human-specific geometry-aware diffusion models and the SMPLX-evolving prior loss to model realistic and plausible body shapes; (b) using a normal-conditioned diffusion model to generate lifelike human textures.
  • Figure 3: The process for generating apparel involves three steps: (a) Initializing apparel with the semantic-aligned human body. (b) Modeling apparel piece by piece using object-specific generative priors and geometric losses. (c) Refining the texture of the assembled avatar using a unified texture refinement process.
  • Figure 4: (a) Closed Surface Initialization: Expand and sew the open mesh (cropped via SMPL-X mask) to create a closed template mesh $M_{temp}$, used to initialize the SDF $s_{\theta_{a}}(\cdot)$ of ${\theta}_{a}$. (b) Open Surface Initialization: Fit a watertight pie mesh $M_{pie}$ over the holes of the cropped open mesh. Use its SDF values to initialize the mSDF $\hat{s}_{\theta_{a}}(\cdot)$ of ${\theta}_{a}$. (c) Comparison: Contrast geodesic-based and SDF-based mSDF initialization.
  • Figure 5: Diverse Range of Barbie-Style Avatar Generation. Color and normal renderings are shown for visualization. Please zoom in to see the details, and see Supp. Mat. for video results.
  • ...and 5 more figures