A Text-to-3D Framework for Joint Generation of CG-Ready Humans and Compatible Garments

Zhiyao Sun, Yu-Hui Wen, Ho-Jui Fang, Sheng Ye, Matthieu Lin, Tian Lv, Yong-Jin Liu

TL;DR

Tailor is a training-free, integrated text-to-3D framework that generates CG-ready human avatars with physically compatible garments by decoupling body and clothing generation. It combines an LLM-driven semantic parsing stage, geometry-aware garment deformation with topology preservation, and a multi-view diffusion-based texture synthesis pipeline that enforces cross-view consistency and, optionally, symmetry. The approach leverages HumGen3D for CG-ready body representations, Neural Jacobian Fields for body-aligned garment geometry, and synchronized diffusion with UV-space refinement for textures, achieving state-of-the-art fidelity, usability, and diversity. Quantitative and qualitative evaluations, including user studies, demonstrate superior performance over existing methods, and the system is designed to be extensible to alternative generators and prompts for production workflows.

Abstract

Creating detailed 3D human avatars with fitted garments traditionally requires specialized expertise and labor-intensive workflows. While recent advances in generative AI have enabled text-to-3D human and clothing synthesis, existing methods fall short in offering accessible, integrated pipelines for generating CG-ready 3D avatars with physically compatible outfits. Here, we use the term CG-ready for models that follow a technical aesthetic common in computer graphics (CG) and adopt standard CG polygonal mesh and strand representations (rather than neural representations such as NeRF and 3DGS), so that they can be directly integrated into conventional CG pipelines and support downstream tasks such as physical simulation. To bridge this gap, we introduce Tailor, an integrated text-to-3D framework that generates high-fidelity, customizable 3D avatars dressed in simulation-ready garments. Tailor consists of three stages. (1) Semantic Parsing: we employ a large language model to interpret textual descriptions and translate them into parameterized human avatars and semantically matched garment templates. (2) Geometry-Aware Garment Generation: we propose topology-preserving deformation with novel geometric losses to generate body-aligned garments under text control. (3) Consistent Texture Synthesis: we propose a novel multi-view diffusion process optimized for garment texturing, which enforces view consistency, preserves photorealistic details, and optionally supports the symmetric texture generation common in garments. Through comprehensive quantitative and qualitative evaluations, we demonstrate that Tailor outperforms state-of-the-art methods in fidelity, usability, and diversity. Our code will be released for academic use. Project page: https://human-tailor.github.io
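
To make the output of stage (1) more concrete, the following is a minimal Python sketch of how an LLM reply could be turned into body parameters and garment templates. It is an illustration under stated assumptions, not the authors' implementation: the JSON schema, the parameter names (`height`, `muscularity`), the template identifiers, and the names `GarmentSpec`, `ParsedDescription`, and `parse_llm_reply` are all hypothetical, and the reply is hand-written rather than produced by a model.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GarmentSpec:
    template: str  # hypothetical garment template identifier
    prompt: str    # garment sub-prompt passed on to the later stages


@dataclass
class ParsedDescription:
    body_params: Dict[str, float]  # hypothetical body parameter names and values
    garments: List[GarmentSpec] = field(default_factory=list)


def parse_llm_reply(reply: str) -> ParsedDescription:
    """Convert a JSON reply (assumed schema) into typed records."""
    data = json.loads(reply)
    # Coerce every body parameter to float so later stages see one numeric type.
    body = {name: float(value) for name, value in data["body"].items()}
    garments = [GarmentSpec(g["template"], g["prompt"]) for g in data["garments"]]
    return ParsedDescription(body_params=body, garments=garments)


# Hand-written stand-in for an LLM reply; no model is called here.
example_reply = """
{
  "body": {"height": 0.6, "muscularity": 0.3},
  "garments": [
    {"template": "t_shirt", "prompt": "a loose white linen shirt"},
    {"template": "trousers", "prompt": "dark blue denim jeans"}
  ]
}
"""

parsed = parse_llm_reply(example_reply)
print(parsed.body_params)
print([g.template for g in parsed.garments])
```

Keeping the body parameters separate from the per-garment records mirrors the decoupling of body and clothing generation described above: each garment carries its own sub-prompt, which the geometry and texture stages could consume independently of the body.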

Paper Structure

This paper contains 18 sections, 7 equations, 21 figures, 3 tables.

Figures (21)

  • Figure 1: We introduce Tailor, an integrated text-to-3D framework that generates high-fidelity, customizable 3D humans with simulation-ready garments.
  • Figure 1: Additional qualitative comparison with state-of-the-art text-to-3D human generation methods.
  • Figure 2: Overview of Tailor. Tailor is a three-stage pipeline. Given a description of a clothed human, (a) an LLM agent decomposes the prompt into separate body and garment sub-prompts, then outputs translated body parameters and garment templates. The body parameters and garment templates are subsequently fed into the HumGen3D generator to derive a highly detailed human model and a set of roughly aligned garments. (b) The framework applies topology-preserving deformation to the Neural Jacobian Field of each garment template, guided by both a text-to-image model and geometry constraints, to generate body-aligned clothes. (c) A multi-view image diffusion model, conditioned on rendered depth images and extended with a local attention mechanism, generates view-consistent images. Textures are then sampled from these images and refined to obtain the final texture map.
  • Figure 2: Additional qualitative comparison with state-of-the-art text-to-3D garment generation methods.
  • Figure 3: Starting from the same template, our topology-preserving deformation process generates diverse and plausible garment geometries conditioned on different target prompts.
  • ...and 16 more figures