A Text-to-3D Framework for Joint Generation of CG-Ready Humans and Compatible Garments

Zhiyao Sun, Yu-Hui Wen, Ho-Jui Fang, Sheng Ye, Matthieu Lin, Tian Lv, Yong-Jin Liu

TL;DR

Tailor is a training-free, integrated text-to-3D framework that generates CG-ready human avatars with physically compatible garments by decoupling body and clothing generation. It combines LLM-driven semantic parsing, geometry-aware garment deformation that preserves mesh topology, and a multi-view diffusion-based texture synthesis pipeline that enforces cross-view consistency and symmetry. The approach uses HumGen3D for CG-ready body representations, Neural Jacobian Fields for body-aligned garment geometry, and synchronized diffusion with UV-space refinement for textures, achieving state-of-the-art fidelity, usability, and diversity. Quantitative and qualitative evaluations, including user studies, show superior performance over existing methods, and the system is designed to extend to alternative generators and prompts in production workflows.
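Neural Jacobian Fields represent a deformation as one Jacobian matrix per triangle and recover vertex positions with an area-weighted Poisson (least-squares) solve. Because the triangle list is never modified, connectivity is preserved by construction, which is what makes this representation attractive for simulation-ready garments. The sketch below illustrates only that recovery step on a 2D mesh, assuming the per-face Jacobians are already given; producing them from text guidance and the paper's geometric losses is outside its scope. The function names `gradient_operator` and `poisson_from_jacobians` are hypothetical, and this is not Tailor's implementation.

```python
import numpy as np


def gradient_operator(verts, faces):
    """Build G of shape (2F, N) so that (G @ phi)[2*f + j, d] = d(phi_d)/d(x_j) on face f.

    Sketch for a 2D mesh with piecewise-linear maps; names are hypothetical.
    """
    n, f = len(verts), len(faces)
    G = np.zeros((2 * f, n))
    areas = np.zeros(f)
    for i, (a, b, c) in enumerate(faces):
        # Columns are the two edge vectors leaving vertex a in the rest mesh.
        E = np.column_stack([verts[b] - verts[a], verts[c] - verts[a]])
        E_inv = np.linalg.inv(E)
        areas[i] = 0.5 * abs(np.linalg.det(E))
        for j in range(2):
            G[2 * i + j, b] += E_inv[0, j]
            G[2 * i + j, c] += E_inv[1, j]
            G[2 * i + j, a] -= E_inv[0, j] + E_inv[1, j]
    return G, areas


def poisson_from_jacobians(verts, faces, jacobians, anchor=0):
    """Recover deformed vertex positions whose per-face Jacobians best match the targets."""
    G, areas = gradient_operator(verts, faces)
    # Row 2*f + j, column d of the target holds J_f[d, j].
    target = np.concatenate([J.T for J in jacobians], axis=0)
    # Area weighting: larger triangles count more in the least-squares fit.
    w = np.sqrt(np.repeat(areas, 2))[:, None]
    phi, *_ = np.linalg.lstsq(w * G, w * target, rcond=None)
    # Positions are only defined up to translation, so pin the anchor vertex.
    return phi - phi[anchor] + verts[anchor]


if __name__ == "__main__":
    # Unit square split into two triangles, deformed by a uniform 2x scaling.
    square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
    tris = [(0, 1, 2), (0, 2, 3)]
    print(poisson_from_jacobians(square, tris, [2.0 * np.eye(2)] * 2))
```

Pinning one vertex removes the translation ambiguity: the gradient operator of a connected mesh has constant vectors in its null space, so the least-squares system alone determines positions only up to a global offset.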

Abstract

Creating detailed 3D human avatars with fitted garments traditionally requires specialized expertise and labor-intensive workflows. While recent advances in generative AI have enabled text-to-3D human and clothing synthesis, existing methods fall short of offering accessible, integrated pipelines for generating CG-ready 3D avatars with physically compatible outfits. Here, we use the term CG-ready for models that follow a technical aesthetic common in computer graphics (CG) and adopt standard CG representations, namely polygonal meshes and strands, rather than neural representations such as NeRF and 3DGS; such models can be integrated directly into conventional CG pipelines and support downstream tasks such as physical simulation. To bridge this gap, we introduce Tailor, an integrated text-to-3D framework that generates high-fidelity, customizable 3D avatars dressed in simulation-ready garments. Tailor consists of three stages. (1) Semantic Parsing: we employ a large language model to interpret textual descriptions and translate them into parameterized human avatars and semantically matched garment templates. (2) Geometry-Aware Garment Generation: we propose a topology-preserving deformation with novel geometric losses to generate body-aligned garments under text control. (3) Consistent Texture Synthesis: we propose a novel multi-view diffusion process optimized for garment texturing, which enforces view consistency, preserves photorealistic details, and optionally supports the symmetric textures common in garments. Through comprehensive quantitative and qualitative evaluations, we demonstrate that Tailor outperforms state-of-the-art methods in fidelity, usability, and diversity. Our code will be released for academic use. Project page: https://human-tailor.github.io
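View consistency in multi-view texture diffusion generally comes from fusing the per-view predictions into one shared texture. The sketch below shows one common form of that fusion step: a confidence-weighted average of per-view colors in UV space, with an optional mirror average that imposes left-right symmetry. It assumes hypothetical precomputed pixel-to-texel correspondences and per-pixel weights, and a UV layout that is mirror-symmetric about its vertical center line. It illustrates the general idea only; it is not the paper's algorithm, and it omits the denoising loop and the UV-space refinement. The function name `aggregate_views` is hypothetical.

```python
import numpy as np


def aggregate_views(uv_shape, view_colors, view_texels, view_weights, symmetric=False):
    """Fuse per-view color predictions into one shared UV texture (hypothetical sketch).

    view_colors[v]:  (K_v, 3) colors predicted for the visible pixels of view v
    view_texels[v]:  (K_v, 2) integer (row, col) UV texel that each pixel maps to
    view_weights[v]: (K_v,) confidence, e.g. how frontally the surface faces the camera
    """
    h, w = uv_shape
    acc = np.zeros((h, w, 3))
    wsum = np.zeros((h, w))
    for colors, texels, weights in zip(view_colors, view_texels, view_weights):
        rows, cols = texels[:, 0], texels[:, 1]
        # np.add.at accumulates correctly even when several pixels hit the same texel.
        np.add.at(acc, (rows, cols), colors * weights[:, None])
        np.add.at(wsum, (rows, cols), weights)
    if symmetric:
        # Assumes the UV layout mirrors about its vertical center line.
        acc = acc + acc[:, ::-1]
        wsum = wsum + wsum[:, ::-1]
    texture = np.zeros_like(acc)
    seen = wsum > 0
    texture[seen] = acc[seen] / wsum[seen][:, None]
    return texture, seen
```

In synchronized multi-view samplers, a fused texture like this is typically re-rendered into every view before the next denoising step, which is what couples the views; texels no view observes stay empty here and would need inpainting or refinement.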
