TELA: Text to Layer-wise 3D Clothed Human Generation

Junting Dong, Qi Fang, Zehuan Huang, Xudong Xu, Jingbo Wang, Sida Peng, Bo Dai

TL;DR

TELA addresses text-to-3D clothed human generation by introducing a layer-wise representation (minimal body plus multiple clothes) and a progressive optimization pipeline. A transparency-based stratified rendering scheme prevents inter-layer penetration, while dual SDS losses enforce cloth decoupling from the body, enabling high-quality cloth editing and virtual try-on. Empirical results show state-of-the-art clothed-human generation with improved cloth details, view-consistency, and editing capabilities over holistic methods like DreamWaltz. The approach enables practical downstream tasks but requires substantial optimization time and remains NeRF-based, suggesting avenues for faster, mesh- or hybrid-representation extensions in the future.

Abstract

This paper addresses the task of 3D clothed human generation from textual descriptions. Previous works usually encode the human body and clothes as a holistic model and generate the whole model in a single-stage optimization, which makes clothing editing difficult and sacrifices fine-grained control over the generation process. To solve this, we propose a layer-wise clothed human representation combined with a progressive optimization strategy, which produces clothing-disentangled 3D human models while providing control over the generation process. The basic idea is to progressively generate a minimal-clothed human body and layer-wise clothes. During clothing generation, a novel stratified compositional rendering method is proposed to fuse multi-layer human models, and a new loss function is utilized to help decouple the clothing model from the human body. The proposed method achieves high-quality disentanglement, which in turn provides an effective way to generate 3D garments. Extensive experiments demonstrate that our approach achieves state-of-the-art 3D clothed human generation while also supporting cloth editing applications such as virtual try-on. Project page: http://jtdong.com/tela_layer/
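The "new loss function" mentioned in the abstract is described in the TL;DR as dual SDS losses. As a rough sketch only, the first line below is the standard score distillation sampling (SDS) gradient for a rendered image $x = g(\theta)$ and text prompt $y$. The second line is an assumed way to combine two such terms for cloth $p$. The symbols $x_{\mathrm{body}+p}$, $x_{p}$, $y_{\mathrm{full}}$, $y_{p}$ and the weights $\lambda_{\mathrm{full}}$, $\lambda_{p}$ are illustrative names introduced here. The exact weighting and formulation used by the authors is not given in this summary.

```latex
% Standard SDS gradient: noise the render x to x_t, compare the diffusion
% model's predicted noise with the injected noise \epsilon.
\nabla_{\theta}\mathcal{L}_{\mathrm{SDS}}(x; y)
  = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,
      \bigl(\hat{\epsilon}_{\phi}(x_t; y, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta} \right]

% Assumed dual form for cloth p (weights \lambda are hypothetical):
% one term on the body-plus-cloth render, one on the cloth-only render.
\nabla_{\theta_p}\mathcal{L}_{\mathrm{dual}}
  = \lambda_{\mathrm{full}}\,
      \nabla_{\theta_p}\mathcal{L}_{\mathrm{SDS}}\bigl(x_{\mathrm{body}+p};\, y_{\mathrm{full}}\bigr)
  + \lambda_{p}\,
      \nabla_{\theta_p}\mathcal{L}_{\mathrm{SDS}}\bigl(x_{p};\, y_{p}\bigr)
```

Intuitively, the cloth-only term discourages the cloth layer from absorbing body content, since that render must look like a garment on its own.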

Paper Structure

This paper contains 26 sections, 9 equations, 10 figures, and 2 tables.

Figures (10)

  • Figure 1: Given textual descriptions (e.g., "a man wearing jeans, a denim shirt, and a windbreaker"), this paper aims to generate clothing-disentangled 3D human models progressively. Meanwhile, our approach enables high-quality 3D cloth generation and supports applications like cloth composition.
  • Figure 2: Overview of TELA. (a) The minimal-clothed body is generated first. To train the body NeRF, we render an image and the corresponding 2D human skeleton from a random viewpoint and then use a ControlNet conditioned on the 2D skeleton for SDS optimization. (b) Given the fixed body NeRF, we progressively generate each cloth. To generate cloth $p$, we render one image of the human wearing cloth $p$ and another image of cloth $p$ alone through the proposed transparency-based stratified compositional rendering. Dual SDS losses then supervise these two images. For the cloth-only image, the original Stable Diffusion model is adopted for SDS optimization.
  • Figure 3: Qualitative comparisons with the holistic modeling method DreamWaltz (Huang et al., 2023). (a) DreamWaltz, (b) Ours. Text prompts (from left to right): "A woman wearing a Brown Cycling Top and Brown Chiffon Skirt", "A Black woman wearing a blue turtleneck and blue Midi Skirt", "An Asian man wearing a Light Blue Varsity Jacket and Western Pants", "A Black man wearing a Brown Sweatshirt and jeans".
  • Figure 4: Qualitative results of the proposed method. Text prompts: "A Black man wearing Khaki Outerwear and denim shorts", "An Asian man wearing a Navy Blue Military Jacket and jeans", "A Black woman wearing a blue turtleneck and Sporty Skirt", "A woman wearing Black Dress".
  • Figure 5: Qualitative comparisons of cloth generation with the SOTA methods. Text prompts (from top to bottom): "a pair of jeans", "a piece of Khaki Sleeveless Dress", "a Brown Sweatshirt".
  • ...and 5 more figures
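
The Figure 2 caption describes rendering two images per cloth layer: the body together with cloth $p$, and cloth $p$ alone. The sketch below illustrates that idea with generic density-additive compositing of two radiance-field layers along a single ray, using standard volume rendering. It is not the paper's transparency-based stratified scheme, whose details are not given in this summary. The function names `composite_ray` and `merge_layers` and all sample values are hypothetical.

```python
import math

def composite_ray(sigmas, colors, deltas):
    """Standard volume rendering along one ray.

    sigmas: per-sample densities (>= 0), ordered near to far
    colors: per-sample RGB tuples
    deltas: distances between adjacent samples
    Returns (rgb, opacity).
    """
    transmittance = 1.0
    rgb = [0.0, 0.0, 0.0]
    opacity = 0.0
    for sigma, color, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha
        for k in range(3):
            rgb[k] += weight * color[k]
        opacity += weight
        transmittance *= 1.0 - alpha
    return rgb, opacity

def merge_layers(body, cloth):
    """Merge two layers sampled at the same points on a ray.

    Densities add; colors are averaged with density weights.
    Each layer is a (sigmas, colors) pair.
    """
    body_sigmas, body_colors = body
    cloth_sigmas, cloth_colors = cloth
    sigmas, colors = [], []
    for sb, cb, sc, cc in zip(body_sigmas, body_colors, cloth_sigmas, cloth_colors):
        total = sb + sc
        if total > 0.0:
            mixed = tuple((sb * cb[k] + sc * cc[k]) / total for k in range(3))
        else:
            mixed = (0.0, 0.0, 0.0)
        sigmas.append(total)
        colors.append(mixed)
    return sigmas, colors

# Toy ray with four samples: the cloth surface sits in front of the body.
deltas = [0.1] * 4
body = ([0.0, 0.0, 50.0, 50.0], [(0.8, 0.6, 0.5)] * 4)   # skin-toned body layer
cloth = ([0.0, 50.0, 0.0, 0.0], [(0.1, 0.2, 0.8)] * 4)   # blue cloth layer

# Image 1: body and cloth rendered together.
full_rgb, full_opacity = composite_ray(*merge_layers(body, cloth), deltas)
# Image 2: cloth rendered alone.
cloth_rgb, cloth_opacity = composite_ray(cloth[0], cloth[1], deltas)
```

In a setup like the one the caption describes, the first render would be supervised as a clothed human and the second as a standalone garment. Here both renders come from the same cloth densities, so gradients from either image would reach the same cloth parameters.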