Text2Avatar: Text to 3D Human Avatar Generation with Codebook-Driven Body Controllable Attribute
Chaoqun Gong, Yuqin Dai, Ronghui Li, Achun Bao, Jun Li, Jian Yang, Yachao Zhang, Xiu Li
TL;DR
Text2Avatar tackles the problem of generating realistic 3D human avatars from text prompts that describe multiple coupled attributes. It addresses two obstacles: feature coupling and the scarcity of realistic 3D avatar data. It introduces a discrete codebook as an intermediate representation linking text and avatar attributes, and employs a CLIP-based Multi-Modal Encoder to decouple attributes during cross-modal generation. Training relies on a pseudo-realistic 3D avatar data generator and an EVA3D-inspired pipeline that learns a mapping from discrete attribute codes to generator latents, enabling controllable synthesis. Experiments show higher attribute accuracy and R-Precision than the baselines, and ablations confirm that the codebook and segmentation components improve realism and controllability.
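The summary does not specify the codebook's exact form, so the following is only a minimal sketch of how a per-attribute discrete codebook can decouple attributes. It assumes one small codebook per attribute, nearest-neighbour (vector-quantization-style) lookup, and a linear map from concatenated codes to a generator latent. The attribute names, the toy embeddings, the `weight` matrix, and all function names are hypothetical; in Text2Avatar the embeddings come from a CLIP-based Multi-Modal Encoder and the code-to-latent mapping is learned.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(embedding: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Index of the codebook entry nearest to `embedding` (VQ-style lookup)."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(embedding, codebook[i]))


def encode_attributes(attr_embeddings: Dict[str, Vector],
                      codebooks: Dict[str, List[Vector]]) -> Dict[str, int]:
    # Each attribute is quantized against its own codebook, so the code chosen
    # for one attribute does not depend on the embedding of any other attribute.
    return {name: quantize(attr_embeddings[name], book)
            for name, book in codebooks.items()}


def codes_to_latent(codes: Dict[str, int],
                    codebooks: Dict[str, List[Vector]],
                    weight: List[Vector]) -> Vector:
    # Concatenate the selected code vectors in a fixed (sorted) attribute order.
    feature: Vector = []
    for name in sorted(codebooks):
        feature.extend(codebooks[name][codes[name]])
    # Hypothetical linear map to a generator latent: weight is latent_dim x feature_dim.
    return [sum(w * f for w, f in zip(row, feature)) for row in weight]


# Toy usage: two hypothetical attributes with 2-D codebooks.
codebooks = {
    "hair": [[1.0, 0.0], [0.0, 1.0]],    # e.g. index 0 = short, 1 = long
    "top": [[1.0, 1.0], [-1.0, -1.0]],   # e.g. index 0 = t-shirt, 1 = coat
}
text_embeddings = {"hair": [0.1, 0.9], "top": [0.8, 1.2]}  # stand-ins for encoder output
codes = encode_attributes(text_embeddings, codebooks)        # {"hair": 1, "top": 0}
weight = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1]]
latent = codes_to_latent(codes, codebooks, weight)           # [0.0, 1.0, 2.0]
```

Because each attribute is snapped to a discrete entry independently, editing the text for one attribute (say, hair) changes only that attribute's slot in the concatenated feature, which is the kind of controllability the codebook is meant to provide.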
Abstract
Generating 3D human models directly from text helps reduce the cost and time of character modeling. However, achieving multi-attribute controllable and realistic 3D human avatar generation remains challenging due to feature coupling and the scarcity of realistic 3D human avatar datasets. To address these issues, we propose Text2Avatar, which can generate realistic-style 3D avatars from coupled text prompts. Text2Avatar leverages a discrete codebook as an intermediate feature to establish a connection between text and avatars, enabling the disentanglement of features. Furthermore, to alleviate the scarcity of realistic-style 3D human avatar data, we use a pre-trained unconditional 3D human avatar generation model to obtain a large amount of pseudo 3D avatar data, which allows Text2Avatar to achieve realistic-style generation. Experimental results demonstrate that our method can generate realistic 3D avatars from coupled textual descriptions, a task that remains challenging for existing methods in this field.
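The pseudo-data idea can be illustrated with a deliberately simple sketch: sample latents from an unconditional generator, attach attribute codes to each sample, and fit a mapping from codes back to latents. Everything here is an assumption for illustration. The generator is a stub (`toy_sample_latent`), attribute labels come from a hypothetical `toy_label_attributes` function (a real pipeline would need to render and classify each avatar, which the abstract does not detail), and a per-combination latent average stands in for the learned mapping. It is not the paper's training procedure.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Latent = List[float]
Codes = Tuple[int, ...]


def build_pseudo_dataset(sample_latent: Callable[[random.Random], Latent],
                         label_attributes: Callable[[Latent], Codes],
                         n: int, seed: int = 0) -> List[Tuple[Codes, Latent]]:
    """Sample n latents from a (stub) unconditional generator and label each one."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(n):
        z = sample_latent(rng)
        dataset.append((label_attributes(z), z))
    return dataset


def fit_code_to_latent(dataset: List[Tuple[Codes, Latent]]) -> Dict[Codes, Latent]:
    """Placeholder 'mapping': the mean latent observed for each code combination."""
    sums: Dict[Codes, Latent] = {}
    counts: Dict[Codes, int] = defaultdict(int)
    for codes, z in dataset:
        acc = sums.setdefault(codes, [0.0] * len(z))
        for i, value in enumerate(z):
            acc[i] += value
        counts[codes] += 1
    return {codes: [s / counts[codes] for s in acc] for codes, acc in sums.items()}


# Toy stand-ins: a 2-D latent, and two binary "attributes" read off its signs.
def toy_sample_latent(rng: random.Random) -> Latent:
    return [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]


def toy_label_attributes(z: Latent) -> Codes:
    return (int(z[0] > 0), int(z[1] > 0))


pairs = build_pseudo_dataset(toy_sample_latent, toy_label_attributes, n=1000)
mapping = fit_code_to_latent(pairs)
# mapping[(1, 0)] is a latent whose first coordinate is positive and second negative.
```

The point of the sketch is the data flow: an unconditional model supplies unlimited latents, labeling turns them into (codes, latent) pairs, and any regressor from codes to latents can then be trained on those pairs without a hand-collected realistic 3D dataset.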
