Multimodal Generation of Animatable 3D Human Models with AvatarForge

Xinhang Liu; Yu-Wing Tai; Chi-Keung Tang

TL;DR

AvatarForge addresses the challenge of generating realistic, animatable 3D human avatars from text or image prompts, a task where diffusion-based methods struggle because of the diversity of human bodies and the difficulty of faithful animation. It pairs a large language model (LLM) agent for procedural generation with off-the-shelf 3D human generators, adds an auto-verification agent for iterative refinement, and uses a motion-control agent to animate avatars from natural-language instructions. A dynamic manual guides the LLM through high-dimensional parameter spaces, enabling fine-grained control over body shape, facial features, clothing, and poses. Experimental results show that AvatarForge outperforms state-of-the-art text- and image-to-avatar methods in quality and customization, and its interactive editing and real-time animation capabilities suggest broad applicability in gaming, film, and virtual environments.

Abstract

We introduce AvatarForge, a framework for generating animatable 3D human avatars from text or image inputs using AI-driven procedural generation. While diffusion-based methods have made strides in general 3D object generation, they struggle to produce high-quality, customizable human avatars because of the complexity and diversity of human body shapes and poses, a problem exacerbated by the scarcity of high-quality data. Animating these avatars also remains a significant challenge for existing methods. AvatarForge overcomes these limitations by combining LLM-based commonsense reasoning with off-the-shelf 3D human generators, enabling fine-grained control over body and facial details. Unlike diffusion models, which often rely on pre-trained datasets and lack precise control over individual human features, AvatarForge offers a more flexible approach: it brings humans into the iterative design and modeling loop, and its auto-verification system continuously refines the generated avatars, promoting high accuracy and customization. Our evaluations show that AvatarForge outperforms state-of-the-art methods in both text- and image-to-avatar generation, making it a versatile tool for artistic creation and animation.
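The generate-and-verify loop summarized above can be pictured with a toy sketch. Everything in it is an assumption for illustration, not AvatarForge's implementation: the hypothetical MANUAL dict stands in for the dynamic manual (parameter names and valid ranges), `propose` stands in for the LLM agent, and `verify` stands in for the auto-verification agent. No language model or 3D generator is called; the loop only shows how proposals stay inside documented ranges and are refined until the verifier is satisfied or an iteration budget runs out.

```python
# Toy generate-verify-refine loop. All names, parameters, and ranges are
# hypothetical stand-ins for AvatarForge's agents, not its actual API.

# Hypothetical "manual": each procedural parameter and its valid range.
MANUAL = {
    "height": (0.0, 1.0),
    "muscularity": (0.0, 1.0),
    "age": (0.0, 1.0),
}


def clamp(value: float, lo: float, hi: float) -> float:
    """Keep a value inside [lo, hi]."""
    return max(lo, min(hi, value))


def propose(params: dict[str, float], feedback: dict[str, float],
            step: float = 0.5) -> dict[str, float]:
    """Stand-in for the LLM agent: move each parameter part of the way
    toward the verifier's correction, clamped to the manual's range."""
    new_params = {}
    for name, value in params.items():
        lo, hi = MANUAL[name]
        new_params[name] = clamp(value + step * feedback.get(name, 0.0), lo, hi)
    return new_params


def verify(params: dict[str, float], target: dict[str, float]) -> dict[str, float]:
    """Stand-in for the auto-verification agent: return per-parameter
    corrections (target minus current). A real verifier would compare a
    rendered avatar with the prompt instead of reading target values."""
    return {name: target[name] - params[name] for name in params}


def refine(target: dict[str, float], max_iters: int = 20,
           tol: float = 1e-3) -> tuple[dict[str, float], int]:
    """Start from the middle of every range and iterate until every
    correction is below tol; return the parameters and iterations used."""
    params = {name: (lo + hi) / 2 for name, (lo, hi) in MANUAL.items()}
    for i in range(max_iters):
        feedback = verify(params, target)
        if max(abs(d) for d in feedback.values()) < tol:
            return params, i
        params = propose(params, feedback)
    return params, max_iters
```

With a reachable target, each round halves the remaining error, so the loop converges in a handful of iterations; with a target outside a documented range (say, a height of 1.5), the clamp pins the parameter at its bound and the loop stops at the iteration budget, which is where a human in the design loop would step in.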
