DreamWaltz: Make a Scene with Complex 3D Animatable Avatars

Yukun Huang; Jianan Wang; Ailing Zeng; He Cao; Xianbiao Qi; Yukai Shi; Zheng-Jun Zha; Lei Zhang


TL;DR

DreamWaltz tackles the challenge of generating high-quality, animatable 3D avatars from text prompts by integrating a NeRF-based avatar representation with SMPL priors and 3D-aware skeleton conditioning for 3D-consistent supervision. It introduces a two-stage pipeline: canonical avatar creation via SMPL-guided initialization and 3D-consistent SDS, and animatable avatar learning by conditioning diffusion supervision on diverse pose priors to enable arbitrary pose animation without retraining. A density weighting network and pose-aware conditioning enable robust articulation and artifact suppression, while the method supports scene composition with avatar-avatar and avatar-object interactions. Extensive experiments demonstrate state-of-the-art quality in canonical avatars, robust animation capabilities, and practical scene-assembly potential for creative applications.

Abstract

We present DreamWaltz, a novel framework for generating and animating complex 3D avatars given text guidance and a parametric human body prior. While recent methods have shown encouraging results for text-to-3D generation of common objects, creating high-quality and animatable 3D avatars remains challenging. To create high-quality 3D avatars, DreamWaltz proposes 3D-consistent occlusion-aware Score Distillation Sampling (SDS) to optimize implicit neural representations in canonical poses. It provides view-aligned supervision via 3D-aware skeleton conditioning, which enables complex avatar generation without artifacts or multiple faces. For animation, our method learns an animatable 3D avatar representation from the abundant image priors of a diffusion model conditioned on various poses, and can animate complex non-rigged avatars given arbitrary poses without retraining. Extensive evaluations demonstrate that DreamWaltz is an effective and robust approach for creating 3D avatars that can take on complex shapes and appearances as well as novel poses for animation. The proposed framework further enables the creation of complex scenes with diverse compositions, including avatar-avatar, avatar-object, and avatar-scene interactions. See https://dreamwaltz3d.github.io/ for more vivid 3D avatar and animation results.
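The abstract names Score Distillation Sampling but does not state its update rule. As a rough illustration only, the sketch below follows the generic SDS form (noise a rendering, ask a diffusion model to predict that noise, and use the weighted residual as the gradient on the rendered image). It is not the authors' implementation: `sds_gradient`, `predict_noise`, `alpha_bar_t`, and `weight` are hypothetical names, and `predict_noise` merely stands in for a text- and skeleton-conditioned diffusion model.

```python
import numpy as np

def sds_gradient(image, predict_noise, alpha_bar_t, weight, rng):
    """Toy SDS gradient with respect to a rendered image.

    image         : rendered view as an array (hypothetical NeRF output)
    predict_noise : callable standing in for a conditioned diffusion model
    alpha_bar_t   : cumulative noise-schedule term at the sampled timestep
    weight        : timestep-dependent weighting w(t)
    rng           : numpy random Generator used to draw the injected noise
    """
    noise = rng.standard_normal(image.shape)
    # Forward diffusion: mix the rendering with Gaussian noise.
    noisy = np.sqrt(alpha_bar_t) * image + np.sqrt(1.0 - alpha_bar_t) * noise
    # Weighted residual between predicted and injected noise.
    return weight * (predict_noise(noisy) - noise)

# Usage with a placeholder predictor that always returns zeros.
rng = np.random.default_rng(0)
rendering = np.zeros((4, 4, 3))
grad = sds_gradient(rendering, lambda x: np.zeros_like(x), 0.5, 2.0, rng)
```

In a full pipeline this gradient would be backpropagated through the renderer to the avatar parameters; in this sketch, the skeleton conditioning described in the abstract would live inside `predict_noise`.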

Paper Structure (34 sections, 11 equations, 19 figures)


Introduction
Related Work
Text-guided image generation.
Text-guided 3D generation.
Text-guided 3D avatar generation.
Method
Preliminary
DreamWaltz: A Text-to-Avatar Generation Framework
Creating a Canonical Avatar
Learning an Animatable Avatar
Making a Scene with Animatable 3D Avatars
Experiment
Evaluation of Canonical Avatars
High-quality avatar generation.
Comparison with SOTA methods.
...and 19 more sections

Figures (19)

Figure 1: DreamWaltz is a text-to-3D-avatar generation framework, which can (a, b) create complex 3D animatable avatars from texts, (c, d) ready for 3D scene composition with diverse interactions.
Figure 2: Comparison of text-driven 3D avatar generation methods: AvatarCLIP, AvatarCraft, DreamAvatar, and DreamWaltz (Ours). AvatarCLIP and AvatarCraft assume strong SMPL constraints, which makes it straightforward to align the generated avatars with SMPL for animation, but these constraints prevent the avatars from taking on complex shapes and appearances. With weak SMPL constraints, DreamAvatar struggles with wrong avatar geometry and requires retraining for each pose adjustment. Unlike existing methods, DreamWaltz enables complex and animatable 3D avatar generation, benefiting from the proposed SMPL-guided 3D-consistent SDS and deformation learning from a human pose prior.
Figure 3: Illustration of our framework for canonical and animatable avatar creation. (a) shows how to create a canonical avatar from text with 3D-consistent occlusion-aware Score Distillation Sampling, and (b) demonstrates how to further learn an animatable avatar with sampled human pose prior.
Figure 4: Qualitative results from two views of DreamWaltz. Given text prompts, it can generate high-quality 3D avatars with complex geometry and texture.
Figure 5: Qualitative comparisons for complex avatar generation. Text inputs are listed below.
...and 14 more figures
