DreamWaltz: Make a Scene with Complex 3D Animatable Avatars
Yukun Huang, Jianan Wang, Ailing Zeng, He Cao, Xianbiao Qi, Yukai Shi, Zheng-Jun Zha, Lei Zhang
TL;DR
DreamWaltz tackles the challenge of generating high-quality, animatable 3D avatars from text prompts by combining a NeRF-based avatar representation with SMPL priors and 3D-aware skeleton conditioning for 3D-consistent supervision. It introduces a two-stage pipeline. The first stage creates a canonical avatar via SMPL-guided initialization and 3D-consistent Score Distillation Sampling (SDS). The second stage learns an animatable avatar by conditioning the diffusion supervision on diverse pose priors, which enables animation in arbitrary poses without retraining. A density weighting network and pose-aware conditioning enable robust articulation and artifact suppression, and the method supports scene composition with avatar-avatar and avatar-object interactions. Extensive experiments demonstrate state-of-the-art quality in canonical avatars, robust animation capabilities, and practical scene-assembly potential for creative applications.
Abstract
We present DreamWaltz, a novel framework for generating and animating complex 3D avatars given text guidance and a parametric human body prior. While recent methods have shown encouraging results for text-to-3D generation of common objects, creating high-quality and animatable 3D avatars remains challenging. To create high-quality 3D avatars, DreamWaltz proposes 3D-consistent, occlusion-aware Score Distillation Sampling (SDS) to optimize implicit neural representations with canonical poses. It provides view-aligned supervision via 3D-aware skeleton conditioning, which enables complex avatar generation without artifacts or multiple faces. For animation, our method learns an animatable 3D avatar representation from the abundant image priors of a diffusion model conditioned on various poses, so that complex non-rigged avatars can be animated in arbitrary poses without retraining. Extensive evaluations demonstrate that DreamWaltz is an effective and robust approach for creating 3D avatars that can take on complex shapes and appearances as well as novel poses for animation. The proposed framework further enables the creation of complex scenes with diverse compositions, including avatar-avatar, avatar-object, and avatar-scene interactions. See https://dreamwaltz3d.github.io/ for more vivid 3D avatar and animation results.
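
For readers unfamiliar with SDS, the skeleton-conditioned update described above can be sketched roughly as follows. This uses standard DreamFusion-style notation and is not an equation quoted from this page; all symbols are assumptions introduced for illustration: $\theta$ denotes the parameters of the avatar representation, $x$ an image rendered from it, $y$ the text prompt, $c$ a skeleton condition image rendered from the same camera as $x$, $\hat{\epsilon}_\phi$ the noise predictor of a pretrained diffusion model, and $w(t)$ a timestep-dependent weight.

```latex
% Sketch of a skeleton-conditioned SDS gradient (notation assumed, see lead-in).
% x_t is the rendered image x noised to diffusion timestep t.
\begin{aligned}
x_t &= \alpha_t\, x + \sigma_t\, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I), \\
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  &= \mathbb{E}_{t,\epsilon}\!\left[
       w(t)\,\bigl(\hat{\epsilon}_\phi(x_t;\, y,\, c,\, t) - \epsilon\bigr)\,
       \frac{\partial x}{\partial \theta}
     \right].
\end{aligned}
```

Under this reading, the condition $c$ is tied to the same viewpoint as the rendering $x$, which is one way to interpret the "view-aligned supervision" that the abstract credits for avoiding multi-face artifacts.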
