X-Oscar: A Progressive Framework for High-quality Text-guided 3D Animatable Avatar Generation
Yiwei Ma, Zhekai Lin, Jiayi Ji, Yijun Fan, Xiaoshuai Sun, Rongrong Ji
TL;DR
X-Oscar addresses oversaturation and low output quality in text-guided 3D avatar generation with a progressive Geometry->Texture->Animation framework that uses SMPL-X as a prior. It introduces Adaptive Variational Parameter (AVP), which represents avatars as adaptive distributions to mitigate oversaturation, and Avatar-aware Score Distillation Sampling (ASDS), which injects geometry- and appearance-aware noise during diffusion-based optimization. In experiments against state-of-the-art text-to-3D and text-to-avatar methods, X-Oscar produces better geometry and texture fidelity while remaining fully animatable, targeting applications such as gaming, AR/VR, and digital media. The framework offers a practical, automated path from text prompts to high-quality, editable 3D avatars with robust optimization dynamics.
Abstract
Automatic text-guided 3D avatar generation has made significant progress in recent years. However, existing methods still suffer from oversaturation and low-quality output. To address these challenges, we propose X-Oscar, a progressive framework for generating high-quality animatable avatars from text prompts. It follows a sequential Geometry->Texture->Animation paradigm, which simplifies optimization by generating the avatar step by step. To tackle oversaturation, we introduce Adaptive Variational Parameter (AVP), which represents avatars as an adaptive distribution during training. Additionally, we present Avatar-aware Score Distillation Sampling (ASDS), a novel technique that adds avatar-aware noise to rendered images to improve generation quality during optimization. Extensive evaluations confirm the superiority of X-Oscar over existing text-to-3D and text-to-avatar approaches. Our anonymous project page: https://xmu-xiaoma666.github.io/Projects/X-Oscar/.
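To make the idea of representing avatar parameters as a distribution rather than a single point more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a per-coordinate Gaussian sampled with the standard reparameterization trick; the function name `sample_variational_parameter` and the arguments `mean`, `std`, and `rng` are hypothetical, and how AVP adapts the distribution during training is not specified in the abstract and is not modeled here.

```python
import random


def sample_variational_parameter(mean, std, rng):
    """Draw one sample per coordinate from N(mean, std^2).

    Illustrative assumption: a diagonal Gaussian sampled as mean + std * eps,
    with eps ~ N(0, 1). This is the generic reparameterization trick, used
    here only to show "parameter as a distribution".
    """
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mean, std)]


rng = random.Random(0)
mean = [0.5, -1.0]  # hypothetical avatar parameters

# With zero spread the distribution collapses to a point estimate.
point = sample_variational_parameter(mean, [0.0, 0.0], rng)

# With non-zero spread each draw is perturbed around the mean.
draw = sample_variational_parameter(mean, [0.1, 0.1], rng)
```

In this sketch, setting every entry of `std` to zero recovers an ordinary point-estimate parameter, which shows the distribution view as a generalization of the usual one.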
