HeadEvolver: Text to Head Avatars via Expressive and Attribute-Preserving Mesh Deformation
Duotun Wang, Hengyu Meng, Zeyu Cai, Zhijing Shao, Qianxi Liu, Lin Wang, Mingming Fan, Xiaohang Zhan, Zeyu Wang
TL;DR
HeadEvolver tackles the editing bottleneck in text-driven head-avatar generation by switching from implicit representations to explicit deformation of a template mesh, driven by per-face Jacobians $J_f$ augmented with a learnable vector field $H_f$. Deformations are computed via a Poisson formulation and guided by a score distillation loss from 2D diffusion priors, with landmark and evolving contour regularizers that keep the result plausible and preserve template attributes such as landmarks, rig, and UVs. The method produces expressive, texture-rich head avatars that remain editable in standard graphics tools and compatible with downstream animation, without requiring training data. Experiments demonstrate diverse, high-quality meshes with preserved topology and more faithful identity features, validated by quantitative CLIP-based metrics and user studies.
Abstract
Current text-to-avatar methods often rely on implicit representations (e.g., NeRF, SDF, and DMTet), leading to 3D content that artists cannot easily edit and animate in graphics software. This paper introduces a novel framework for generating stylized head avatars from text guidance, which leverages locally learnable mesh deformation and 2D diffusion priors to achieve high-quality digital assets for attribute-preserving manipulation. Given a template mesh, our method represents mesh deformation with per-face Jacobians and adaptively modulates local deformation using a learnable vector field. This vector field enables anisotropic scaling while preserving the rotation of vertices, which can better express identity and geometric details. We employ landmark- and contour-based regularization terms to balance the expressiveness and plausibility of generated avatars from multiple views without relying on any specific shape prior. Our framework can generate realistic shapes and textures that can be further edited via text, while supporting seamless editing using the preserved attributes from the template mesh, such as 3DMM parameters, blendshapes, and UV coordinates. Extensive experiments demonstrate that our framework can generate diverse and expressive head avatars with high-quality meshes that artists can easily manipulate in graphics software, facilitating downstream applications such as efficient asset creation and animation with preserved attributes.
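To make the Jacobian-based representation concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the common Poisson setup in which vertex positions are recovered as the area-weighted least-squares fit to prescribed per-face gradients. The helper names (`face_gradient_operators`, `poisson_solve`), the toy tetrahedron, and the per-axis scale `h` are hypothetical. In particular, applying `h` as an elementwise scale on each face's target gradient is only an illustrative stand-in for the learnable vector field; the abstract does not specify how that field enters the formulation.

```python
import numpy as np

def face_gradient_operators(V, F):
    """Per-face 3x3 operators G_f (gradient of a piecewise-linear map) and face areas."""
    G = np.zeros((len(F), 3, 3))
    areas = np.zeros(len(F))
    for k, (a, b, c) in enumerate(F):
        E = np.stack([V[b] - V[a], V[c] - V[a]], axis=1)   # 3x2 edge matrix
        D = E @ np.linalg.inv(E.T @ E)                     # 3x2 dual edge basis
        # Columns correspond to the three face vertices (a, b, c).
        G[k] = np.column_stack([-D.sum(axis=1), D[:, 0], D[:, 1]])
        areas[k] = 0.5 * np.linalg.norm(np.cross(E[:, 0], E[:, 1]))
    return G, areas

def poisson_solve(V, F, targets):
    """Vertex positions whose per-face gradients best match `targets` (least squares)."""
    G, areas = face_gradient_operators(V, F)
    A = np.zeros((3 * len(F), len(V)))
    B = np.zeros((3 * len(F), 3))
    for k, face in enumerate(F):
        w = np.sqrt(areas[k])                              # area weighting
        A[3 * k:3 * k + 3, face] = w * G[k]
        B[3 * k:3 * k + 3] = w * targets[k]
    X, *_ = np.linalg.lstsq(A, B, rcond=None)
    # Gradients fix positions only up to translation; pin vertex 0 to its rest position.
    return X - X[0] + V[0]

# Toy template: the surface of a tetrahedron (hypothetical stand-in for a head mesh).
V = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
F = np.array([[0, 2, 1], [0, 1, 3], [0, 3, 2], [1, 2, 3]])

G, _ = face_gradient_operators(V, F)
rest = np.einsum("kij,kjd->kid", G, V[F])   # per-face gradients of the undeformed mesh

# Assumed form of anisotropic scaling: scale each output axis of every face gradient.
h = np.array([1.5, 1.0, 0.8])
deformed = poisson_solve(V, F, rest * h)
```

Because the same `h` is used on every face here, the target gradients are integrable and the solve reproduces the globally scaled mesh exactly. With spatially varying per-face targets, the solve instead returns the closest mesh in the least-squares sense, and connectivity (and therefore UVs and other per-vertex attributes of the template) is unchanged.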
