HeadArtist: Text-conditioned 3D Head Generation with Self Score Distillation
Hongyu Liu, Xuan Wang, Ziyu Wan, Yujun Shen, Yibing Song, Jing Liao, Qifeng Chen
TL;DR
HeadArtist introduces Self Score Distillation (SSD), a framework for text-conditioned 3D head generation that uses a landmark-guided ControlNet as its generative prior and optimizes geometry and texture on a DMTet+FLAME representation. SSD queries the same frozen ControlNet twice, with the same text and facial-landmark conditions but two different classifier-free guidance weights, and uses the difference between the two noise predictions as its update direction. This sidesteps common SDS/VSD issues such as over-saturation and Janus artifacts while embedding strong facial priors. Geometry and texture are generated sequentially and can be edited with natural-language prompts, and negative prompts further improve texture realism. In qualitative and quantitative evaluations, HeadArtist achieves higher fidelity than state-of-the-art methods and supports robust geometry and appearance editing, a step toward high-fidelity, editable 3D head synthesis from language.
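The abstract states that the two ControlNet predictions share the same noisy render, landmarks, and text, and differ only in their classifier-free guidance (CFG) weight. Under that reading, the update direction can be written out as below. This is a hedged derivation, not the paper's own equation. It assumes the standard CFG combination with the landmark map $l$ conditioning both the conditional and unconditional branches, DDPM-style noising, and an SDS-style gradient. The symbols ($\epsilon_\phi$, $\omega_1$, $\omega_2$, $w(t)$, $g(\theta)$) are illustrative.

```latex
x = g(\theta), \qquad x_t = \sqrt{\bar{\alpha}_t}\, x + \sqrt{1 - \bar{\alpha}_t}\, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)

\hat{\epsilon}_\phi(x_t; y, l, t, \omega) = \epsilon_\phi(x_t; \varnothing, l, t) + \omega \left[ \epsilon_\phi(x_t; y, l, t) - \epsilon_\phi(x_t; \varnothing, l, t) \right]

\hat{\epsilon}_\phi(x_t; y, l, t, \omega_1) - \hat{\epsilon}_\phi(x_t; y, l, t, \omega_2) = (\omega_1 - \omega_2) \left[ \epsilon_\phi(x_t; y, l, t) - \epsilon_\phi(x_t; \varnothing, l, t) \right]

\nabla_\theta \mathcal{L}_{\mathrm{SSD}} \approx \mathbb{E}_{t, \epsilon} \left[ w(t) \left( \hat{\epsilon}_\phi(x_t; y, l, t, \omega_1) - \hat{\epsilon}_\phi(x_t; y, l, t, \omega_2) \right) \frac{\partial x}{\partial \theta} \right]
```

For comparison, SDS uses $\hat{\epsilon}_\phi(x_t; y, t, \omega) - \epsilon$, which subtracts the sampled noise itself. Under the assumptions above, the SSD difference is instead a scaled text-conditional direction evaluated at the same noisy render.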
Abstract
This work presents HeadArtist for 3D head generation from text descriptions. With a landmark-guided ControlNet serving as the generative prior, we propose an efficient pipeline that optimizes a parameterized 3D head model under supervision distilled from the prior itself. We call this process self score distillation (SSD). Specifically, given a sampled camera pose, we first render an image and its corresponding landmarks from the head model, and add noise at a sampled level to the image. The noisy image, landmarks, and text condition are then fed into the frozen ControlNet twice for noise prediction. Two different classifier-free guidance (CFG) weights are applied to these two predictions, and their difference indicates how the rendered image should change to better match the text prompt. Experimental results suggest that our approach produces high-quality 3D head sculptures with adequate geometry and photorealistic appearance, significantly outperforming state-of-the-art methods. We also show that the same pipeline readily supports editing the generated heads, including both geometry deformation and appearance change.
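The SSD step described in the abstract can be sketched in NumPy. This is a minimal, non-authoritative sketch under stated assumptions, not the HeadArtist implementation. `toy_noise_predictor` is a hypothetical stand-in for the frozen landmark-guided ControlNet. The DDPM-style noising, the standard CFG combination, the ordering of `w_high`/`w_low`, the constant `weight`, and all function and argument names are illustrative assumptions.

```python
import numpy as np


def toy_noise_predictor(x_t, text_emb, landmarks, t):
    """Hypothetical stand-in for the frozen landmark-guided ControlNet.

    Returns a noise prediction shaped like x_t. text_emb=None selects the
    unconditional (empty-prompt) branch; the landmark map conditions both
    branches (an assumption). t is accepted but unused by this toy model.
    """
    eps = 0.5 * x_t + 0.1 * landmarks  # toy dependence on image and landmarks
    if text_emb is not None:
        eps = eps + 0.2 * text_emb  # toy text-conditional term
    return eps


def add_noise(x0, alpha_bar, eps):
    """DDPM-style forward noising: sqrt(a) * x0 + sqrt(1 - a) * eps."""
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps


def cfg_predict(predictor, x_t, text_emb, landmarks, t, w):
    """Standard classifier-free guidance with guidance weight w."""
    eps_uncond = predictor(x_t, None, landmarks, t)
    eps_cond = predictor(x_t, text_emb, landmarks, t)
    return eps_uncond + w * (eps_cond - eps_uncond)


def ssd_direction(predictor, x0, text_emb, landmarks, t, alpha_bar,
                  w_high, w_low, rng, weight=1.0):
    """Sketch of a self-score-distillation update direction.

    The same noisy render, landmarks, and text go through the frozen
    predictor twice; only the CFG weight differs. The weighted difference
    of the two predictions is returned as the gradient with respect to the
    rendered image (the predictor itself is not differentiated, as in SDS).
    """
    eps = rng.standard_normal(x0.shape)  # sampled noise
    x_t = add_noise(x0, alpha_bar, eps)  # noisy render at the sampled level
    eps_high = cfg_predict(predictor, x_t, text_emb, landmarks, t, w_high)
    eps_low = cfg_predict(predictor, x_t, text_emb, landmarks, t, w_low)
    return weight * (eps_high - eps_low)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    render = rng.uniform(-1.0, 1.0, size=(8, 8, 3))  # stand-in rendered image
    landmarks = np.zeros_like(render)                # stand-in landmark map
    text_emb = np.full_like(render, 0.3)             # stand-in text signal
    grad = ssd_direction(toy_noise_predictor, render, text_emb, landmarks,
                         t=500, alpha_bar=0.5, w_high=7.5, w_low=1.5, rng=rng)
    # Treat the pixels as parameters and take one gradient-descent step;
    # a real pipeline would backpropagate grad through the renderer instead.
    render = render - 0.01 * grad
    print(grad.shape, float(grad.mean()))
```

Because both calls share one noisy input, the difference equals `(w_high - w_low)` times the conditional-minus-unconditional prediction. In this linear toy predictor the image- and landmark-dependent terms cancel exactly. With a real network the difference would still depend on the noisy render.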
