PAV: Personalized Head Avatar from Unstructured Video Collection
Akin Caliskan, Berkay Kicanaoglu, Hyeongwoo Kim
TL;DR
PAV addresses the challenge of building a personalized head avatar from unstructured monocular videos that show the same subject with multiple appearances. It introduces a single unified dynamic deformable NeRF conditioned on per-appearance latent features attached to a geometry-aware head mesh, leveraging a shared canonical space and appearance-conditioned density and color fields. The approach uses a FLAME-based head model for geometry, learns latent appearance embeddings $Z_j$, and employs a density offset $\Delta_{\sigma}$ to capture appearance-specific geometry and texture. It renders novel poses and novel expressions across appearances with higher visual quality than the baseline method. This enables realistic, controllable head avatars from unconstrained videos, with broad implications for telepresence and animation. The authors acknowledge limitations in scaling to multiple identities and ethical concerns around misuse.
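To make the appearance-conditioned density idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that query points are already warped into the shared canonical space, that each appearance $j$ has one global latent vector $Z_j$ (a simplification of the mesh-attached latent features), and that $\Delta_{\sigma}$ is added to a shared density before a softplus activation. The network sizes, the random weights, and all names such as `query_field` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_APPEARANCES = 3  # hypothetical: one latent code per input video
LATENT_DIM = 8       # hypothetical size of each Z_j
HIDDEN = 16          # hypothetical hidden width

# Per-appearance latent codes Z_j (learned in practice; random here).
Z = rng.normal(size=(NUM_APPEARANCES, LATENT_DIM))


def init_mlp(in_dim, out_dim):
    """Random weights for a two-layer MLP (stand-in for trained weights)."""
    return {
        "w1": rng.normal(scale=0.5, size=(in_dim, HIDDEN)),
        "b1": np.zeros(HIDDEN),
        "w2": rng.normal(scale=0.5, size=(HIDDEN, out_dim)),
        "b2": np.zeros(out_dim),
    }


def mlp(params, x):
    hidden = np.maximum(x @ params["w1"] + params["b1"], 0.0)  # ReLU
    return hidden @ params["w2"] + params["b2"]


def softplus(x):
    return np.logaddexp(0.0, x)  # numerically stable log(1 + exp(x))


shared_density_net = init_mlp(3, 1)           # sees canonical points only
offset_net = init_mlp(3 + LATENT_DIM, 1)      # sees points and Z_j
color_net = init_mlp(3 + LATENT_DIM, 3)       # sees points and Z_j


def query_field(points, appearance_id):
    """Return (sigma, rgb, delta_sigma) for canonical-space points of shape (N, 3)."""
    z = np.broadcast_to(Z[appearance_id], (points.shape[0], LATENT_DIM))
    conditioned = np.concatenate([points, z], axis=1)

    sigma_shared = mlp(shared_density_net, points)[:, 0]
    delta_sigma = mlp(offset_net, conditioned)[:, 0]

    # Assumption: the offset is additive, and softplus keeps density non-negative.
    sigma = softplus(sigma_shared + delta_sigma)
    rgb = 1.0 / (1.0 + np.exp(-mlp(color_net, conditioned)))  # sigmoid to [0, 1]
    return sigma, rgb, delta_sigma
```

Calling `query_field(points, 0)` and `query_field(points, 1)` on the same points shares the canonical density term and differs only through the latent code, which is the property that lets one network serve several appearances of the same subject in this sketch.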
Abstract
We propose PAV (Personalized Head Avatar) for the synthesis of human faces under arbitrary viewpoints and facial expressions. PAV learns a dynamic deformable neural radiance field (NeRF) from a collection of monocular talking-face videos of the same character under various appearance and shape changes. Unlike existing head NeRF methods, which can only model such input videos on a per-appearance basis, our method learns multi-appearance NeRFs by introducing an appearance embedding for each input video via learnable latent neural features attached to the underlying geometry. Furthermore, the proposed appearance-conditioned density formulation captures shape variations of the character, such as facial hair and soft tissues, in the radiance field prediction. To the best of our knowledge, our approach is the first dynamic deformable NeRF framework to model appearance and shape variations in a single unified network for multiple appearances of the same subject. We demonstrate experimentally that PAV outperforms the baseline method in visual rendering quality in both quantitative and qualitative studies on various subjects.
