FAGhead: Fully Animate Gaussian Head from Monocular Videos
Yixin Xuan, Xinyang Li, Gongxin Yao, Shiwei Zhou, Donghui Sun, Xiaoxin Chen, Yu Pan
TL;DR
FAGhead tackles monocular 3D head avatar reconstruction by decoupling identity and expression within a FLAME-based parametric head model and introducing a Point-based Learnable Representation Field (PLRF) of Gaussian points. A Transform Network deforms canonical PLRF geometry to frame-specific configurations, while alpha rendering with a dedicated loss enforces edge-accurate geometry, reducing artifacts on hair and shoulders. The PLRF densifies the facial representation by placing Gaussian points along triangle midlines with a learnable parameter $n \in [0,1]$, and adaptive density control enables dynamic refinement during training; an MLP $F_{\theta}$ predicts spatial residuals $\delta\mu_i$, $\delta s_i$, $\delta r_i$ conditioned on FLAME properties $\rho_i$. Extensive experiments on open datasets and captured data show state-of-the-art fidelity in reconstruction, robust novel-view synthesis, and realistic cross-identity reenactment, outperforming INSTA, GaussianAvatars, and FlashAvatar. This approach enables high-quality, controllable head avatars from monocular video with practical implications for VR, social communication, and digital human creation, while acknowledging limitations in oral cavity modeling and preprocessing sensitivity.
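As a rough illustration of the midline placement described above, the following NumPy sketch places one point per triangle on a median of that triangle and slides it with a parameter $n \in [0,1]$. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `midline_points` is hypothetical, "midline" is interpreted here as the segment from the first vertex of each face to the midpoint of the opposite edge, and $n$ is a fixed array rather than a trained parameter.

```python
import numpy as np

def midline_points(vertices, faces, n):
    """Place one point per triangle along an assumed midline.

    vertices: (V, 3) mesh vertex positions.
    faces:    (F, 3) integer vertex indices per triangle.
    n:        (F,) interpolation parameter, clipped to [0, 1].

    Assumption: the midline runs from the first vertex of each face
    to the midpoint of the opposite edge (a triangle median).
    """
    tri = vertices[faces]                    # (F, 3, 3) triangle corners
    apex = tri[:, 0]                         # first vertex of each face
    mid = 0.5 * (tri[:, 1] + tri[:, 2])      # midpoint of the opposite edge
    n = np.clip(n, 0.0, 1.0)[:, None]        # keep n inside [0, 1]
    return (1.0 - n) * apex + n * mid        # linear interpolation

# Toy mesh with a single triangle (hypothetical data).
vertices = np.array([[0.0, 0.0, 0.0],
                     [2.0, 0.0, 0.0],
                     [0.0, 2.0, 0.0]])
faces = np.array([[0, 1, 2]])

points = midline_points(vertices, faces, np.array([0.5]))
```

In a trainable setting, $n$ would be an optimized tensor, and the residual $\delta\mu_i$ predicted by $F_{\theta}$ would be added to each resulting position.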
Abstract
High-fidelity reconstruction of 3D human avatars has a wide range of applications in virtual reality. In this paper, we introduce FAGhead, a method that produces fully controllable human portraits from monocular videos. We build an explicit representation on traditional 3D morphable meshes (3DMM) and optimize neutral 3D Gaussians to reconstruct complex expressions. Furthermore, we employ a novel Point-based Learnable Representation Field (PLRF) with learnable Gaussian point positions to enhance reconstruction performance. To handle the edges of avatars effectively, we introduce alpha rendering to supervise the alpha value of each pixel. Extensive experiments on open-source datasets and our own captured data demonstrate that our approach generates high-fidelity 3D head avatars with full control over the expression and pose of the virtual avatar, outperforming existing works.
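To make the idea of per-pixel alpha supervision concrete, here is a minimal sketch of one possible alpha loss. The exact form of the loss is not given in this section, so the mean absolute error against a foreground mask is an assumption, and the names `alpha_loss`, `rendered_alpha`, and `target_mask` are hypothetical.

```python
import numpy as np

def alpha_loss(rendered_alpha, target_mask):
    """Mean absolute error between rendered alpha and a foreground mask.

    rendered_alpha: (H, W) accumulated opacity per pixel from the renderer.
    target_mask:    (H, W) foreground mask with values in [0, 1].

    Assumption: an L1 penalty; the paper's loss may take a different form.
    """
    rendered_alpha = np.clip(rendered_alpha, 0.0, 1.0)
    return float(np.mean(np.abs(rendered_alpha - target_mask)))

# Toy 2x2 example: one semi-transparent pixel lies inside the mask.
rendered = np.array([[1.0, 0.0],
                     [0.5, 0.0]])
mask = np.array([[1.0, 0.0],
                 [1.0, 0.0]])

loss = alpha_loss(rendered, mask)
```

A penalty of this kind pushes opacity toward 1 inside the silhouette and toward 0 outside it, which is where edge artifacts on hair and shoulders tend to show up.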
