STGA: Selective-Training Gaussian Head Avatars

Hanzhi Guo, Yixiao Chen, Dongye Xiaonuo, Zeyu Tian, Dongdong Weng, Le Luo

TL;DR

This work tackles high-fidelity, drivable head avatars by introducing Selective-Training Gaussian Head Avatars (STGA), which embeds 3D Gaussian splats on the FLAME head mesh and optimizes only frame-specific splats to enhance fine details. The method alternates local optimization of selected Gaussians with periodic global refinement, and employs batch training to manage memory efficiently, achieving faster training than network-based approaches while delivering richer detail than purely mesh-based methods. Quantitative and qualitative evaluations on the NeRSemble dataset show improved rendering quality, particularly in eyes and teeth, and ablation confirms the benefit of selective training. The approach offers a practical path to realistic, editable dynamic head avatars with improved training efficiency, while acknowledging limitations in hair/teeth representation and suggesting extensions to non-FLAME meshes in future work.

Abstract

We propose selective-training Gaussian head avatars (STGA) to enhance the details of dynamic head Gaussian models. The dynamic head Gaussian model is trained on the basis of the FLAME parametric head model. Each Gaussian splat is embedded within the FLAME mesh, so that the Gaussian model is animated by the mesh. Before training, our selection strategy determines which 3D Gaussian splats are to be optimized in each frame. When training on a given frame, the parameters of these splats are optimized while those of all other splats are frozen. The set of splats participating in optimization therefore differs from frame to frame, which improves the realism of fine details. Compared with network-based methods, our method achieves better results with shorter training time. Compared with mesh-based methods, our method produces more realistic details within the same training time. Additionally, the ablation experiment confirms that our method effectively enhances the quality of details.
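The selective update described above can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the authors' implementation: each splat is reduced to a single scalar parameter, the loss is a squared distance to a per-frame target, and the names `selective_step`, `train`, and the `global_every` schedule are hypothetical. It only shows the mechanism of updating the selected splats for a frame while freezing the rest, with an occasional step over all splats.

```python
from typing import List, Set


def selective_step(params: List[float], grads: List[float],
                   selected: Set[int], lr: float = 0.1) -> List[float]:
    """Take a gradient step on the selected indices only; others stay frozen."""
    return [p - lr * g if i in selected else p
            for i, (p, g) in enumerate(zip(params, grads))]


def train(params: List[float], targets: List[List[float]],
          selection_per_frame: List[Set[int]], steps: int,
          global_every: int = 4, lr: float = 0.1) -> List[float]:
    """Alternate local (selected-only) steps with a periodic global step.

    `global_every` is a hypothetical schedule, not a value from the paper.
    """
    n_frames = len(selection_per_frame)
    all_indices = set(range(len(params)))
    for step in range(1, steps + 1):
        frame = step % n_frames
        # Toy loss: sum of (p - t)^2 for this frame, so the gradient is 2 * (p - t).
        grads = [2.0 * (p - t) for p, t in zip(params, targets[frame])]
        # Global step: every splat is trainable. Local step: only this frame's selection.
        active = all_indices if step % global_every == 0 else selection_per_frame[frame]
        params = selective_step(params, grads, active, lr)
    return params
```

In a real pipeline the same effect is usually obtained by masking gradients (or optimizer updates) for the frozen splats. The selection sets would come from the selection strategy computed before training rather than being supplied by hand as they are here.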

Paper Structure

This paper contains 15 sections, 10 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Overview of our method. The top-left section illustrates Gaussian initialization, where a local coordinate system is established for each triangle of the neutral mesh of the target head. The bottom-left section shows the FLAME tracking process and the determination of Gaussian splats to be optimized for each frame. The training process consists of two parts: local training and global training. Local training optimizes only the selected Gaussian splats while freezing the others, whereas global training involves all Gaussian splats. These two training modes are alternated to enhance the realism of details while maintaining overall consistency. This process ultimately produces a high-quality dynamic Gaussian model of the head.
  • Figure 2: Method for initializing the Gaussian head model
  • Figure 3: Different Gaussian splats are selected for training when training different expressions. $\mathit{\psi}_{0} \dots \mathit{\psi}_{i}$ represent different expressions.
  • Figure 4: Defined key facial regions for Gaussian splat optimization (from left to right: Mouth, Nose, Right Eye, Left Eye)
  • Figure 5: Visualization of relative displacements of triangle centers for the FLAME model under different expressions
  • ...and 3 more figures
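
The Figure 1 caption mentions a local coordinate system established for each triangle of the neutral mesh. The sketch below shows one common way to build such a frame and use it to carry a splat position along with its triangle. The exact construction used in the paper is not given in this summary, so the choice of origin (the centroid), the axis convention, and the helper names `triangle_frame` and `local_to_world` are all assumptions made for illustration.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(a: Vec) -> Vec:
    n = math.sqrt(sum(x * x for x in a))
    return (a[0] / n, a[1] / n, a[2] / n)


def triangle_frame(v0: Vec, v1: Vec, v2: Vec) -> Tuple[Vec, Tuple[Vec, Vec, Vec]]:
    """Return (origin, (x, y, z)) for a non-degenerate triangle.

    Assumed convention: origin at the centroid, x along the edge v0 -> v1,
    z along the triangle normal, y completing a right-handed frame.
    """
    origin = tuple((v0[k] + v1[k] + v2[k]) / 3.0 for k in range(3))
    x = normalize(sub(v1, v0))
    z = normalize(cross(x, sub(v2, v0)))
    y = cross(z, x)
    return origin, (x, y, z)


def local_to_world(local: Vec, origin: Vec, axes: Tuple[Vec, Vec, Vec]) -> Vec:
    """Map a splat position expressed in the triangle frame to world space."""
    return tuple(origin[k] + sum(local[j] * axes[j][k] for j in range(3))
                 for k in range(3))
```

Because the splat position is stored in the triangle's frame, recomputing the frame from the deformed mesh and calling `local_to_world` again moves the splat with the surface, which is the sense in which the mesh drives the animation.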