RGBAvatar: Reduced Gaussian Blendshapes for Online Modeling of Head Avatars

Linzhou Li, Yumeng Li, Yanlin Weng, Youyi Zheng, Kun Zhou

TL;DR

RGBAvatar introduces a subject-adaptive reduced Gaussian blendshape representation that maps FLAME parameters to a compact set of Gaussian blendshapes via an MLP, enabling high-fidelity head avatars with far fewer bases (20 in the reported experiments). By combining color initialization, batch-parallel Gaussian rasterization, and a local-global online sampling strategy, the approach reconstructs and renders avatars on the fly in real time from monocular video, with quality close to the offline setting. The work reports a training throughput of about 630 images per second and rendering at about 400 FPS while maintaining expressive fidelity, outperforming prior Gaussian-based methods. These contributions enable practical, interactive head-avatar reconstruction for streaming and telepresence applications.

Abstract

We present Reduced Gaussian Blendshapes Avatar (RGBAvatar), a method for reconstructing photorealistic, animatable head avatars at speeds sufficient for on-the-fly reconstruction. Unlike prior approaches that utilize linear bases from 3D morphable models (3DMM) to model Gaussian blendshapes, our method maps tracked 3DMM parameters into reduced blendshape weights with an MLP, leading to a compact set of blendshape bases. The learned compact base composition effectively captures essential facial details for specific individuals, and does not rely on the fixed base composition weights of 3DMM, leading to enhanced reconstruction quality and higher efficiency. To further expedite the reconstruction process, we develop a novel color initialization estimation method and a batch-parallel Gaussian rasterization process, achieving state-of-the-art quality with training throughput of about 630 images per second. Moreover, we propose a local-global sampling strategy that enables direct on-the-fly reconstruction, immediately reconstructing the model as video streams in real time while achieving quality comparable to offline settings. Our source code is available at https://github.com/gapszju/RGBAvatar.
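
Using the notation of the Figure 2 caption, the reduced blendshape model described above can be written compactly. This is a sketch inferred from the abstract and that caption, not an equation copied from the paper; in particular, the plain weighted sum is an assumed reading of "linear blending":

$$
\psi = \mathcal{F}(\theta), \qquad G^{\psi} = G_0 + \sum_{i=1}^{K} \psi_i \, \Delta G_i
$$

Here $\theta$ denotes the tracked FLAME parameters, $\mathcal{F}$ the MLP, $\psi \in \mathbb{R}^{K}$ the reduced blendshape weights, $G_0$ the base Gaussian model, and $\Delta G_i$ the $i$-th Gaussian blendshape.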

Paper Structure

This paper contains 27 sections, 5 equations, 20 figures, 8 tables, 2 algorithms.

Figures (20)

  • Figure 1: Our RGBAvatar reconstructs a high-fidelity head avatar from a 2-minute monocular video in about 80 seconds, using a reduced set of Gaussian blendshapes. These blendshapes are linearly combined to generate avatar animations in real time at about 400 FPS.
  • Figure 2: Pipeline. RGBAvatar represents the head avatar with a base model $G_0$ and a reduced set of Gaussian blendshapes $\{\Delta G_i\}_{i=1}^{K}$, each parameterized as Gaussian attributes. For an input video frame $I_t$, we first track the FLAME parameters $\theta$ and generate the FLAME mesh $M^{\theta}$. Then, an MLP $\mathcal{F}$ maps the FLAME parameters $\theta$ to the reduced blendshape weights $\psi$. The Gaussian model $G^{\psi}$ of the animated avatar is generated through linear blending with $\psi$. Finally, the Gaussians are transformed into the deformed space for rendering according to the deformation of the mesh triangles.
  • Figure 3: Impact of the number of blendshapes on reconstruction quality (left) and training time (right). We select 20 blendshapes to balance reconstruction quality with efficiency. Our method outperforms GaussianBlendShapes [ma20243d] while using fewer blendshapes. Experiments are conducted on the INSTA dataset [zielonka2023instant].
  • Figure 4: Effect of color initialization. This strategy is applied only once for each Gaussian during optimization. Left: results of our color initialization. Right: our color initialization strategy accelerates convergence in the early stage of training.
  • Figure 5: Illustration of our batch-parallel Gaussian rasterization. We take batch size = 3 as an example. a) Optimizing a single data sample per step, as used in [xiang2024flashavatar, shao2024splattingavatar, ma20243d], results in suboptimal GPU utilization. b) The naive batch-parallel approach, as used in [chen2024monogaussianavatar], suffers from frequent synchronization overhead. c) Our batch-parallel Gaussian rasterization mitigates synchronization issues and maximizes parallelism by leveraging CUDA streams.
  • ...and 15 more figures
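
The mapping and blending steps named in the Figure 2 caption can be illustrated with a small NumPy sketch. It is not the authors' implementation: the two-layer ReLU MLP, the function names `reduced_weights` and `blend_gaussians`, the random parameters, and all sizes except K = 20 (the value selected in Figure 3) are assumptions made for illustration, and the final mesh-driven transform into the deformed space is omitted.

```python
import numpy as np

def reduced_weights(theta, W1, b1, W2, b2):
    # Hypothetical two-layer ReLU MLP standing in for F: theta -> psi.
    hidden = np.maximum(theta @ W1 + b1, 0.0)
    return hidden @ W2 + b2

def blend_gaussians(base, deltas, psi):
    # Linear blending: base + sum_i psi[i] * deltas[i], per Gaussian attribute.
    return base + np.tensordot(psi, deltas, axes=1)

rng = np.random.default_rng(0)
P, H, K = 50, 64, 20   # assumed FLAME parameter size and hidden width; K = 20 as in Figure 3
N, D = 1000, 14        # assumed Gaussian count and per-Gaussian attribute width

# Random stand-ins for learned quantities (assumptions, not trained values).
W1, b1 = rng.normal(size=(P, H)) * 0.1, np.zeros(H)
W2, b2 = rng.normal(size=(H, K)) * 0.1, np.zeros(K)
base = rng.normal(size=(N, D))               # G_0
deltas = rng.normal(size=(K, N, D)) * 0.01   # {Delta G_i}, i = 1..K

theta = rng.normal(size=P)                   # tracked FLAME parameters for one frame
psi = reduced_weights(theta, W1, b1, W2, b2) # reduced blendshape weights
blended = blend_gaussians(base, deltas, psi) # G^psi before the mesh-driven transform
print(psi.shape, blended.shape)
```

Because the blend is a single weighted sum over K attribute tensors, its per-frame cost grows linearly with K, which is consistent with the trade-off between quality and training time that Figure 3 describes.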