ArchitectHead: Continuous Level of Detail Control for 3D Gaussian Head Avatars

Peizhi Yan, Rabab Ward, Qiang Tang, Shan Du

TL;DR

ArchitectHead enables continuous, run-time level-of-detail (LOD) control for 3D Gaussian head avatars. Gaussians are parameterized in a 2D UV feature space backed by a multi-level UV feature field and a lightweight decoder, so the LOD can be adjusted without retraining by resampling UV features at the desired resolution. The method uses a FLAME-driven initialization, a UV position map, a driving code derived from expression and pose codes, and a five-branch decoder that produces per-Gaussian attributes, with a two-stage training regime for robustness across LODs. On monocular video datasets it reports state-of-the-art quality at the highest LOD and near-SOTA quality at lower LODs; at the lowest LOD it uses only 6.2% of the Gaussians while rendering speed nearly doubles.

Abstract

3D Gaussian Splatting (3DGS) has enabled photorealistic and real-time rendering of 3D head avatars. Existing 3DGS-based avatars typically rely on tens of thousands of 3D Gaussian points (Gaussians), with the number of Gaussians fixed after training. However, many practical applications require adjustable levels of detail (LOD) to balance rendering efficiency and visual quality. In this work, we propose "ArchitectHead", the first framework for creating 3D Gaussian head avatars that support continuous control over LOD. Our key idea is to parameterize the Gaussians in a 2D UV feature space and to encode their latent features in a UV feature field composed of multi-level learnable feature maps. A lightweight neural network-based decoder then transforms these latent features into 3D Gaussian attributes for rendering. ArchitectHead controls the number of Gaussians by dynamically resampling feature maps from the UV feature field at the desired resolutions. This method enables efficient and continuous control of LOD without retraining. Experimental results show that ArchitectHead achieves state-of-the-art (SOTA) quality in self- and cross-identity reenactment tasks at the highest LOD, while maintaining near-SOTA performance at lower LODs. At the lowest LOD, our method uses only 6.2% of the Gaussians while the quality degrades moderately (L1 Loss +7.9%, PSNR −0.97%, SSIM −0.6%, LPIPS Loss +24.1%), and the rendering speed nearly doubles.
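
The resampling idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the resolution bounds (256 and 64), the linear LOD-to-resolution mapping, and the distance-based level weights are all assumptions made for illustration. The point it shows is that one Gaussian is produced per UV pixel, so choosing a smaller UV map at run time directly reduces the Gaussian count.

```python
import numpy as np

def lod_to_resolution(lod, res_max=256, res_min=64):
    """Map a continuous LOD in [0, 1] to a UV map side length.
    LOD 0 is the highest detail. The bounds are hypothetical."""
    if not 0.0 <= lod <= 1.0:
        raise ValueError("lod must lie in [0, 1]")
    return int(round(res_max - lod * (res_max - res_min)))

def bilinear_resize(fmap, size):
    """Resize an (H, W, C) feature map to (size, size, C) bilinearly."""
    h, w, _ = fmap.shape
    ys = np.linspace(0, h - 1, size)
    xs = np.linspace(0, w - 1, size)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None, None]  # vertical blend weights
    wx = (xs - x0)[None, :, None]  # horizontal blend weights
    top = fmap[y0][:, x0] * (1 - wx) + fmap[y0][:, x1] * wx
    bot = fmap[y1][:, x0] * (1 - wx) + fmap[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

def resample_feature_field(levels, lod):
    """Blend all levels of a multi-level feature field at the target size.
    Assumed weighting: levels whose resolution is closer (in log scale)
    to the target resolution contribute more."""
    size = lod_to_resolution(lod)
    sides = np.array([lvl.shape[0] for lvl in levels], dtype=float)
    weights = np.exp(-np.abs(np.log2(sides / size)))
    weights /= weights.sum()
    return sum(wt * bilinear_resize(lvl, size) for wt, lvl in zip(weights, levels))

# Usage: a three-level field with 8 feature channels per level.
rng = np.random.default_rng(0)
field = [rng.normal(size=(s, s, 8)) for s in (64, 128, 256)]
for lod in (0.0, 0.5, 1.0):
    fmap = resample_feature_field(field, lod)
    print(lod, fmap.shape, "Gaussians (one per UV pixel):", fmap.shape[0] * fmap.shape[1])
```

With these hypothetical bounds, the lowest LOD keeps 64²/256² = 6.25% of the UV pixels, which is in the same range as the 6.2% quoted above; the resolutions actually used in the paper are not stated in this summary.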

Paper Structure

This paper contains 20 sections, 7 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: ArchitectHead supports continuous LOD control ranging from 0.0 (highest) to 1.0 (lowest). This figure shows the rendered image quality under different LOD settings. For each case, we provide a zoom-in view of selected regions. We recommend viewing the figure on a digital device with zoom for better inspection. The grey dots indicate the positions of the 3D Gaussian points. The test identity is taken from the NeRFace dataset (Gafni et al., 2021). The red arrow indicates visible artifacts (sparse Gaussians) at the lowest LOD.
  • Figure 2: Pipeline of ArchitectHead. We propose a 3D Gaussian head avatar creation method with continuous level of detail (LOD) control. Starting from shape and expression codes, we use the FLAME head model to generate the 3D mesh geometry, which is rasterized into a UV position map at the desired resolution. A multi-level UV feature field is introduced to learn local latent features, from which our weighted resampler extracts a UV feature map of the target resolution. This map is concatenated with the UV position map, the desired LOD value, and a driving code obtained from expression and pose codes via an MLP network $\mathcal{M}$. The resulting pixel-wise latent features are decoded by an MLP-based decoder into 3D Gaussian attributes, which are rendered using 3D Gaussian Splatting (3DGS).
  • Figure 3: Qualitative comparisons of self-reenactment results. Selected regions are zoomed in for clearer comparison of fine details. The last two columns show our method with the highest (LOD=0.0) and lowest (LOD=1.0) settings. Compared to existing methods, our approach preserves finer details at the highest LOD while also maintaining reasonable quality at the lowest LOD.
  • Figure 4: Rendered novel views. The reference images (left-most) are rendered with the default camera pose, while the other images are rendered with yaw or pitch angle offsets using an orbit camera.
  • Figure 5: Qualitative results of cross-identity reenactment. The first column shows the source images that provide the expression codes and camera poses. The remaining columns in each row present the trained head avatar of another individual reenacted using the source codes and poses.
  • ...and 2 more figures
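
The data flow described in the Figure 2 caption can be sketched in NumPy as follows. This is an illustrative sketch only: all dimensions, the random weights, the helper names, and the specific split into five branches (position offset, scale, rotation, opacity, color) are assumptions, not details confirmed by the caption. The FLAME rasterization and the 3DGS renderer are replaced by random placeholder inputs and omitted, respectively.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
FEAT_DIM, DRIVE_DIM, EXPR_DIM, POSE_DIM = 8, 16, 100, 6
# Assumed branch layout: one small MLP per Gaussian attribute.
BRANCH_DIMS = {"offset": 3, "scale": 3, "rotation": 4, "opacity": 1, "color": 3}

def init_mlp(dims, rng):
    """Random weights for a small fully connected network."""
    return [(rng.normal(0.0, 0.1, (i, o)), np.zeros(o))
            for i, o in zip(dims[:-1], dims[1:])]

def mlp(x, layers):
    """Linear layers with ReLU between them (none after the last layer)."""
    for k, (w, b) in enumerate(layers):
        x = x @ w + b
        if k < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def build_pixel_features(uv_feat, uv_pos, lod, driving_code):
    """Concatenate, per UV pixel: latent features, 3D position,
    the scalar LOD, and the (broadcast) driving code."""
    h, w, _ = uv_feat.shape
    lod_map = np.full((h, w, 1), lod)
    drive_map = np.broadcast_to(driving_code, (h, w, driving_code.shape[0]))
    return np.concatenate([uv_feat, uv_pos, lod_map, drive_map], axis=-1)

def decode_gaussians(pixel_feats, uv_pos, branches):
    """Decode one Gaussian per UV pixel with one MLP per attribute."""
    flat = pixel_feats.reshape(-1, pixel_feats.shape[-1])
    raw = {name: mlp(flat, layers) for name, layers in branches.items()}
    quat = raw["rotation"]
    quat = quat / (np.linalg.norm(quat, axis=-1, keepdims=True) + 1e-8)
    return {
        "position": uv_pos.reshape(-1, 3) + raw["offset"],  # mesh point + offset
        "scale": np.exp(raw["scale"]),                      # strictly positive
        "rotation": quat,                                   # unit quaternion
        "opacity": sigmoid(raw["opacity"]),                 # in (0, 1)
        "color": sigmoid(raw["color"]),                     # in (0, 1)
    }

rng = np.random.default_rng(0)
side = 32  # UV resolution selected by the desired LOD
uv_feat = rng.normal(size=(side, side, FEAT_DIM))  # stands in for the resampled feature map
uv_pos = rng.normal(size=(side, side, 3))          # stands in for the rasterized FLAME mesh

# Driving code from expression and pose codes via an MLP (M in the caption).
driving_mlp = init_mlp([EXPR_DIM + POSE_DIM, 32, DRIVE_DIM], rng)
codes = np.concatenate([rng.normal(size=EXPR_DIM), rng.normal(size=POSE_DIM)])
driving_code = mlp(codes, driving_mlp)

in_dim = FEAT_DIM + 3 + 1 + DRIVE_DIM
branches = {name: init_mlp([in_dim, 32, d], rng) for name, d in BRANCH_DIMS.items()}
feats = build_pixel_features(uv_feat, uv_pos, lod=0.5, driving_code=driving_code)
gaussians = decode_gaussians(feats, uv_pos, branches)
print({name: attr.shape for name, attr in gaussians.items()})
```

Because every stage operates per UV pixel, changing `side` changes the number of decoded Gaussians without touching any learned weights, which is the property the caption attributes to resolution-dependent resampling.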