Topo4D: Topology-Preserving Gaussian Splatting for High-Fidelity 4D Head Capture

Xuanchen Li, Yuhao Cheng, Xingyu Ren, Haozhe Jia, Di Xu, Wenhan Zhu, Yichao Yan

TL;DR

The paper addresses automatic, topology-preserving 4D head capture from calibrated multi-view videos. It introduces Topo4D, a framework built on a Gaussian Mesh representation that binds fixed-topology Gaussians to mesh vertices and alternates geometry and texture optimization under topology and physical priors. It adds UV densification to learn ultra-high-resolution textures with pore-level detail, and Gaussian normal expansion for accurate geometry extraction. Experiments on a 16-camera Light Stage dataset show higher mesh accuracy and higher-quality 8K textures than SOTA methods, with robust temporal stability and efficient, automated processing.

Abstract

4D head capture aims to generate dynamic topological meshes and corresponding texture maps from videos, which is widely utilized in movies and games for its ability to simulate facial muscle movements and recover dynamic textures in pore-squeezing. The industry often adopts a pipeline involving multi-view stereo and non-rigid alignment. However, this approach is prone to errors and relies heavily on time-consuming manual processing by artists. To simplify this process, we propose Topo4D, a novel framework for automatic geometry and texture generation, which optimizes densely aligned 4D heads and 8K texture maps directly from calibrated multi-view time-series images. Specifically, we first represent the time-series faces as a set of dynamic 3D Gaussians with fixed topology, in which the Gaussian centers are bound to the mesh vertices. Afterward, we perform alternating geometry and texture optimization frame by frame for high-quality geometry and texture learning while maintaining temporal topology stability. Finally, we can extract dynamic facial meshes with a regular wiring arrangement and high-fidelity textures with pore-level details from the learned Gaussians. Extensive experiments show that our method achieves better results than the current SOTA face reconstruction methods in the quality of both meshes and textures. Project page: https://xuanchenli.github.io/Topo4D/.
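
To make the fixed-topology idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes one Gaussian per mesh vertex and a face array that is shared by every frame; the names `GaussianMesh` and `track_sequence` are hypothetical, and the per-frame offsets are a stand-in for the result of the geometry optimization, which is not reproduced here.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class GaussianMesh:
    """Hypothetical container: one Gaussian per mesh vertex, fixed faces."""
    centers: np.ndarray  # (N, 3) Gaussian centers, identical to vertex positions
    faces: np.ndarray    # (F, 3) vertex indices, never modified across frames
    colors: np.ndarray   # (N, 3) per-Gaussian RGB


def track_sequence(first, per_frame_offsets):
    """Propagate the previous frame's state and move only the centers.

    Each entry of `per_frame_offsets` is an (N, 3) array standing in for
    the displacement a real geometry optimization would produce.
    """
    frames = [first]
    for offsets in per_frame_offsets:
        prev = frames[-1]
        frames.append(
            GaussianMesh(
                centers=prev.centers + offsets,  # geometry changes per frame
                faces=prev.faces,                # topology is reused as-is
                colors=prev.colors.copy(),       # colors start from the last frame
            )
        )
    return frames
```

Because the faces are never touched, a mesh for any frame can be read off directly as `(frame.centers, frame.faces)`, which is what keeps vertex correspondence stable over time in this sketch.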

Paper Structure (35 sections, 17 equations, 18 figures, 3 tables)

Figures (18)

  • Figure 1: Example results of our Topo4D. Our method can produce temporally consistent topological head meshes with high-fidelity 8K textures from calibrated multi-view videos. Captured 4D models can be applied to retargeting and relighting applications.
  • Figure 2: Overall pipeline of our framework. (a) We initialize Gaussian attributes and establish topological correspondence with the startup mesh. (b) Taking one frame as an example, the geometry-related attributes of the previous frame's Gaussian Mesh are optimized against the current frame under a set of topology-aware loss terms. (c) We align the Gaussian surface with the rendering surface by Gaussian normal expansion to extract more precise meshes. (d) To learn pore-level color detail and generate ultra-high-resolution textures, we build a dense mesh by densifying Gaussians in UV space.
  • Figure 3: Qualitative evaluation of meshes generated by our method and other topology-consistent reconstruction methods. We use head meshes manually registered by artists as the ground truth. We highlight areas that are difficult to reconstruct.
  • Figure 4: Qualitative evaluation of the rendering results of our method, UnsupTex, and HRN. The generated 8K textures and pore-level details are shown in columns 5 and 6.
  • Figure 5: Comparisons of temporal stability on a sequence in our dataset; our method is shown in bold and indicated by red arrows. (a) Curves of log(RMSE) (lower is better) for several topology-consistent face reconstruction methods. (b) Curves of PSNR (higher is better) computed between textures of adjacent frames.
  • ...and 13 more figures
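
The temporal-stability measure named in Figure 5(b), PSNR between textures of adjacent frames, can be sketched as follows. This uses the standard PSNR definition, 10 · log10(MAX² / MSE); the function names and the assumption that textures are float arrays in [0, 1] are illustrative and not taken from the paper.

```python
import numpy as np


def psnr(a, b, max_value=1.0):
    """Standard PSNR in dB between two equally shaped images."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    mse = np.mean((a - b) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))


def adjacent_frame_psnr(textures, max_value=1.0):
    """PSNR between each texture and the next one in the sequence."""
    return [
        psnr(textures[i], textures[i + 1], max_value)
        for i in range(len(textures) - 1)
    ]
```

A sequence of T textures yields T - 1 values; higher values mean consecutive textures differ less, which is how the figure reads stability.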