UV Gaussians: Joint Learning of Mesh Deformation and Gaussian Textures for Human Avatar Modeling

Yujiao Jiang, Qingmin Liao, Xiaoyu Li, Li Ma, Qi Zhang, Chaopeng Zhang, Zongqing Lu, Ying Shan

TL;DR

UV Gaussians addresses the challenge of creating photo-realistic, animatable human avatars by jointly learning pose-dependent mesh deformations and UV-space Gaussian textures guided by a refined template mesh. The method combines a Mesh U-Net for geometry with a Gaussian U-Net operating in UV space to produce high-fidelity Gaussian textures, then animates the Gaussians under mesh guidance using differentiable rendering. Experiments on a newly collected multi-view motion dataset show state-of-the-art performance for both novel view and novel pose synthesis, with ablations confirming the value of mesh guidance and the loss terms. The approach offers a practical path to efficient, high-quality human avatars suitable for real-time applications; its acknowledged limitations are a dependence on scanned meshes and difficulty in regions with challenging clothing.

Abstract

Reconstructing photo-realistic, drivable human avatars from multi-view image sequences is a popular and challenging topic in computer vision and graphics. While existing NeRF-based methods can achieve high-quality novel view rendering of human models, both training and inference are time-consuming. Recent approaches use 3D Gaussians to represent the human body, enabling faster training and rendering. However, they underestimate the importance of mesh guidance and directly predict Gaussians in 3D space with only coarse mesh guidance. This hinders the learning of the Gaussians and tends to produce blurry textures. Therefore, we propose UV Gaussians, which models the 3D human body by jointly learning mesh deformations and 2D UV-space Gaussian textures. We use the embedding of the UV map to learn Gaussian textures in 2D space, leveraging the feature-extraction capabilities of powerful 2D networks. Additionally, through an independent Mesh network, we optimize pose-dependent geometric deformations, which guide Gaussian rendering and significantly enhance rendering quality. We collect and process a new dataset of human motion, which includes multi-view images, scanned models, parametric model registration, and corresponding texture maps. Experimental results demonstrate that our method achieves state-of-the-art novel view and novel pose synthesis. The code and data will be made available on the homepage https://alex-jyj.github.io/UV-Gaussians/ once the paper is accepted.
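
To make the idea of mesh-guided, UV-space Gaussians more concrete, the following is a minimal sketch of one plausible way to anchor a Gaussian per UV texel on a deformed mesh. It is not the authors' implementation. The function names, the texel-to-triangle lookup (`texel_face`, `texel_bary`), and the per-texel offset (standing in for a network prediction) are all assumptions introduced for illustration.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def barycentric_point(tri: Sequence[Vec3], bary: Sequence[float]) -> Vec3:
    """Interpolate a 3D point inside a triangle from barycentric weights."""
    return tuple(
        sum(w * v[axis] for w, v in zip(bary, tri)) for axis in range(3)
    )


def place_gaussians(
    vertices: List[Vec3],
    faces: List[Tuple[int, int, int]],
    texel_face: List[int],
    texel_bary: List[Tuple[float, float, float]],
    texel_offset: List[Vec3],
) -> List[Vec3]:
    """Anchor one Gaussian center per valid UV texel on the mesh surface.

    All three texel_* inputs are hypothetical, for illustration only:
    texel_face[i]   : index of the triangle that covers texel i
    texel_bary[i]   : barycentric weights of texel i inside that triangle
    texel_offset[i] : 3D offset for texel i (stand-in for a network output)
    """
    centers = []
    for f, bary, off in zip(texel_face, texel_bary, texel_offset):
        tri = [vertices[i] for i in faces[f]]
        anchor = barycentric_point(tri, bary)  # point on the mesh surface
        centers.append(tuple(a + o for a, o in zip(anchor, off)))
    return centers


# Toy example: a single triangle covered by two texels.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
faces = [(0, 1, 2)]
centers = place_gaussians(
    vertices,
    faces,
    texel_face=[0, 0],
    texel_bary=[(1.0, 0.0, 0.0), (0.5, 0.5, 0.0)],
    texel_offset=[(0.0, 0.0, 0.0), (0.0, 0.0, 0.25)],
)
print(centers)
```

Under this assumption, moving the mesh vertices (for example, through pose-dependent deformation) moves every anchored Gaussian with them, which is the intuition behind using the mesh as guidance.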

Paper Structure

This paper contains 26 sections, 11 equations, 9 figures, 5 tables.

Figures (9)

  • Figure 1: Based on the SMPL mesh and its UV mapping, we learn a pose-dependent refined mesh and its Gaussian textures. By combining the high-quality rendering of Gaussian Splatting with the easy animation of a template mesh, our method can produce photo-realistic human avatars.
  • Figure 2: Overview of our method, which comprises three primary modules: a Mesh U-Net for learning pose-dependent mesh deformation, a Gaussian U-Net for learning pose-dependent Gaussian textures, and a Mesh-Guided 3D Gaussian Animation module for animating the Gaussians under mesh guidance.
  • Figure 3: Example of mesh deformation. By optimizing the vertex offsets, our method obtains a refined mesh with more accurate geometry while keeping the topology consistent with SMPL-X for animation. This refined mesh can be used to guide the rendering of 3D Gaussians, yielding photorealistic results.
  • Figure 4: Qualitative comparisons on novel view and novel pose synthesis.
  • Figure 5: Novel view and novel pose synthesis results of the ablation experiments. The full model exhibits the fewest artifacts compared with variants that use different mesh guidance or no mesh at all.
  • ...and 4 more figures