Animatable and Relightable Gaussians for High-fidelity Human Avatar Modeling

Zhe Li, Yipengjing Sun, Zerong Zheng, Lizhen Wang, Shengping Zhang, Yebin Liu

TL;DR

This work tackles lifelike animatable human avatar modeling from RGB videos by introducing Animatable Gaussians, an explicit 3D Gaussian splatting framework coupled with 2D CNNs. A character-specific parametric template is learned and projected onto front and back Gaussian maps, which lets a 2D CNN predict high-fidelity pose-dependent dynamics while keeping rendering efficient. The method further incorporates a PCA-based pose projection to generalize to novel poses, and a physically-based rendering pipeline that disentangles geometry, material, and lighting for realistic relighting under new illumination. Experiments on THuman4.0, AvatarReX, and ActorsHQ demonstrate superior animation quality and relighting fidelity compared with NeRF-based and prior Gaussian-based avatars, particularly for loose garments such as dresses. The approach advances practical 3D human avatars for holoportation and extended reality by delivering lifelike, relightable, and generalizable appearances.
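The PCA-based pose projection mentioned above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `fit_pca` and `project_pose`, the flattened feature layout, and the clamping range `clamp_sigma` are all assumptions made for illustration. The idea shown is only the general one: express a novel pose in the principal subspace of the training poses and keep its coefficients within the range seen during training.

```python
import numpy as np

def fit_pca(train, n_components):
    """Fit PCA on training pose features.

    train: (N, D) array, one flattened pose feature per row (hypothetical layout).
    Returns the mean, the top principal directions, and per-direction std.
    """
    mean = train.mean(axis=0)
    centered = train - mean
    # Rows of vt are the principal directions of the centered data.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]                                   # (K, D)
    std = s[:n_components] / np.sqrt(max(len(train) - 1, 1))    # (K,)
    return mean, basis, std

def project_pose(x, mean, basis, std, clamp_sigma=2.0):
    """Project a novel pose feature x (D,) into the training distribution.

    clamp_sigma is an assumed hyperparameter: coefficients are clipped to
    +/- clamp_sigma standard deviations before reconstruction.
    """
    coeffs = basis @ (x - mean)                                 # (K,)
    coeffs = np.clip(coeffs, -clamp_sigma * std, clamp_sigma * std)
    return mean + basis.T @ coeffs                              # (D,)

# Toy usage: training poses vary along a single axis only.
train = np.outer(np.linspace(-1.0, 1.0, 5), np.array([1.0, 0.0, 0.0]))
mean, basis, std = fit_pca(train, n_components=1)
novel = np.array([0.5, 3.0, 0.0])
projected = project_pose(novel, mean, basis, std)
```

In the toy usage, the component of `novel` that lies outside the training subspace is discarded, and a coefficient far beyond the training range would be clipped rather than passed through to the network.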

Abstract

Modeling animatable human avatars from RGB videos is a long-standing and challenging problem. Recent works usually adopt MLP-based neural radiance fields (NeRF) to represent 3D humans, but it remains difficult for pure MLPs to regress pose-dependent garment details. To this end, we introduce Animatable Gaussians, a new avatar representation that leverages powerful 2D CNNs and 3D Gaussian splatting to create high-fidelity avatars. To associate 3D Gaussians with the animatable avatar, we learn a parametric template from the input videos, and then parameterize the template on two canonical Gaussian maps (front and back), where each pixel represents a 3D Gaussian. The learned template adapts to the worn garments, which allows modeling looser clothes such as dresses. Such template-guided 2D parameterization enables us to employ a powerful StyleGAN-based CNN to learn the pose-dependent Gaussian maps for modeling detailed dynamic appearances. Furthermore, we introduce a pose projection strategy for better generalization to novel poses. To tackle the realistic relighting of animatable avatars, we introduce physically-based rendering into the avatar representation to decompose avatar materials and environment illumination. Overall, our method can create lifelike avatars with dynamic, realistic, generalized, and relightable appearances. Experiments show that our method outperforms other state-of-the-art approaches.
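To make the "each pixel represents a 3D Gaussian" parameterization concrete, here is a small NumPy sketch. The channel layout (position, rotation quaternion, scale, opacity, color), the constant `LAYOUT`, and the function name `maps_to_gaussians` are hypothetical choices for illustration, not the paper's actual format. It only shows how pixels covered by the template in the front and back maps could be gathered into one set of Gaussians.

```python
import numpy as np

# Assumed per-pixel channel layout (hypothetical, for illustration only).
LAYOUT = {
    "position": slice(0, 3),
    "rotation": slice(3, 7),    # quaternion
    "scale": slice(7, 10),
    "opacity": slice(10, 11),
    "color": slice(11, 14),
}
NUM_CHANNELS = 14

def maps_to_gaussians(front_map, back_map, front_mask, back_mask):
    """Gather valid pixels of two (H, W, C) maps into named (N, c) arrays.

    front_mask / back_mask: (H, W) boolean arrays marking pixels covered
    by the template; uncovered pixels carry no Gaussian.
    """
    # Boolean indexing with an (H, W) mask keeps the channel axis: (N, C).
    params = np.concatenate([front_map[front_mask], back_map[back_mask]], axis=0)
    return {name: params[:, sl] for name, sl in LAYOUT.items()}

# Toy usage with 4x4 maps: 3 valid front pixels and 2 valid back pixels.
front_map = np.ones((4, 4, NUM_CHANNELS))
back_map = np.full((4, 4, NUM_CHANNELS), 2.0)
front_mask = np.zeros((4, 4), dtype=bool)
back_mask = np.zeros((4, 4), dtype=bool)
front_mask[0, :3] = True
back_mask[1, :2] = True
gaussians = maps_to_gaussians(front_map, back_map, front_mask, back_mask)
```

Because the Gaussians live on a regular 2D grid, a 2D CNN can predict all of their parameters in one forward pass, which is the motivation the abstract gives for the template-guided parameterization.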

Paper Structure (48 sections, 17 equations, 17 figures, 9 tables)

Figures (17)

  • Figure 1: Lifelike relightable and animatable avatars with highly dynamic, realistic and generalized details created by our method. We show synthesized results animated by the same pose under the capture environment and novel lights.
  • Figure 2: Illustration of the avatar modeling pipeline. It contains two main steps: 1) Reconstruct a character-specific template from multi-view images. 2) Predict pose-dependent Gaussian and intrinsic maps through StyleUNets, and render the posed Gaussians by Gaussian splatting and physically-based rendering to learn both pose-dependent dynamics and avatar materials. Finally, given a novel environment light, we can animate the avatar with realistic dynamic appearances and shadow effects.
  • Figure 3: Illustration of the posed position maps.
  • Figure 4: Canonical 3D Gaussians on side regions and hands.
  • Figure 5: Example animatable avatars with high-fidelity dynamic appearances created by our method.
  • ...and 12 more figures