Human Gaussian Splatting: Real-time Rendering of Animatable Avatars

Arthur Moreau, Jifei Song, Helisa Dhamo, Richard Shaw, Yiren Zhou, Eduardo Pérez-Pellitero

TL;DR

This work proposes an animatable human model based on 3D Gaussian Splatting, which has recently emerged as a very efficient alternative to neural radiance fields. The model achieves a 1.5 dB PSNR improvement over the state of the art on the THuman4 dataset while rendering in real time.

Abstract

This work addresses the problem of real-time rendering of photorealistic human body avatars learned from multi-view videos. While classical approaches to modeling and rendering virtual humans generally use a textured mesh, recent research has developed neural body representations that achieve impressive visual quality. However, these models are difficult to render in real time, and their quality degrades when the character is animated with body poses different from the training observations. We propose an animatable human model based on 3D Gaussian Splatting, which has recently emerged as a very efficient alternative to neural radiance fields. The body is represented by a set of Gaussian primitives in a canonical space, which is deformed with a coarse-to-fine approach that combines forward skinning and local non-rigid refinement. We describe how to learn our Human Gaussian Splatting (HuGS) model in an end-to-end fashion from multi-view observations, and evaluate it against state-of-the-art approaches for novel pose synthesis of clothed bodies. Our method achieves a 1.5 dB PSNR improvement over the state of the art on the THuman4 dataset while rendering in real time (80 fps at 512x512 resolution).
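The coarse step of the deformation, forward skinning of canonical Gaussians, can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `skin_gaussian`, the plain-list matrix helpers, and the choice to pose the covariance as R Σ Rᵀ using the blended linear map are all assumptions made for illustration. The learned skinning weights and the MLP-based non-rigid refinement are taken as given and not modeled here.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = List[List[float]]


def mat_vec(m: Mat3, v: Sequence[float]) -> Vec3:
    """Multiply a 3x3 matrix by a 3-vector."""
    return tuple(sum(m[i][k] * v[k] for k in range(3)) for i in range(3))


def mat_mul(a: Mat3, b: Mat3) -> Mat3:
    """Multiply two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(m: Mat3) -> Mat3:
    """Transpose a 3x3 matrix."""
    return [[m[j][i] for j in range(3)] for i in range(3)]


def skin_gaussian(
    mean: Vec3,
    cov: Mat3,
    weights: Sequence[float],
    bone_rotations: Sequence[Mat3],
    bone_translations: Sequence[Vec3],
) -> Tuple[Vec3, Mat3]:
    """Move one canonical Gaussian to the posed space with linear blend skinning.

    Hypothetical helper: `weights` stands in for the learned skinning weights
    of this Gaussian (one per bone, summing to 1).
    """
    # Blend the per-bone rigid transforms with the skinning weights.
    blended_rot = [
        [sum(w * r[i][j] for w, r in zip(weights, bone_rotations))
         for j in range(3)]
        for i in range(3)
    ]
    blended_trans = [
        sum(w * t[i] for w, t in zip(weights, bone_translations))
        for i in range(3)
    ]

    # Forward skinning: canonical center -> posed center.
    rotated = mat_vec(blended_rot, mean)
    posed_mean = tuple(rotated[i] + blended_trans[i] for i in range(3))

    # Assumption: the Gaussian's shape follows the same linear map,
    # so the covariance becomes R * Sigma * R^T.
    posed_cov = mat_mul(mat_mul(blended_rot, cov), transpose(blended_rot))
    return posed_mean, posed_cov
```

Because the transform maps canonical primitives directly to the posed space, no inverse skinning (searching for the canonical point that corresponds to an observed one) is needed. Note that a weighted sum of rotation matrices is generally not itself a rotation, which is one reason a local refinement stage after skinning is useful.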

Paper Structure (43 sections, 4 equations, 7 figures, 6 tables)


Figures (7)

  • Figure 1: HuGS method overview. We represent the different attributes of Gaussians at each step of the deformation pipeline. Canonical positions and orientations are first deformed with LBS using the learned skinning weights. Then positions, orientations, and colors are refined by an MLP using the latent codes. Finally, Gaussians in the observation space are rendered through the target camera view.
  • Figure 2: Visualization of MLP outputs. From left to right: ground-truth image, rendered image, norm of the translation output $t_{\text{mlp}}$ (lighter colors indicate larger translations), and ambient occlusion factor $s$ (grey: no color modification, blue: darker colors, red: lighter colors). We observe that our MLP learns to operate on the dynamic parts of garments.
  • Figure 3: Comparison with DVA on the DNA-Rendering dataset. Despite fast non-rigid motion of complex textured garments, our method preserves more details than DVA and is able to fit unusual topology with loose clothing.
  • Figure 4: Comparison with Neural Body on the ZJU-MoCap dataset. While results in novel view synthesis are comparable for both methods, HuGS generalizes much better to novel poses thanks to its forward deformation formulation.
  • Figure 5: Qualitative comparison between PoseVocab and HuGS on the THuman4 dataset. On the left, our avatar follows the body pose more faithfully than PoseVocab, which fails to deform the hood and the knee at the correct location. In the middle, HuGS presents a more detailed logo and a more accurate head pose. On the right, PoseVocab presents artefacts due to failures of the inverse skinning process.
  • ...and 2 more figures