HAHA: Highly Articulated Gaussian Human Avatars with Textured Mesh Prior

David Svitov, Pietro Morerio, Lourdes Agapito, Alessio Del Bue

TL;DR

HAHA reconstructs animatable human avatars from monocular video by combining Gaussian splatting with a textured SMPL-X mesh prior, aiming for high-fidelity and efficient rendering. The method uses a three-stage training pipeline (a full Gaussian avatar, a textured mesh, and a joint merging stage) with depth-conditioned transparency to prune Gaussians while preserving detail. On X-Humans and SnapshotPeople, quantitative and qualitative results show competitive or superior quality with significantly fewer Gaussians, with the clearest gains in hand articulation and novel-pose generalization. By drawing on the strengths of both Gaussian representations and textured meshes, the work yields memory-efficient, scalable avatars suitable for real-time applications.

Abstract

We present HAHA, a novel approach for animatable human avatar generation from monocular input videos. The proposed method relies on learning the trade-off between the use of Gaussian splatting and a textured mesh for efficient and high-fidelity rendering. We demonstrate its efficiency in animating and rendering full-body human avatars controlled via the SMPL-X parametric model. Our model learns to apply Gaussian splatting only in areas of the SMPL-X mesh where it is necessary, such as hair and out-of-mesh clothing. As a result, a minimal number of Gaussians is used to represent the full avatar, and rendering artifacts are reduced. This in turn allows us to handle the animation of small body parts, such as fingers, that are traditionally disregarded. We demonstrate the effectiveness of our approach on two open datasets: SnapshotPeople and X-Humans. On SnapshotPeople, our method achieves reconstruction quality on par with the state of the art while using fewer than a third of the Gaussians. On novel poses from X-Humans, HAHA outperforms the previous state of the art both quantitatively and qualitatively.

Paper Structure (18 sections, 8 equations, 12 figures, 5 tables)

Figures (12)

  • Figure 1: Optimizing the number of Gaussians. HAHA jointly optimizes a Gaussian splatting model with a textured mesh to improve the photometric quality of the avatars. The method filters out superfluous Gaussians in a learnable, unsupervised manner. As a result, highly articulated parts of the body can be animated more efficiently and with better quality.
  • Figure 2: Scheme of our approach. a) We attach Gaussians to mesh polygons as described in Section \ref{sec:splatting} and rasterize them, conditioned on depth map $\mathcal{D}$, into RGB image $\mathcal{G}$ and alpha map $\mathcal{A}$. b) We train an RGB texture for SMPL-X and rasterize the mesh to RGB image $\mathcal{M}$ and depth map $\mathcal{D}$. c) During training and inference, we merge the rasterizations of Gaussians $\mathcal{G}$ and mesh $\mathcal{M}$ based on the trainable transparency map $\mathcal{A}$ of the Gaussians.
  • Figure 3: Stages of training. a) SMPL-X with optimizable RGB texture fitted on input video frames. b) 3DGS trained as described in Section \ref{sec:splatting}. c) All unnecessary Gaussians are deleted (Section \ref{sec:merging}) to merge this step with (a) and get (d).
  • Figure 4: Reconstruction for test frames from the SnapshotPeople dataset (female-3-casual). Our method demonstrates the same subjective reconstruction quality as state-of-the-art methods [qian20233dgs, lei2023gart, hu2023gaussianavatar] while using fewer Gaussians to represent an avatar. For some sequences, GaussianAvatar [hu2023gaussianavatar] tends to include the white background color used during training, although the overall quality of the method is high.
  • Figure 5: Comparison on the X-Humans dataset. We provide results for three different poses and views to demonstrate hand animation. HAHA animates hands while using far fewer Gaussians, is more robust to the input data, and produces fewer artifacts. While GaussianAvatar [hu2023gaussianavatar] also benefits from using SMPL-X to animate hands, HAHA produces more realistic-looking results.
  • ...and 7 more figures
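
The merging step in the Figure 2 caption (combining the Gaussian render $\mathcal{G}$ and the mesh render $\mathcal{M}$ using the alpha map $\mathcal{A}$) can be illustrated with a minimal sketch. The captions do not give the exact formula, so this assumes standard per-pixel "over" compositing with non-premultiplied colors, $\mathcal{A}\,\mathcal{G} + (1 - \mathcal{A})\,\mathcal{M}$. The function name `merge_renders` and the array layouts are hypothetical, and the depth conditioning of the Gaussian rasterization is not modeled here.

```python
import numpy as np


def merge_renders(gaussian_rgb, mesh_rgb, alpha):
    """Blend a Gaussian render over a mesh render, pixel by pixel.

    Assumption (not taken from the paper): plain "over" compositing,
    out = A * G + (1 - A) * M, with non-premultiplied colors.

    gaussian_rgb: (H, W, 3) array, the Gaussian rasterization G
    mesh_rgb:     (H, W, 3) array, the textured-mesh rasterization M
    alpha:        (H, W) array in [0, 1], the Gaussian alpha map A
    """
    g = np.asarray(gaussian_rgb, dtype=np.float64)
    m = np.asarray(mesh_rgb, dtype=np.float64)
    a = np.asarray(alpha, dtype=np.float64)

    if g.shape != m.shape or g.shape[:2] != a.shape:
        raise ValueError("G and M must be (H, W, 3) and A must be (H, W)")

    # Clamp alpha to a valid range and add a channel axis so that it
    # broadcasts across the three color channels.
    a = np.clip(a, 0.0, 1.0)[..., None]
    return a * g + (1.0 - a) * m


if __name__ == "__main__":
    G = np.full((2, 2, 3), 1.0)          # white Gaussian render
    M = np.zeros((2, 2, 3))              # black mesh render
    A = np.array([[0.0, 1.0],
                  [0.25, 0.5]])          # per-pixel Gaussian opacity
    print(merge_renders(G, M, A)[..., 0])
```

Where the alpha map is zero, the output is exactly the mesh render, which matches the intuition behind the captions: regions that the textured mesh already explains need no Gaussians at all.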