HAHA: Highly Articulated Gaussian Human Avatars with Textured Mesh Prior
David Svitov, Pietro Morerio, Lourdes Agapito, Alessio Del Bue
TL;DR
HAHA reconstructs animatable human avatars from monocular video by combining Gaussian splatting with a textured SMPL-X mesh prior, aiming for high fidelity and efficient rendering. The method uses a three-stage training pipeline (full Gaussian avatar, textured mesh, and joint merging) with depth-conditioned transparency to prune Gaussians while preserving detail. On SnapshotPeople, it matches state-of-the-art reconstruction quality with fewer than a third as many Gaussians; on X-Humans, it outperforms prior methods on novel poses both quantitatively and qualitatively, with notably better hand articulation. The resulting avatars are memory-efficient and suited to real-time applications because they draw on the strengths of both Gaussian representations and textured meshes.
Abstract
We present HAHA, a novel approach for generating animatable human avatars from monocular input videos. The proposed method learns the trade-off between Gaussian splatting and a textured mesh to achieve efficient, high-fidelity rendering. We demonstrate that it can efficiently animate and render full-body human avatars controlled via the SMPL-X parametric model. Our model learns to apply Gaussian splatting only in areas of the SMPL-X mesh where it is necessary, such as hair and out-of-mesh clothing. As a result, the full avatar is represented with a minimal number of Gaussians and exhibits fewer rendering artifacts, which in turn lets us animate small body parts, such as fingers, that are traditionally disregarded. We evaluate our approach on two open datasets: SnapshotPeople and X-Humans. On SnapshotPeople, our method achieves reconstruction quality on par with the state of the art while using fewer than a third as many Gaussians. On novel poses from X-Humans, HAHA outperforms the previous state of the art both quantitatively and qualitatively.
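The idea of keeping Gaussians only where the mesh is insufficient can be illustrated with a minimal sketch. It assumes each Gaussian carries a learned scalar opacity and that low-opacity Gaussians are simply dropped; the function name `prune_gaussians`, the threshold value, and the toy data are hypothetical and are not taken from the paper, which does not specify its pruning rule here.

```python
import numpy as np


def prune_gaussians(positions, opacities, threshold=0.5):
    """Keep only Gaussians whose learned opacity exceeds `threshold`.

    Illustrative assumption: a fixed opacity threshold decides which
    Gaussians survive. The actual criterion used by HAHA may differ.
    """
    positions = np.asarray(positions, dtype=float)
    opacities = np.asarray(opacities, dtype=float)
    if positions.shape[0] != opacities.shape[0]:
        raise ValueError("positions and opacities must have the same length")
    keep = opacities > threshold  # boolean mask, one entry per Gaussian
    return positions[keep], opacities[keep], keep


# Toy data: four Gaussians, two of which became nearly transparent,
# e.g. because the textured mesh already explains those regions.
positions = np.array(
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
)
opacities = np.array([0.9, 0.1, 0.6, 0.05])

kept_positions, kept_opacities, keep = prune_gaussians(positions, opacities)
fraction_kept = keep.mean()  # share of Gaussians that survive pruning
```

In this toy setup, the surviving Gaussians would correspond to regions such as hair or loose clothing, while the remaining surface would be rendered from the textured mesh.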
