ASH: Animatable Gaussian Splats for Efficient and Photoreal Human Rendering
Haokai Pang, Heming Zhu, Adam Kortylewski, Christian Theobalt, Marc Habermann
TL;DR
ASH tackles real-time photorealistic rendering of animatable clothed humans by representing the actor with a fixed set of 3D Gaussian splats attached to a deformable template mesh. Gaussian parameters are learned in 2D texture space via motion-aware texture decoders, enabling efficient image-space splatting under user-controlled skeletal motion. A two-stage training strategy, consisting of a warmup stage with pseudo-ground-truth parameters followed by a final pixel- and SSIM-based optimization, yields high-fidelity, motion-dependent appearance while maintaining real-time performance. Empirical results on multi-view datasets show that ASH outperforms existing real-time methods by a large margin and matches or surpasses several offline approaches, highlighting its potential for interactive avatars in AR/VR and games. Overall, ASH reduces manual labor and provides scalable, controllable, photorealistic rendering of dynamic humans learned exclusively from multi-view videos.
Abstract
Real-time rendering of photorealistic and controllable human avatars stands as a cornerstone in Computer Vision and Graphics. While recent advances in neural implicit rendering have unlocked unprecedented photorealism for digital avatars, real-time performance has mostly been demonstrated for static scenes only. To address this, we propose ASH, an animatable Gaussian splatting approach for photorealistic rendering of dynamic humans in real time. We parameterize the clothed human as animatable 3D Gaussians, which can be efficiently splatted into image space to generate the final rendering. However, naively learning the Gaussian parameters in 3D space poses a severe challenge in terms of compute. Instead, we attach the Gaussians onto a deformable character model and learn their parameters in 2D texture space, which allows leveraging efficient 2D convolutional architectures that easily scale with the required number of Gaussians. We benchmark ASH against competing methods on pose-controllable avatars, demonstrating that our method outperforms existing real-time methods by a large margin and shows comparable or even better results than offline methods.
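To make the texture-space parameterization concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that every texel covered by the template's UV layout carries one Gaussian, and that a decoder maps a motion-dependent input texture to a per-texel parameter map. The 14-channel layout (position offset, rotation quaternion, log-scale, opacity, color), the activation functions, and the names `decode_gaussian_maps` and `texels_to_gaussians` are all illustrative assumptions; a per-texel linear map stands in for the 2D convolutional decoder.

```python
import numpy as np

def decode_gaussian_maps(motion_tex, weights, bias):
    """Stand-in for a 2D convolutional decoder (here a 1x1 convolution).

    motion_tex: (H, W, C_in) motion-dependent input texture.
    Returns an (H, W, 14) map of raw Gaussian parameters per texel.
    """
    return motion_tex @ weights + bias

def texels_to_gaussians(param_maps, base_pos_tex, valid_mask):
    """Gather per-texel parameters into a flat list of N Gaussians.

    base_pos_tex: (H, W, 3) positions on the posed template, stored per texel.
    valid_mask:   (H, W) boolean mask of texels covered by the UV layout.
    """
    raw = param_maps[valid_mask]       # (N, 14)
    base = base_pos_tex[valid_mask]    # (N, 3)
    # Assumed channel layout: 3 offset, 4 quaternion, 3 log-scale, 1 opacity, 3 color.
    offset, quat, log_scale, opacity, color = np.split(raw, [3, 7, 10, 11], axis=1)
    quat = quat + np.array([1.0, 0.0, 0.0, 0.0])   # bias toward the identity rotation
    quat = quat / np.linalg.norm(quat, axis=1, keepdims=True)
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    return {
        "position": base + offset,     # Gaussian follows the template surface
        "rotation": quat,              # unit quaternion
        "scale": np.exp(log_scale),    # strictly positive
        "opacity": sigmoid(opacity),   # in (0, 1)
        "color": sigmoid(color),       # in (0, 1)
    }

# Toy example: an 8x8 texture where a 4x4 block of texels is covered by the mesh.
rng = np.random.default_rng(0)
H, W, C_in = 8, 8, 6
motion_tex = rng.normal(size=(H, W, C_in))
base_pos_tex = rng.normal(size=(H, W, 3))
valid_mask = np.zeros((H, W), dtype=bool)
valid_mask[2:6, 2:6] = True
weights = 0.1 * rng.normal(size=(C_in, 14))
bias = np.zeros(14)

param_maps = decode_gaussian_maps(motion_tex, weights, bias)
gaussians = texels_to_gaussians(param_maps, base_pos_tex, valid_mask)
```

In this sketch the number of Gaussians is fixed by the count of valid texels, so increasing the texture resolution increases the number of Gaussians without changing the decoder, which is the scaling property the abstract attributes to working in 2D texture space.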
