SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting
Zhijing Shao, Zhaolong Wang, Zhuang Li, Duotun Wang, Xiangru Lin, Yu Zhang, Mingming Fan, Zeyu Wang
TL;DR
SplattingAvatar tackles real-time photorealistic humanoid avatars from monocular video by coupling explicit mesh motion with implicit appearance modeled by Gaussian Splatting embedded on a triangle mesh. It introduces a trainable embedding $E=(k,u,v,d)$ (triangle index $k$, barycentric coordinates $(u,v)$, and displacement $d$) that places each Gaussian on the mesh with mean $\mu = P + d\,\boldsymbol{n}$, and uses lifted optimization to jointly refine Gaussian parameters and embeddings as the mesh deforms. The method achieves real-time performance (over $300$ FPS on an RTX 3090 and about $30$ FPS on an iPhone 13) with state-of-the-art rendering quality on head and full-body datasets, particularly excelling in eyes, hair, and off-surface geometry. It emphasizes editability, compatibility with common animation pipelines, and portability, while acknowledging limitations related to clothes and hair disentanglement and proposing future work in richer mesh representations.
Abstract
We present SplattingAvatar, a hybrid 3D representation of photorealistic human avatars with Gaussian Splatting embedded on a triangle mesh, which renders over 300 FPS on a modern GPU and 30 FPS on a mobile device. We disentangle the motion and appearance of a virtual human with explicit mesh geometry and implicit appearance modeling with Gaussian Splatting. The Gaussians are defined by barycentric coordinates and displacement on a triangle mesh as Phong surfaces. We extend lifted optimization to simultaneously optimize the parameters of the Gaussians while walking on the triangle mesh. SplattingAvatar is a hybrid representation of virtual humans where the mesh represents low-frequency motion and surface deformation, while the Gaussians take over the high-frequency geometry and detailed appearance. Unlike existing deformation methods that rely on an MLP-based linear blend skinning (LBS) field for motion, we control the rotation and translation of the Gaussians directly by mesh, which empowers its compatibility with various animation techniques, e.g., skeletal animation, blend shapes, and mesh editing. Trainable from monocular videos for both full-body and head avatars, SplattingAvatar shows state-of-the-art rendering quality across multiple datasets.
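To make the mesh embedding concrete, the following is a minimal sketch of how a Gaussian's mean could be evaluated from an embedding $(k,u,v,d)$, assuming $P$ and $\boldsymbol{n}$ are the barycentrically interpolated position and normalized vertex normal on triangle $k$ (the Phong-surface reading of the abstract). The function name `gaussian_mean`, the data layout, and the assignment of the weights $u$, $v$, $1-u-v$ to the three triangle corners are illustrative assumptions, not the authors' code. Lifted optimization, including the walk across triangle edges, is not shown.

```python
import math

def gaussian_mean(vertices, normals, faces, embedding):
    """Evaluate mu = P + d * n for one Gaussian embedded on a triangle mesh.

    embedding = (k, u, v, d): triangle index, barycentric coordinates,
    and signed displacement along the interpolated normal.
    The weight-to-corner convention below is an assumption.
    """
    k, u, v, d = embedding
    i0, i1, i2 = faces[k]
    w = 1.0 - u - v  # third barycentric weight

    def blend(attr):
        # Barycentric interpolation of a per-vertex 3D attribute.
        return [u * attr[i0][c] + v * attr[i1][c] + w * attr[i2][c]
                for c in range(3)]

    P = blend(vertices)  # point on the flat triangle
    n = blend(normals)   # interpolated (Phong) normal, not yet unit length
    length = math.sqrt(sum(c * c for c in n))
    n = [c / length for c in n]
    return [P[c] + d * n[c] for c in range(3)]

# One triangle in the z = 0 plane with all vertex normals pointing along +z.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
normals = [(0.0, 0.0, 1.0)] * 3
faces = [(0, 1, 2)]
embedding = (0, 0.25, 0.25, 0.5)

rest = gaussian_mean(vertices, normals, faces, embedding)

# Moving the mesh moves the Gaussian; the embedding itself is unchanged.
moved_vertices = [(x + 1.0, y, z) for (x, y, z) in vertices]
posed = gaussian_mean(moved_vertices, normals, faces, embedding)
```

Because the embedding is stored relative to the triangle, any technique that updates vertex positions and normals (skeletal animation, blend shapes, or direct mesh editing) repositions the Gaussian without a learned deformation field.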
