GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning
Ye Yuan, Xueting Li, Yangyi Huang, Shalini De Mello, Koki Nagano, Jan Kautz, Umar Iqbal
TL;DR
GAvatar tackles the challenge of generating animatable 3D avatars from text by marrying Gaussian Splatting with a primitive-based, pose-aware framework and a dedicated SDF-based mesh learning pathway. The approach introduces a primitive-attached Gaussian representation and neural implicit fields to stabilize the optimization of millions of Gaussians under high-variance losses like SDS, while an SDF-based pipeline regularizes geometry and enables high-quality textured mesh extraction. Key contributions include the primitive-based implicit Gaussian avatar, the SDF-driven opacity and mesh extraction, and the demonstrated ability to render at 100 fps at 1K resolution with diverse prompts. This results in scalable, high-fidelity avatars suitable for immersive applications in AR/VR, gaming, and synthetic data generation.
Abstract
Gaussian splatting has emerged as a powerful 3D representation that harnesses the advantages of both explicit (mesh) and implicit (NeRF) 3D representations. In this paper, we seek to leverage Gaussian splatting to generate realistic animatable avatars from textual descriptions, addressing the limitations (e.g., flexibility and efficiency) imposed by mesh or NeRF-based representations. However, a naive application of Gaussian splatting cannot generate high-quality animatable avatars and suffers from learning instability; it also cannot capture fine avatar geometries and often leads to degenerate body parts. To tackle these problems, we first propose a primitive-based 3D Gaussian representation where Gaussians are defined inside pose-driven primitives to facilitate animation. Second, to stabilize and amortize the learning of millions of Gaussians, we propose to use neural implicit fields to predict the Gaussian attributes (e.g., colors). Finally, to capture fine avatar geometries and extract detailed meshes, we propose a novel SDF-based implicit mesh learning approach for 3D Gaussians that regularizes the underlying geometries and extracts highly detailed textured meshes. Our proposed method, GAvatar, enables the large-scale generation of diverse animatable avatars using only text prompts. GAvatar significantly surpasses existing methods in terms of both appearance and geometry quality, and achieves extremely fast rendering (100 fps) at 1K resolution.
