LAM: Large Avatar Model for One-shot Animatable Gaussian Head
Yisheng He, Xiaodong Gu, Xiaodan Ye, Chao Xu, Zhengyi Zhao, Yuan Dong, Weihao Yuan, Zilong Dong, Liefeng Bo
TL;DR
LAM reconstructs an animatable Gaussian head from a single image in one forward pass, producing a canonical-space Gaussian avatar that can be reenacted and rendered in real time without extra networks or post-processing. The method initializes a dense point cloud by subdividing the FLAME mesh, then applies cross-attention between learned point features and multi-scale image features to predict per-point Gaussian attributes and offsets. Because the points stay tied to FLAME, the avatar can be animated with standard LBS and corrective blendshapes. The framework reports high-quality reconstruction with preserved identity and real-time rendering on a range of devices, including mobile phones via WebGL, and it supports text-to-image and style-editing workflows. Key contributions include a Transformer-based canonical Gaussian attribute generator, FLAME-informed subdivision, and a pure 3D Gaussian avatar that integrates into conventional rendering pipelines. The reported results show LAM outperforming prior methods on the VFHQ and HDTF benchmarks, with applications spanning generative editing and real-time reenactment.
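The cross-attention step mentioned above, in which point features act as queries over image features, can be illustrated with a minimal single-head sketch. This is not the authors' architecture: the function name `cross_attention`, the absence of learned projection matrices, and the use of a single head and a single feature scale are all simplifying assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention (illustrative only).

    queries: N x d list, one feature vector per canonical point (assumed).
    keys:    M x d list, one vector per image token (assumed).
    values:  M x dv list, the image features to aggregate (assumed).
    Returns an N x dv list: one aggregated image feature per point.
    """
    dim = len(queries[0])
    out_dim = len(values[0])
    output = []
    for q in queries:
        # Similarity between this point query and every image token.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted sum of image features for this point.
        output.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(out_dim)
        ])
    return output
```

In a full model, each aggregated feature would then pass through further layers that decode Gaussian attributes (for example color, opacity, scale, rotation, and a position offset) for its point; that decoding stage is omitted here.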
Abstract
We present LAM, an innovative Large Avatar Model for animatable Gaussian head reconstruction from a single image. Unlike previous methods that require extensive training on captured video sequences or rely on auxiliary neural networks for animation and rendering during inference, our approach generates Gaussian heads that are immediately animatable and renderable. Specifically, LAM creates an animatable Gaussian head in a single forward pass, enabling reenactment and rendering without additional networks or post-processing steps. This capability allows for seamless integration into existing rendering pipelines, ensuring real-time animation and rendering across a wide range of platforms, including mobile phones. The centerpiece of our framework is the canonical Gaussian attributes generator, which uses FLAME canonical points as queries. These points interact with multi-scale image features through a Transformer to accurately predict Gaussian attributes in the canonical space. The reconstructed canonical Gaussian avatar can then be animated using standard linear blend skinning (LBS) with corrective blendshapes, as in the FLAME model, and rendered in real time on various platforms. Our experimental results demonstrate that LAM outperforms state-of-the-art methods on existing benchmarks. Our code and video are available at https://aigc3d.github.io/projects/LAM/
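As a rough illustration of the animation step, the sketch below applies corrective blendshapes followed by linear blend skinning to a set of canonical points. It is a simplified stand-in, not the FLAME or LAM implementation: the function name `animate_points`, the argument layout, and the use of 3x4 per-joint transforms already expressed relative to the rest pose are assumptions. It moves Gaussian centers only; transforming per-Gaussian rotations and scales is left out.

```python
def animate_points(points, blendshapes, coeffs, skin_weights, joint_transforms):
    """Corrective blendshapes followed by linear blend skinning (sketch).

    points:           N x 3 canonical positions (assumed to already include
                      any predicted per-point offsets).
    blendshapes:      K x N x 3 per-point displacement bases (assumed).
    coeffs:           K blendshape coefficients, e.g. expression weights.
    skin_weights:     N x J weights, each row summing to 1.
    joint_transforms: J matrices of shape 3 x 4 ([R | t]), assumed to be
                      relative to the rest pose.
    Returns N x 3 posed positions.
    """
    posed = []
    for i, p in enumerate(points):
        # 1) Corrective blendshapes: add weighted displacements in canonical space.
        shaped = [
            p[a] + sum(c * basis[i][a] for c, basis in zip(coeffs, blendshapes))
            for a in range(3)
        ]
        # 2) LBS: blend the joint transforms with this point's skinning weights.
        blended = [
            [
                sum(w * t[r][c] for w, t in zip(skin_weights[i], joint_transforms))
                for c in range(4)
            ]
            for r in range(3)
        ]
        # 3) Apply the blended rigid transform: R @ shaped + t.
        posed.append([
            sum(blended[r][a] * shaped[a] for a in range(3)) + blended[r][3]
            for r in range(3)
        ])
    return posed


# Usage: one point, one blendshape, two joints (identity and a unit x-shift).
identity = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
shift_x = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
result = animate_points(
    points=[[0.0, 0.0, 0.0]],
    blendshapes=[[[0.0, 1.0, 0.0]]],
    coeffs=[2.0],
    skin_weights=[[0.5, 0.5]],
    joint_transforms=[identity, shift_x],
)
print(result)
```

Because this step is only a few multiply-adds per point and needs no network evaluation, it fits the abstract's claim that the avatar can be driven inside conventional rendering pipelines.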
