InstantAvatar: Efficient 3D Head Reconstruction via Surface Rendering
Antonio Canela, Pol Caselles, Ibrar Malik, Eduard Ramon, Jaime García, Jordi Sánchez-Riera, Gil Triginer, Francesc Moreno-Noguer
TL;DR
InstantAvatar tackles the slow per-scene optimization of neural-field head reconstruction by introducing a grid-based SDF prior learned from thousands of head shapes and combining it with differentiable surface rendering. A multi-resolution feature grid reduces the size of the decoder and enables fast SDF queries, while a monocular normal cue stabilizes optimization and guides the capture of high-frequency detail. The approach achieves reconstruction accuracy competitive with state-of-the-art methods at about a 100× speed-up, recovering full-head avatars from one or a few images in a few seconds. This practical acceleration broadens the applicability of high-fidelity head reconstruction to AR/VR and related applications without sacrificing qualitative richness in hair, shoulders, and accessories.
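To make the "multi-resolution feature grid plus small decoder" idea concrete, the sketch below queries an SDF by trilinearly interpolating features from several dense voxel grids, concatenating them, and decoding with a one-hidden-layer MLP. It is a minimal illustration under stated assumptions, not the authors' implementation: the class name `MultiResGridSDF`, the resolutions (8, 16, 32), the channel count, the hidden width, and the dense (non-hashed) layout are all hypothetical, and the weights are random placeholders rather than a trained prior.

```python
import numpy as np

def trilinear(grid, p):
    """Trilinearly interpolate a dense feature grid at points p.

    grid: (R, R, R, C) array covering the unit cube [0, 1]^3, with R >= 2.
    p:    (N, 3) array of query points; coordinates are clamped to [0, 1].
    Returns an (N, C) array of interpolated features.
    """
    R = grid.shape[0]
    x = np.clip(p, 0.0, 1.0) * (R - 1)                # continuous voxel coordinates
    i0 = np.minimum(np.floor(x).astype(int), R - 2)  # lower cell corner, kept in range
    t = x - i0                                       # fractional offset within the cell
    out = np.zeros((p.shape[0], grid.shape[-1]))
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                # Weight of this cell corner is the product of 1D linear weights.
                w = ((t[:, 0] if dx else 1 - t[:, 0]) *
                     (t[:, 1] if dy else 1 - t[:, 1]) *
                     (t[:, 2] if dz else 1 - t[:, 2]))
                out += w[:, None] * grid[i0[:, 0] + dx, i0[:, 1] + dy, i0[:, 2] + dz]
    return out

class MultiResGridSDF:
    """Hypothetical multi-resolution grid encoder with a tiny MLP SDF decoder."""

    def __init__(self, resolutions=(8, 16, 32), channels=4, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        # One dense feature grid per resolution level (random, untrained values).
        self.grids = [rng.normal(0.0, 0.1, (r, r, r, channels)) for r in resolutions]
        d_in = channels * len(resolutions)
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(d_in), (d_in, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, 1))
        self.b2 = np.zeros(1)

    def __call__(self, p):
        # Concatenate interpolated features from every level, then decode.
        feats = np.concatenate([trilinear(g, p) for g in self.grids], axis=1)
        h = np.maximum(feats @ self.W1 + self.b1, 0.0)  # ReLU hidden layer
        return (h @ self.W2 + self.b2)[:, 0]            # (N,) signed distances

model = MultiResGridSDF()
pts = np.random.default_rng(1).random((5, 3))
print(model(pts).shape)  # (5,)
```

The design choice this illustrates is that most of the representational capacity lives in the grids, so the decoder can stay very small and each SDF query costs only a few interpolations and two matrix products.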
Abstract
Recent advances in full-head reconstruction have been obtained by optimizing a neural field through differentiable surface or volume rendering to represent a single scene. While these techniques achieve unprecedented accuracy, they take several minutes, or even hours, because of the expensive optimization process they require. In this work, we introduce InstantAvatar, a method that recovers full-head avatars from a few images (down to just one) in a few seconds on commodity hardware. To speed up reconstruction, we propose a system that combines, for the first time, a voxel-grid neural field representation with a surface renderer. Notably, a naive combination of these two techniques leads to unstable optimizations that do not converge to valid solutions. To overcome this limitation, we present a novel statistical model that learns a prior distribution over 3D head signed distance functions using a voxel-grid-based architecture. The use of this prior model, in combination with other design choices, results in a system that achieves 3D head reconstructions with accuracy comparable to the state of the art at a 100× speed-up.
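A surface renderer needs, per pixel, the point where the camera ray first meets the zero level set of the SDF, and the surface normal there. The sketch below assumes sphere tracing as the intersection method and central differences for the SDF gradient; the text above does not specify which renderer InstantAvatar uses, so both choices, along with the helper names `sphere_trace`, `sdf_normals`, and `normal_loss`, are hypothetical. The `normal_loss` form (mean of one minus cosine similarity against a monocular normal estimate) is likewise an assumed way to use the normal cue mentioned in the TL;DR. An analytic unit sphere stands in for a learned head SDF.

```python
import numpy as np

def sphere_trace(sdf, origins, dirs, max_steps=64, eps=1e-4, t_max=10.0):
    """Find ray intersections with the zero level set of `sdf` (hypothetical helper).

    sdf: callable mapping (N, 3) points to (N,) signed distances.
    origins, dirs: (N, 3) arrays; dirs are normalized internally.
    Returns (points, hit_mask).
    """
    dirs = dirs / np.linalg.norm(dirs, axis=1, keepdims=True)
    t = np.zeros(len(origins))
    hit = np.zeros(len(origins), dtype=bool)
    active = np.ones(len(origins), dtype=bool)
    for _ in range(max_steps):
        idx = np.flatnonzero(active)
        if idx.size == 0:
            break
        d = sdf(origins[idx] + t[idx, None] * dirs[idx])
        converged = np.abs(d) < eps
        hit[idx[converged]] = True
        # The SDF value bounds the distance to the surface, so it is a safe step.
        t[idx[~converged]] += d[~converged]
        escaped = t[idx] > t_max
        active[idx[converged | escaped]] = False
    return origins + t[:, None] * dirs, hit

def sdf_normals(sdf, points, h=1e-4):
    """Surface normals as the normalized SDF gradient (central differences)."""
    grads = np.zeros_like(points)
    for k in range(3):
        e = np.zeros(3)
        e[k] = h
        grads[:, k] = (sdf(points + e) - sdf(points - e)) / (2.0 * h)
    return grads / np.linalg.norm(grads, axis=1, keepdims=True)

def normal_loss(pred, target):
    """Assumed loss form: mean (1 - cosine similarity) to monocular normals."""
    target = target / np.linalg.norm(target, axis=1, keepdims=True)
    return float(np.mean(1.0 - np.sum(pred * target, axis=1)))

# Usage with a unit-sphere SDF standing in for a learned head SDF.
sphere = lambda p: np.linalg.norm(p, axis=1) - 1.0
origins = np.array([[0.0, 0.0, -3.0], [0.0, 2.0, -3.0]])
dirs = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])
pts, hit = sphere_trace(sphere, origins, dirs)
normals = sdf_normals(sphere, pts[hit])
```

The first ray hits the sphere at (0, 0, -1) with normal (0, 0, -1); the second passes above it and is reported as a miss. Because only hit pixels contribute to a normal loss of this kind, the quality of the underlying SDF matters a great deal, which is consistent with the abstract's point that an unconstrained grid plus surface renderer can fail to converge without a prior.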
