Implicit Shape and Appearance Priors for Few-Shot Full Head Reconstruction
Pol Caselles, Eduard Ramon, Jaime Garcia, Gil Triginer, Francesc Moreno-Noguer
TL;DR
This work tackles few-shot full-head reconstruction by introducing a Surface Appearance Statistical Model (SA-SM) that encodes shape and appearance priors, and by modeling geometry as a deformation of a reference Signed Distance Function (SDF) within an implicit differentiable rendering framework. The proposed SIRA++ system combines a pre-trained SA-SM with a two-stage optimization (latent priors first, then fine-tuning of the deformation and renderer) to produce accurate, detailed reconstructions from 1–3 images, while reducing runtime through parallel ray tracing and caching. The authors expand the H3DS dataset to 60 high-resolution full-head scans for evaluation and report state-of-the-art geometry reconstruction, robustness to camera noise, and substantial speedups (roughly $80\%$ faster) over prior methods. The approach targets high-fidelity full-head avatars from minimal input for VR/AR, computer graphics, and identity-preserving digital humans, and the extended dataset is a resource for further research.
Abstract
Recent advances in learning techniques that employ coordinate-based neural representations have yielded remarkable results in multi-view 3D reconstruction tasks. However, these approaches often require a substantial number of input views (typically several tens) and computationally intensive optimization procedures to be effective. In this paper, we address these limitations for the specific problem of few-shot full 3D head reconstruction. We do so by incorporating a probabilistic shape and appearance prior into coordinate-based representations, enabling faster convergence and improved generalization when working with only a few input images (as few as a single image). At test time, we leverage this prior to guide the fitting of a signed distance function using a differentiable renderer. By combining the statistical prior with parallelizable ray tracing and dynamic caching strategies, we obtain an efficient and accurate approach to few-shot full 3D head reconstruction. Moreover, we extend the H3DS dataset, which now comprises 60 high-resolution 3D full head scans with their corresponding posed images and masks, and use it for evaluation. On this dataset, we demonstrate that our approach achieves state-of-the-art results in geometry reconstruction while being an order of magnitude faster than previous approaches.
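As a toy illustration of what "geometry as a deformation of a reference SDF" can mean, the sketch below evaluates an implicit surface by warping query points into a reference space and reading off a fixed reference SDF there. Everything in it is an assumption made for illustration: the sphere stands in for a learned reference head, and `deformation` with its 3-vector `latent` is a hypothetical hand-written field, not the networks or latent codes used in the paper.

```python
import numpy as np


def reference_sdf(points, radius=1.0):
    """Signed distance to an origin-centered sphere.

    Stand-in (assumption) for a learned reference head SDF.
    """
    return np.linalg.norm(points, axis=-1) - radius


def deformation(points, latent):
    """Hypothetical deformation field: a per-axis displacement.

    `latent` is a 3-vector; all zeros gives the identity mapping.
    """
    return points * latent


def deformed_sdf(points, latent):
    """Implicit function of the deformed shape.

    Query points are warped into the reference space, then the
    reference SDF is evaluated there. The zero level set is the
    deformed surface; away from it the value is not guaranteed to
    be an exact Euclidean distance.
    """
    return reference_sdf(points + deformation(points, latent))


pts = np.array([[1.0, 0.0, 0.0],
                [0.0, 2.0, 0.0],
                [0.0, 0.0, 0.0]])

identity = np.zeros(3)                  # no deformation
stretched = np.array([-0.5, 0.0, 0.0])  # surface stretched to x = 2

same_as_reference = deformed_sdf(pts, identity)
on_stretched_surface = deformed_sdf(np.array([[2.0, 0.0, 0.0],
                                              [0.0, 1.0, 0.0]]), stretched)
```

With the zero latent the deformed function coincides with the reference SDF, while the second latent moves the surface crossing along the x axis from 1 to 2 and leaves the y axis untouched. In a learned setting the deformation would presumably be a network conditioned on a latent code and fitted through a differentiable renderer, which this sketch does not attempt.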
