ImHead: A Large-scale Implicit Morphable Model for Localized Head Modeling
Rolandos Alexandros Potamias, Stathis Galanakis, Jiankang Deng, Athanasios Papaioannou, Stefanos Zafeiriou
TL;DR
ImHead addresses the limitations of linear 3D Morphable Models by introducing a large-scale implicit head model capable of expressive full-head representation and localized editing. A global identity latent is decomposed into K=39 region-specific embeddings, each conditioning a separate local network; the local networks are fused to produce a coherent signed distance field. A backward warping module uses the expression code to map observations to a canonical space. Trained on a curated dataset of roughly 4K identities and 50K scans, imHead demonstrates stronger identity and expression reconstruction than prior implicit and explicit baselines, enables fully localized region edits and region swaps, and preserves dense correspondences. Alongside these gains in realism and editability, the work discusses limitations of implicit representations (hair detail, inference speed), dataset biases, and societal considerations for large-scale head modeling.
Abstract
In recent years, 3D morphable models (3DMMs) have emerged as a state-of-the-art methodology for modeling and generating expressive 3D avatars. However, because they rely on a strict topology and are linear in nature, they struggle to represent complex full-head shapes. Following the advent of deep implicit functions, we propose imHead, a novel implicit 3DMM that not only models expressive 3D head avatars but also facilitates localized editing of facial features. Previous methods directly divided the latent space into local components, accompanied by an identity encoding to capture global shape variations, resulting in large, expensive latent codes. In contrast, we retain a single compact identity space and introduce an intermediate region-specific latent representation to enable local edits. To train imHead, we curate a large-scale dataset of 4K distinct identities, taking a step towards large-scale 3D head modeling. Through a series of experiments, we demonstrate the expressive power of the proposed model to represent diverse identities and expressions, outperforming previous approaches. Additionally, the proposed approach provides an interpretable solution for 3D face manipulation, allowing the user to make localized edits.
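The data flow summarized in the TL;DR (one identity code, K=39 region embeddings, local networks fused into a signed distance field, and a backward warp driven by the expression code) can be illustrated with a toy NumPy sketch. This is not the authors' implementation: only K=39 comes from the text. The latent sizes, the random linear maps and tiny MLPs standing in for trained networks, the region anchors, and the Gaussian proximity blending used as the fusion rule are all assumptions made for illustration.

```python
import numpy as np

K = 39        # number of regions, as stated in the TL;DR
D_ID = 64     # identity latent size (assumed)
D_EXP = 32    # expression latent size (assumed)
D_REG = 16    # per-region embedding size (assumed)
HIDDEN = 32   # hidden width of each toy local network (assumed)

rng = np.random.default_rng(0)

# Hypothetical parameters, randomly initialised; a real model would learn them.
W_split = rng.normal(0, 0.1, (K, D_ID, D_REG))   # identity -> K region embeddings
W_warp = rng.normal(0, 0.1, (3 + D_EXP, 3))      # linear stand-in for the warp network
W1 = rng.normal(0, 0.1, (K, 3 + D_REG, HIDDEN))  # local networks, layer 1
W2 = rng.normal(0, 0.1, (K, HIDDEN, 1))          # local networks, layer 2
anchors = rng.normal(0, 1.0, (K, 3))             # assumed region centres (canonical space)


def region_embeddings(z_id):
    """Decompose one global identity code into K region-specific embeddings."""
    return np.einsum("d,kdr->kr", z_id, W_split)  # (K, D_REG)


def backward_warp(x, z_exp):
    """Map observed points to canonical space, conditioned on the expression code."""
    cond = np.broadcast_to(z_exp, (x.shape[0], D_EXP))
    return x + np.concatenate([x, cond], axis=1) @ W_warp  # (N, 3)


def fused_sdf(x_can, emb):
    """Evaluate all K local networks and blend them by proximity to each region."""
    n = x_can.shape[0]
    inp = np.concatenate(
        [np.broadcast_to(x_can, (K, n, 3)),
         np.broadcast_to(emb[:, None, :], (K, n, D_REG))],
        axis=2,
    )                                                      # (K, N, 3 + D_REG)
    h = np.tanh(np.einsum("kni,kih->knh", inp, W1))
    local = np.einsum("knh,kho->kno", h, W2)[..., 0]       # (K, N)
    d2 = ((x_can[None] - anchors[:, None]) ** 2).sum(-1)   # (K, N)
    w = np.exp(-(d2 - d2.min(axis=0, keepdims=True)))      # stable Gaussian weights
    w /= w.sum(axis=0, keepdims=True)
    return (w * local).sum(axis=0)                         # (N,)


def sdf(x, z_id, z_exp, emb=None):
    """Full pipeline: warp to canonical space, then query the fused field."""
    emb = region_embeddings(z_id) if emb is None else emb
    return fused_sdf(backward_warp(x, z_exp), emb)


x = rng.uniform(-1, 1, (5, 3))                    # query points
z_a, z_b = rng.normal(size=D_ID), rng.normal(size=D_ID)
z_exp = rng.normal(size=D_EXP)

# Region swap: keep identity A everywhere except region 7, which comes from B.
emb_swapped = region_embeddings(z_a).copy()
emb_swapped[7] = region_embeddings(z_b)[7]
sdf_swapped = sdf(x, z_a, z_exp, emb=emb_swapped)
```

In this sketch a localized edit is an operation on a single row of the region-embedding matrix, while the compact identity code stays the only global shape parameter. How local the resulting change in the field is depends on the fusion rule, which here is only a placeholder.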
