ImHead: A Large-scale Implicit Morphable Model for Localized Head Modeling

Rolandos Alexandros Potamias, Stathis Galanakis, Jiankang Deng, Athanasios Papaioannou, Stefanos Zafeiriou

TL;DR

imHead addresses the limitations of linear 3D Morphable Models (3DMMs) by introducing a large-scale implicit head model that supports expressive full-head representation and localized editing. A single global identity latent is decomposed into K=39 region-specific embeddings, each of which conditions its own local network; the local outputs are then fused into a coherent signed distance field. A backward warping module, driven by the expression code, maps points from the observation space to a canonical space. Trained on a curated dataset of roughly 4k identities and about 50k scans, imHead achieves stronger identity and expression reconstruction than prior implicit and explicit baselines, enables fully localized region edits and region swaps, and preserves dense correspondences. Alongside these gains in realism and editability, the work discusses the limitations of implicit representations (hair detail, inference speed), dataset biases, and societal considerations for large-scale head modeling.
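The data flow summarized above can be pictured with a minimal, untrained sketch. Nearly everything in it is an assumption made for illustration: the latent and feature sizes, the random weights, the `tanh` MLPs, and the names `init_mlp`, `mlp`, and `imhead_sdf` are hypothetical and not taken from the paper, and the landmark-based canonical frames for each region are omitted. Only the order of operations (warp to canonical space, decompose the identity latent into K=39 region embeddings, run one local network per region, fuse into a signed distance) follows the summary.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 39        # number of head regions (from the summary)
D_ID = 64     # global identity latent size (assumed)
D_EXP = 32    # expression latent size (assumed)
D_LOC = 16    # per-region embedding size (assumed)
D_FEAT = 8    # per-region feature size (assumed)


def init_mlp(sizes):
    """Random weights for a small MLP; a stand-in for a trained network."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def mlp(params, x):
    """Apply the MLP: tanh on hidden layers, linear output layer."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x


deformer = init_mlp([3 + D_EXP, 64, 3])        # expression-driven warp
decnet = init_mlp([D_ID, 128, K * D_LOC])      # identity -> K local embeddings
local_nets = [init_mlp([3 + D_LOC, 32, D_FEAT]) for _ in range(K)]
fusion = init_mlp([K * D_FEAT, 64, 1])         # fused features -> SDF value


def imhead_sdf(x, z_id, z_exp):
    """Signed distance for points x of shape (N, 3)."""
    n = x.shape[0]
    # 1. Backward warp: observation space -> canonical space.
    z_exp_tiled = np.broadcast_to(z_exp, (n, D_EXP))
    x_can = x + mlp(deformer, np.concatenate([x, z_exp_tiled], axis=1))
    # 2. Decompose the global identity latent into K region embeddings.
    z_local = mlp(decnet, z_id).reshape(K, D_LOC)
    # 3. One local network per region, conditioned on its own embedding.
    feats = []
    for j in range(K):
        z_j = np.broadcast_to(z_local[j], (n, D_LOC))
        feats.append(mlp(local_nets[j], np.concatenate([x_can, z_j], axis=1)))
    # 4. Fuse all local features into one signed distance per point.
    return mlp(fusion, np.concatenate(feats, axis=1))[:, 0]


points = rng.uniform(-1.0, 1.0, (5, 3))
sdf = imhead_sdf(points, rng.normal(size=D_ID), rng.normal(size=D_EXP))
print(sdf.shape)  # (5,)
```

In a layout like this, a localized edit or region swap would amount to replacing a single row of `z_local` before step 3, leaving the other region embeddings untouched.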

Abstract

Over the last few years, 3D morphable models (3DMMs) have emerged as a state-of-the-art methodology for modeling and generating expressive 3D avatars. However, given their reliance on a strict topology, along with their linear nature, they struggle to represent complex full-head shapes. Following the advent of deep implicit functions, we propose imHead, a novel implicit 3DMM that not only models expressive 3D head avatars but also facilitates localized editing of the facial features. Previous methods directly divided the latent space into local components accompanied by an identity encoding to capture the global shape variations, leading to expensive latent sizes. In contrast, we retain a single compact identity space and introduce an intermediate region-specific latent representation to enable local edits. To train imHead, we curate a large-scale dataset of 4K distinct identities, making a step towards large-scale 3D head modeling. Through a series of experiments, we demonstrate the expressive power of the proposed model to represent diverse identities and expressions, outperforming previous approaches. Additionally, the proposed approach provides an interpretable solution for 3D face manipulation, allowing the user to make localized edits.

Paper Structure

This paper contains 22 sections, 18 equations, 14 figures, 3 tables.

Figures (14)

  • Figure 1: We propose imHead, a large-scale implicit 3D morphable model built from 4,000 distinct identities under diverse expressions. imHead enables compact latent representations and localized editing.
  • Figure 2: Overview of the proposed imHead architecture: Given a point in the observation space $\bm{x}$ and an expression code $\bm{z}_{exp}$, the Expression Deformer network $\mathcal{E}_{\theta}$ predicts a displacement field $\Delta\bm{x}$ to warp the observations to the canonical space $\bm{x}_{can}$. To enable localized editing, DecNet $\mathcal{T}_{\theta}$ decomposes the global identity latent $\bm{z}_{id}$ into local embeddings $\{\bm{z}^{j}_{id}\}_{j=0}^K$ that correspond to distinct head regions. The local embeddings are used to condition a set of Local-Part networks $\mathcal{G}_\theta$ that predict localized features $\mathbf{f}_j$ for each point in the canonical space. To facilitate modeling, a landmark regressor LandmarkNet $\mathcal{K}_\theta$ predicts a set of head keypoints, providing a canonical frame for each local-part network. Finally, the local features are aggregated and fused by FusionNet $\mathcal{F}_\theta$, which regresses the signed distance field at point $\bm{x}$.
  • Figure 3: Specificity Error measures the realism of the generated faces under different standard deviation values.
  • Figure 4: Latent Space Interpolation. The proposed model can achieve smooth changes while interpolating the latent space between source and target identities.
  • Figure 5: Qualitative Reconstruction Evaluation of the proposed and the baseline methods under different expressions and identities. Reconstruction for each method is obtained using a fitting optimization from the input partial point clouds. We also report the reconstruction error, in terms of Chamfer distance, color-coded on top of the 3D reconstructions.
  • ...and 9 more figures
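
Two of the captions above name operations that are easy to make concrete: the Chamfer distance used as the reconstruction error in Figure 5, and the latent interpolation of Figure 4. The sketch below is illustrative only. The captions do not state which Chamfer variant is used (squared or unsquared, summed or averaged) or how the interpolation is done, so the symmetric mean of nearest-neighbor Euclidean distances and the plain linear interpolation shown here are assumptions, and the function names `chamfer_distance` and `lerp_latent` are hypothetical.

```python
import numpy as np


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3)."""
    # Pairwise Euclidean distances, shape (N, M).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # Mean nearest-neighbor distance in each direction, then summed.
    return d.min(axis=1).mean() + d.min(axis=0).mean()


def lerp_latent(z_src, z_tgt, t):
    """Linear interpolation between two latent codes, with t in [0, 1]."""
    return (1.0 - t) * z_src + t * z_tgt


a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
b = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 1.0]])
print(chamfer_distance(a, a))  # 0.0
print(chamfer_distance(a, b))  # 1.0

z_mid = lerp_latent(np.array([0.0, 0.0]), np.array([2.0, 4.0]), 0.5)
print(z_mid)  # [1. 2.]
```

The brute-force pairwise matrix costs O(N·M) memory, which is fine for small point sets; dense scans would normally call for a spatial index such as a k-d tree.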