Locally Adaptive Neural 3D Morphable Models

Michail Tarasiou, Rolandos Alexandros Potamias, Eimear O'Sullivan, Stylianos Ploumpis, Stefanos Zafeiriou

TL;DR

The Locally Adaptive Morphable Model (LAMM) addresses the challenge of fine-grained local control over dense 3D meshes. It is an end-to-end autoencoder that uses region-based tokenization and per-region control networks to overwrite encoded geometry with sparse control-point displacements. The method achieves state-of-the-art disentanglement and reconstruction by avoiding latent-space partitioning: it combines a single global latent code with region-specific processing, producing dense, locally edited outputs with fast CPU inference on high-resolution meshes. Key contributions include an architecture built on region tokens and displacement control networks, a self-supervised training scheme in which control-point displacements transform one training sample into another, and editing primitives such as region swapping and sampling, demonstrated on 12k- and 72k-vertex head/hand datasets. The approach scales to large meshes with reduced memory requirements and supports interactive mesh manipulation for avatar creation, animation, and digital editors.

Abstract

We present the Locally Adaptive Morphable Model (LAMM), a highly flexible Auto-Encoder (AE) framework for learning to generate and manipulate 3D meshes. We train our architecture following a simple self-supervised training scheme in which input displacements over a set of sparse control vertices are used to overwrite the encoded geometry in order to transform one training sample into another. During inference, our model produces a dense output that adheres locally to the specified sparse geometry while maintaining the overall appearance of the encoded object. This approach results in state-of-the-art performance in both disentangling manipulated geometry and 3D mesh reconstruction. To the best of our knowledge, LAMM is the first end-to-end framework that enables direct local control of 3D vertex geometry in a single forward pass. A very efficient computational graph allows our network to train with only a fraction of the memory required by previous methods and to run faster during inference, generating 12k-vertex meshes at more than 60 fps on a single CPU thread. We further leverage local geometry control as a primitive for higher-level editing operations and present a set of derivative capabilities such as swapping and sampling object parts. Code and pretrained models can be found at https://github.com/michaeltrs/LAMM.
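The self-supervised scheme described in the abstract can be illustrated with a minimal NumPy sketch of how one training pair might be assembled. This is not the authors' code: the function name `make_training_pair`, the random meshes, the choice of 64 control vertices, and the assumption that displacements are computed as target minus source at the control vertices are all hypothetical choices made for illustration.

```python
import numpy as np


def make_training_pair(source, target, control_idx):
    """Build decoder inputs and the regression target for one mesh pair.

    source, target: (N, 3) vertex arrays of two meshes with shared topology.
    control_idx:    (K,) integer indices of the sparse control vertices.
    """
    # Assumption: displacements are target minus source at the control vertices.
    delta_vc = target[control_idx] - source[control_idx]
    # The network would encode `source`, receive `delta_vc`, and regress `target`.
    return delta_vc, target


rng = np.random.default_rng(0)
n_vertices = 12_000  # matches the 12k-vertex setting mentioned in the abstract
source = rng.normal(size=(n_vertices, 3))                   # stand-in source mesh
target = source + 0.01 * rng.normal(size=(n_vertices, 3))   # stand-in target mesh
control_idx = rng.choice(n_vertices, size=64, replace=False)  # hypothetical control set

delta_vc, y = make_training_pair(source, target, control_idx)
```

Under this assumption, adding the displacements to the source control vertices recovers the target control vertices exactly, which is the local constraint the dense output is expected to respect.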

Paper Structure (10 sections, 2 equations, 9 figures, 2 tables)

Figures (9)

  • Figure 1: Overview of the Locally Adaptive Morphable Model (LAMM) at inference time. (top) Our trained decoder $f_{dec}$ receives as inputs a latent code $\mathbf{z}$ and displacements $\delta \mathcal{V}_{\mathcal{C}}$ over a set of sparse control points (red vertices). Here displacements for control points in the nose region are shown with green arrows. The decoder generates the shape of the object encoded in $\mathbf{z}$, overwriting local geometry to respect $\delta \mathcal{V}_{\mathcal{C}}$. (bottom) We can sample latent codes to generate new instances of human heads in the (a) identity and (b) expression spaces. Similarly, we can either randomly sample or manually provide vertex displacements at control points to (c) manipulate local identity features (ears, nose, mouth shown here) and (d) add expressions while leaving identity features unchanged.
  • Figure 2: Architecture overview. A source mesh $\mathcal{V}^s$ is encoded into latent code $\mathbf{z}$ and decoded using additional displacements at control points $\delta \mathcal{V}_{\mathcal{C}}$. (top) Tokenization and encoder modules. A 3D input mesh $\mathcal{V}^s$ is split into regions $\mathcal{V}^s_i = \mathcal{V}^s[\mathcal{R}_i]$; each region is tokenized via region-specific linear weights $\mathbf{W}_i^{in}$ and compressed by the encoder module into latent code $\mathbf{z}$. (bottom) Displacement control, decoder and inverse tokenization modules. User displacements at control points are split into corresponding regions $\mathcal{C}_i$, processed by region-specific control networks $f_{\delta i}$ and used by the decoder to overwrite the geometry encoded in $\mathbf{z}$. The inverse tokenization module translates decoder outputs into region geometries via region-specific linear weights $\mathbf{W}_i^{out}$; these are merged, using $\mathcal{R}_i$, into the target estimate $\hat{\mathcal{V}}^t$.
  • Figure 3: Templates for UHM12k and Handy data. Colors indicate dense regions $\mathcal{R}_i$. Control points $\mathcal{C}_i$ are shown as red dots.
  • Figure 4: Qualitative assessment of the trained AE on UHM12k. (top) New shape generation through sampling of the latent space $\mathbf{z}$. (bottom) Interpolation between the latent codes of two samples from the evaluation data shows a smooth transition between identities.
  • Figure 5: Identity manipulation performance under direct vertex control for UHM12k (left) and Handy (right) data. We observe that errors are typically reduced close to control points. For both datasets our method learns to produce highly realistic output geometries.
  • ...and 4 more figures
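
The tokenization and inverse tokenization steps named in the Figure 2 caption can be sketched in NumPy as follows. This is a hedged illustration, not the released implementation: the region partition, the number of regions, the token width `token_dim`, and the random weights standing in for the learned $\mathbf{W}_i^{in}$ and $\mathbf{W}_i^{out}$ are all assumptions, regions are assumed disjoint, and the encoder, decoder and control networks between the two steps are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n_vertices, n_regions, token_dim = 1000, 4, 32  # hypothetical sizes

# Hypothetical disjoint partition of the vertex indices into regions R_i.
labels = rng.integers(0, n_regions, size=n_vertices)
regions = [np.flatnonzero(labels == i) for i in range(n_regions)]

# Stand-ins for the region-specific linear weights W_i^in and W_i^out
# (random here; learned during training in the paper).
w_in = [rng.normal(scale=0.01, size=(3 * len(r), token_dim)) for r in regions]
w_out = [rng.normal(scale=0.01, size=(token_dim, 3 * len(r))) for r in regions]


def tokenize(vertices):
    """Map each region V[R_i] to one token via its own linear projection."""
    return np.stack([vertices[r].reshape(-1) @ w for r, w in zip(regions, w_in)])


def inverse_tokenize(tokens):
    """Map one token per region back to vertices and merge them using R_i."""
    merged = np.zeros((n_vertices, 3))
    for r, w, t in zip(regions, w_out, tokens):
        merged[r] = (t @ w).reshape(-1, 3)
    return merged


vertices = rng.normal(size=(n_vertices, 3))  # stand-in for a source mesh
tokens = tokenize(vertices)                  # one token per region
estimate = inverse_tokenize(tokens)          # dense per-vertex output
```

Because each region has its own projection, the number of tokens equals the number of regions rather than the number of vertices, so everything between tokenization and inverse tokenization operates on a short sequence.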