ShapeFusion: A 3D diffusion model for localized shape editing

Rolandos Alexandros Potamias, Michail Tarasiou, Stylianos Ploumpis, Stefanos Zafeiriou

TL;DR

ShapeFusion introduces a diffusion-based framework for localized editing of 3D meshes, addressing the entangled latent spaces of PCA-based parametric models that hinder region-specific edits. The method applies masked forward diffusion to the mesh vertices and uses a geometry-aware denoising network, built from hierarchical mesh convolutions and vertex-index encodings, so that edits stay confined to user-specified regions controlled through anchor points. It enables direct point manipulation, region sampling, global reconstruction, region swapping, and localized expression editing, outperforming state-of-the-art disentangled methods in diversity, localization, and inference speed. The approach provides a practical, interpretable tool for precise 3D avatar editing, with potential applications in digital artistry and aesthetic medicine.
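
The masked forward diffusion mentioned above can be illustrated with a minimal NumPy sketch. It assumes a standard DDPM-style noising step with a linear beta schedule, applied only to vertices inside a boolean mask; the function name `masked_forward_diffusion`, the schedule values, and the random stand-in mesh are all hypothetical and are not taken from the paper.

```python
import numpy as np


def masked_forward_diffusion(x0, mask, t, alphas_cumprod, rng):
    """Noise only the masked vertices of a mesh at timestep t.

    x0:   (N, 3) clean vertex positions
    mask: (N,) boolean, True where the vertex belongs to the edit region
    Returns the partially noised vertices and the noise that was drawn.
    """
    eps = rng.standard_normal(x0.shape)
    a_bar = alphas_cumprod[t]
    # Standard DDPM closed-form noising (an assumption, not the paper's exact form)
    noised = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps
    # Vertices outside the mask keep their clean positions exactly
    x_t = np.where(mask[:, None], noised, x0)
    return x_t, eps


rng = np.random.default_rng(0)
num_vertices, num_steps = 100, 1000

# Hypothetical linear beta schedule
betas = np.linspace(1e-4, 2e-2, num_steps)
alphas_cumprod = np.cumprod(1.0 - betas)

# Random points standing in for a real mesh, with an arbitrary edit region
x0 = rng.standard_normal((num_vertices, 3))
mask = np.zeros(num_vertices, dtype=bool)
mask[10:30] = True

x_t, eps = masked_forward_diffusion(x0, mask, 500, alphas_cumprod, rng)
```

Because the unmasked vertices are never noised, a denoiser trained on such inputs always sees the clean context around the region it has to reconstruct, which is one plausible reading of how locality can hold by design.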

Abstract

In the realm of 3D computer vision, parametric models have emerged as a ground-breaking methodology for the creation of realistic and expressive 3D avatars. Traditionally, they rely on Principal Component Analysis (PCA), given its ability to decompose data into an orthonormal space that maximally captures shape variations. However, due to the orthogonality constraints and the global nature of PCA's decomposition, these models struggle to perform localized and disentangled editing of 3D shapes, which severely affects their use in applications requiring fine control, such as face sculpting. In this paper, we leverage diffusion models to enable diverse and fully localized edits on 3D meshes, while completely preserving the un-edited regions. We propose an effective diffusion masking training strategy that, by design, facilitates localized manipulation of any shape region, without being limited to predefined regions or to sparse sets of predefined control vertices. Following our framework, a user can explicitly set their manipulation region of choice and define an arbitrary set of vertices as handles to edit a 3D mesh. Compared to the current state-of-the-art, our method leads to more interpretable shape manipulations than methods relying on latent code state, as well as greater localization and generation diversity, while offering faster inference than optimization-based approaches. Project page: https://rolpotamias.github.io/Shapefusion/
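
The global nature of PCA described in the abstract can be seen in a small NumPy experiment. This is a sketch under stated assumptions: the "dataset" is random Gaussian data standing in for registered meshes, and all variable names are hypothetical. It fits a PCA basis with an SVD, shifts a single coefficient, and counts how many vertices move.

```python
import numpy as np

rng = np.random.default_rng(0)
num_shapes, num_vertices = 50, 200

# Synthetic stand-in for a dataset of registered meshes, flattened to (shapes, 3N)
shapes = rng.standard_normal((num_shapes, num_vertices * 3))
mean_shape = shapes.mean(axis=0)

# Rows of vt are the orthonormal principal directions of the centered data
_, _, vt = np.linalg.svd(shapes - mean_shape, full_matrices=False)

# Project one shape onto the basis, then edit only the first coefficient
coeffs = (shapes[0] - mean_shape) @ vt.T
edited_coeffs = coeffs.copy()
edited_coeffs[0] += 3.0

original = (mean_shape + coeffs @ vt).reshape(num_vertices, 3)
edited = (mean_shape + edited_coeffs @ vt).reshape(num_vertices, 3)

# Per-vertex displacement caused by changing a single coefficient
displacement = np.linalg.norm(edited - original, axis=1)
moved = int((displacement > 1e-6).sum())
print(f"{moved} of {num_vertices} vertices moved")
```

Each principal direction spans the whole shape vector, so a change to one coefficient generally displaces vertices everywhere rather than in one region. On real face or body data the displacement magnitudes would vary across the surface, but the components are still not restricted to a user-chosen region.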

Paper Structure (12 sections, 4 equations, 11 figures, 1 table)

Figures (11)

  • Figure 1: Illustration of the properties of the proposed method for localized editing (Top) and region sampling (Bottom). Top: The proposed method can manipulate any region of a mesh by simply setting a user-defined anchor point and its surrounding region. The manipulations are completely disentangled and affect only the selected region. The disentanglement of the manipulations is illustrated using the color-coded distances from the previous manipulation step. Bottom: The proposed method can also sample new face parts and expressions by simply defining a mask over the desired region.
  • Figure 2: Method overview: We propose a 3D diffusion model for localized attribute manipulation and editing. During the forward diffusion step, noise is gradually added to random regions of the mesh, indicated by a mask $\mathbf{M}$. In the denoising step, a hierarchical network based on mesh convolutions is used to learn a prior distribution of each attribute directly in the vertex space.
  • Figure 3: The proposed hierarchical message passing layer. At each layer, the features are aggregated recursively from the coarser to the finer levels. With such a masking approach, localized edits are guaranteed by the design of the method.
  • Figure 4: Qualitative and quantitative comparison between the proposed and the baseline methods. On the left and right sides we show the input meshes from the MimicMe and UHM datasets, respectively, with the manipulated region highlighted in green. Each row shows 5 samples generated by one method for the same region, along with a heatmap indicating the differences from the original input. Note that the proposed method achieves larger displacements, which translates to more diverse samples, localized only on the manipulated region. Figure best viewed zoomed in. For additional region manipulations we refer the reader to the supplementary material.
  • Figure 5: Qualitative and quantitative comparison between the proposed and the baseline methods on the STAR dataset. On the left side we show the input meshes, with the manipulated region highlighted in green. The region samples, along with their heatmaps, are illustrated row-wise. Figure best viewed zoomed in.
  • ...and 6 more figures