GeneAvatar: Generic Expression-Aware Volumetric Head Avatar Editing from a Single Image

Chong Bao, Yinda Zhang, Yuan Li, Xiyu Zhang, Bangbang Yang, Hujun Bao, Marc Pollefeys, Guofeng Zhang, Zhaopeng Cui

TL;DR

GeneAvatar proposes a universal, expression-aware editing framework for 3DMM-driven head avatars across multiple volumetric representations by lifting 2D edits into 3D modification fields. A dual-branch modification generator (geometry and texture) is trained via distillation from large-scale 3DGANs and 2D texture editors, with implicit latent-space guidance and a segmentation-based texture loss to stabilize training and improve fine-grained texture edits. Single-image editing is achieved through an auto-decoding optimization in the StyleGAN latent space, enabling consistent edits across expressions and viewpoints when applied to different avatar representations. The approach demonstrates strong cross-representation editing quality, view consistency, and identity preservation, while noting limitations in adding new accessories or hair and pointing to future work on extending geometric and hair generation capabilities.

Abstract

Recently, we have witnessed the explosive growth of various volumetric representations for modeling animatable head avatars. However, due to the diversity of frameworks, there is no practical method to support high-level applications like 3D head avatar editing across different representations. In this paper, we propose a generic avatar editing approach that can be universally applied to various 3DMM-driven volumetric head avatars. To achieve this goal, we design a novel expression-aware modification generative model, which lifts 2D editing from a single image to a consistent 3D modification field. To ensure the effectiveness of the generative modification process, we develop several techniques, including an expression-dependent modification distillation scheme to draw knowledge from the large-scale head avatar model and 2D facial texture editing tools, implicit latent space guidance to enhance model convergence, and a segmentation-based loss reweighting strategy for fine-grained texture inversion. Extensive experiments demonstrate that our method delivers high-quality and consistent results across multiple expressions and viewpoints. Project page: https://zju3dv.github.io/geneavatar/
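
The auto-decoding step described in the TL;DR, in which a latent code is optimized against a single edited image while the generator stays frozen, can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the frozen generator is replaced by a random linear map `G`, the edited image by a flat vector, and the loss by plain mean squared error. The paper's generator, renderer, and losses are not reproduced here.

```python
import numpy as np


def auto_decode(generator_matrix, target, steps=500, lr=0.1):
    """Fit a latent code z so that generator_matrix @ z matches target.

    The generator is frozen; only z is updated (the auto-decoding idea).
    A linear generator is a hypothetical stand-in for a real network.
    """
    z = np.zeros(generator_matrix.shape[1])
    for _ in range(steps):
        residual = generator_matrix @ z - target
        # Gradient of the mean squared error with respect to z.
        grad = 2.0 * generator_matrix.T @ residual / target.size
        z -= lr * grad
    return z


rng = np.random.default_rng(0)
G = rng.normal(size=(64, 8))      # frozen toy generator (assumption)
z_true = rng.normal(size=8)       # code that produced the "edited image"
edited_image = G @ z_true         # stand-in for a 2D-edited image
z_hat = auto_decode(G, edited_image)
loss = float(np.mean((G @ z_hat - edited_image) ** 2))
```

In this toy setting the recovered code `z_hat` matches the code that produced the target. With a nonlinear generator and image-space losses, convergence is not guaranteed in the same way.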

Paper Structure

This paper contains 28 sections, 6 equations, 19 figures, 4 tables.

Figures (19)

  • Figure 1: We propose a generic approach to edit 3D avatars in various volumetric representations (NeRFBlendShape [nerfblendshape], INSTA [insta], Next3D [sun2023next3d]) from a single perspective using 2D editing methods with drag-style, text-prompt, and pattern-painting inputs. Our editing results are consistent across multiple facial expressions and camera viewpoints.
  • Figure 2: We use an expression-aware generative model that accepts a modification latent code $\mathbf{z}_{g/t}$ and 3DMM coefficients and outputs a modification field with a tri-plane structure. The modification field modifies the geometry and texture of the template avatar by deforming the sample points $\mathbf{x}$ and blending the color $\mathbf{c}_{o}$ with the modification color $\mathbf{c}_{\Delta}$, respectively. We lift the 2D editing effect to 3D using an auto-decoding optimization and synthesize novel views across different expressions.
  • Figure 3: We compare geometry editing with PVP [lin2023pvp], Roop [roop], and Next3D [sun2023next3d] on INSTA [insta] and NeRFBlendShape [nerfblendshape] avatars. The "Reference Animation" denotes the image of the original avatar under the same expression as the rendered edited view.
  • Figure 4: Our geometry editing results with the drag-style 2D editing on INSTA [insta], NeRFBlendShape [nerfblendshape], and Next3D [sun2023next3d] avatars.
  • Figure 5: We compare texture editing with PVP [lin2023pvp], Roop [roop], and Next3D [sun2023next3d] on INSTA [insta] and NeRFBlendShape [nerfblendshape] avatars. The "Reference Animation" denotes the image of the original avatar under the same expression as the rendered edited view.
  • ...and 14 more figures
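
Figure 2's caption describes the modification field as deforming sample points $\mathbf{x}$ and blending the template color $\mathbf{c}_{o}$ with a modification color $\mathbf{c}_{\Delta}$. A minimal sketch of that per-point operation follows, assuming an additive offset for the deformation and a scalar blend weight in [0, 1] for the color. Both the additive form and the names `offset` and `blend_weight` are hypothetical and not taken from the paper.

```python
import numpy as np


def apply_modification(x, c_o, offset, c_delta, blend_weight):
    """Apply a toy modification field to N sample points.

    x, offset:     (N, 3) sample points and assumed additive deformation.
    c_o, c_delta:  (N, 3) template color and modification color.
    blend_weight:  (N,) assumed per-point blend factor, clipped to [0, 1].
    """
    x_mod = x + offset                               # geometry: deform points
    w = np.clip(blend_weight, 0.0, 1.0)[..., None]   # broadcast over RGB
    c_mod = (1.0 - w) * c_o + w * c_delta            # texture: blend colors
    return x_mod, c_mod


x = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
offset = np.array([[0.1, 0.0, 0.0], [0.0, 0.0, 0.0]])
c_o = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
c_delta = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])
blend_weight = np.array([0.0, 0.5])

x_mod, c_mod = apply_modification(x, c_o, offset, c_delta, blend_weight)
```

A blend weight of 0 leaves the template color untouched, so unedited regions of the avatar keep their original appearance in this sketch.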