AttriHuman-3D: Editable 3D Human Avatar Generation with Attribute Decomposition and Indexing

Fan Yang, Tianyi Chen, Xiaosheng He, Zhongang Cai, Lei Yang, Si Wu, Guosheng Lin

TL;DR

AttriHuman-3D tackles editable 3D-aware human avatar generation with two components: a space-attribute decomposition, in which a 4D space-attribute field is factorized into six feature planes, and an implicit indexing mechanism that isolates individual attributes. An implicit index predictor combined with an orthogonal regularization yields strong disentanglement, enabling precise attribute-level editing in a canonical space, with SMPL-based deformation to the target space. The method also employs a hyper-latent training strategy and attribute-specific sampling to reduce style entanglement, producing high-quality, view-consistent avatars and effective interactive editing. Experiments on fashion datasets show rendering quality competitive with existing 3D-aware generators and clear advantages in editing fidelity and efficiency, indicating practical applicability to content creation, games, and AR/VR.
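The six-plane decomposition can be illustrated with a minimal sketch. It assumes a simplified HexPlane-style factorization: a point (x, y, z) with attribute coordinate a is projected onto the six axis-aligned planes xy, xz, yz, xa, ya and za, each plane is a small grid of feature vectors sampled bilinearly, and each spatial plane is multiplied element-wise with its complementary attribute plane before the three products are summed. The grid sizes, the fusion rule, the scalar attribute coordinate and the names (`PLANE_AXES`, `bilinear`, `query_hexplane`) are illustrative assumptions, not details taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple

# A feature plane: grid[row][col] is a feature vector.
Grid = List[List[List[float]]]

# Hypothetical layout: which two of the four axes (x=0, y=1, z=2, a=3) each plane spans.
PLANE_AXES: Dict[str, Tuple[int, int]] = {
    "xy": (0, 1), "xz": (0, 2), "yz": (1, 2),
    "xa": (0, 3), "ya": (1, 3), "za": (2, 3),
}

# Each spatial plane is paired with the attribute plane covering the remaining spatial axis.
PAIRS = [("xy", "za"), ("xz", "ya"), ("yz", "xa")]


def bilinear(grid: Grid, u: float, v: float) -> List[float]:
    """Bilinearly sample a plane at normalized coords (u -> columns, v -> rows), clamped to [0, 1]."""
    rows, cols = len(grid), len(grid[0])
    fr = min(max(v, 0.0), 1.0) * (rows - 1)
    fc = min(max(u, 0.0), 1.0) * (cols - 1)
    r0, c0 = int(fr), int(fc)
    r1, c1 = min(r0 + 1, rows - 1), min(c0 + 1, cols - 1)
    wr, wc = fr - r0, fc - c0
    out = []
    for k in range(len(grid[0][0])):
        top = grid[r0][c0][k] * (1.0 - wc) + grid[r0][c1][k] * wc
        bottom = grid[r1][c0][k] * (1.0 - wc) + grid[r1][c1][k] * wc
        out.append(top * (1.0 - wr) + bottom * wr)
    return out


def query_hexplane(planes: Dict[str, Grid], point: Sequence[float]) -> List[float]:
    """Fused feature for a 4D point (x, y, z, a) with every coordinate normalized to [0, 1]."""
    feats = {name: bilinear(planes[name], point[i], point[j]) for name, (i, j) in PLANE_AXES.items()}
    dim = len(feats["xy"])
    fused = [0.0] * dim
    for spatial, attribute in PAIRS:
        for k in range(dim):
            fused[k] += feats[spatial][k] * feats[attribute][k]
    return fused


if __name__ == "__main__":
    # Six constant 3x3 planes with 2-channel features; query exactly on grid nodes.
    planes = {name: [[[1.0, 1.0] for _ in range(3)] for _ in range(3)] for name in PLANE_AXES}
    print(query_hexplane(planes, [0.0, 0.5, 1.0, 0.5]))  # [3.0, 3.0]
```

Summing the three pair products keeps the feature dimension fixed; concatenation followed by a learned linear projection would be an equally plausible fusion choice.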

Abstract

Editable 3D-aware generation, which supports interactive user editing, has developed rapidly in recent years. However, existing editable 3D GANs either fail to achieve accurate local editing or incur high computational costs. We propose AttriHuman-3D, an editable 3D human generation model that addresses these problems through attribute decomposition and indexing. The core idea is to generate all attributes (e.g., body, hair, and clothes) in a single shared attribute space represented by six feature planes, which are then decomposed and manipulated with per-attribute indexes. To precisely extract the features of each attribute from the generated feature planes, we propose a novel attribute indexing method together with an orthogonal projection regularization that strengthens disentanglement. We also introduce a hyper-latent training strategy and an attribute-specific sampling strategy to avoid style entanglement and misleading penalties from the discriminator. Our method lets users interactively edit selected attributes of a generated 3D human avatar while keeping the others fixed. Qualitative and quantitative experiments demonstrate that our model achieves strong disentanglement between attributes, supports fine-grained image editing, and generates high-quality 3D human avatars.
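The orthogonal projection regularization can be sketched as a penalty on the similarity between attribute indexes, in line with Figure 5(c), which reports the cosine similarity of the predicted indexes. This sketch assumes each attribute's predicted index is a vector and uses the mean squared pairwise cosine similarity as the penalty; the paper's exact loss, its weight and the function names (`cosine`, `orthogonality_penalty`) are assumptions for illustration.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float], eps: float = 1e-8) -> float:
    """Cosine similarity; eps guards against zero-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def orthogonality_penalty(indexes: List[Sequence[float]]) -> float:
    """Mean squared cosine similarity over all unordered pairs of attribute indexes.

    It is zero when every pair of indexes is orthogonal and approaches 1
    when two indexes are parallel or anti-parallel.
    """
    total, pairs = 0.0, 0
    for i in range(len(indexes)):
        for j in range(i + 1, len(indexes)):
            total += cosine(indexes[i], indexes[j]) ** 2
            pairs += 1
    return total / pairs if pairs else 0.0


if __name__ == "__main__":
    # Hypothetical index vectors for three attributes (e.g., body, hair, top).
    orthogonal = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    entangled = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
    print(orthogonality_penalty(orthogonal))  # 0.0
    print(orthogonality_penalty(entangled))
```

Squaring the cosine penalizes both aligned and anti-aligned index pairs, so the loss reaches zero only when all indexes are mutually orthogonal.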

Paper Structure

This paper contains 14 sections, 7 equations, 8 figures, and 2 tables.

Figures (8)

  • Figure 1: AttriHuman-3D achieves strong disentanglement between attributes and generates high-quality, view-consistent 3D human avatars that support fine-grained editing. From left to right: generation results obtained by editing different attributes, try-on results obtained by modifying selected sets of attributes, and view-consistent renderings.
  • Figure 2: The overall framework of our model. We generate the decomposed feature planes with a StyleGANv2-based generator and predict the index of each attribute with the implicit indexing module. The deformer module models the deformation between the canonical space and the target space, and the final image is synthesized with compositional volume rendering followed by a super-resolution module. The detailed structures of the attribute decomposition module and the deformer module are shown at the bottom, where $B_{s}(\beta)$ and $B_{P}(\theta)$ denote the SMPL shape and pose terms for parameters randomly sampled from the dataset, and $n$ denotes the total number of selected attributes.
  • Figure 3: Qualitative comparisons of our method with EG3D, StyleSDF, EVA3D, and CNeRF. The RGB images, generated segmentation masks, and 3D meshes demonstrate that our method achieves high-quality human avatar generation. Moreover, the main contribution of our model is its support for interactive user editing, which EG3D, StyleSDF, and EVA3D do not provide.
  • Figure 4: Qualitative comparisons of editing results between our method and CNeRF. From left to right, we show the edited RGB images and the residuals for changing the top, pants, and haircut. Benefiting from the hyper-latent and attribute-specific training strategies, our model achieves better disentanglement and more precise control over the target semantic region than CNeRF.
  • Figure 5: Ablation of the implicit indexing module. Panels (a) and (b) compare our implicit indexing module with the fixed identity mapping used in dynamic NeRFs cao2023hexplane. Panel (c) shows the cosine similarity of the predicted indexes with and without the proposed orthogonal projection regularization.
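The compositional volume rendering step named in the Figure 2 caption can be illustrated with a minimal sketch. It assumes a common compositional NeRF formulation: at each ray sample, per-attribute densities are summed and colors are averaged with density weights, followed by standard front-to-back alpha compositing. The paper's exact compositing rule is not stated here, the super-resolution stage is omitted, and the function names (`compose_point`, `render_ray`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Color = Tuple[float, float, float]
# One ray sample: per-attribute densities and per-attribute colors.
Sample = Tuple[Sequence[float], Sequence[Color]]


def compose_point(densities: Sequence[float], colors: Sequence[Color]) -> Tuple[float, Color]:
    """Merge per-attribute outputs at one point: densities add, colors are density-weighted."""
    sigma = sum(densities)
    if sigma <= 0.0:
        return 0.0, (0.0, 0.0, 0.0)
    mixed = [sum(d * c[ch] for d, c in zip(densities, colors)) / sigma for ch in range(3)]
    return sigma, (mixed[0], mixed[1], mixed[2])


def render_ray(samples: List[Sample], deltas: Sequence[float]) -> Color:
    """Front-to-back alpha compositing of composed samples along one ray."""
    transmittance = 1.0
    out = [0.0, 0.0, 0.0]
    for (densities, colors), delta in zip(samples, deltas):
        sigma, rgb = compose_point(densities, colors)
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        weight = transmittance * alpha
        for ch in range(3):
            out[ch] += weight * rgb[ch]
        transmittance *= 1.0 - alpha  # light remaining for later samples
    return (out[0], out[1], out[2])


if __name__ == "__main__":
    # Two samples, two hypothetical attributes (e.g., "top" and "body").
    ray = [([0.5, 0.0], [(1.0, 0.0, 0.0), (0.8, 0.6, 0.5)]),
           ([0.2, 1.5], [(1.0, 0.0, 0.0), (0.8, 0.6, 0.5)])]
    print(render_ray(ray, [0.5, 0.5]))
```

Under this formulation, editing one attribute only changes that attribute's densities and colors; ray samples where it has zero density render identically before and after the edit.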