Disentangled Human Body Representation Based on Unsupervised Semantic-Aware Learning

Lu Wang, Xishuai Peng, S. Kevin Zhou

TL;DR

The paper targets accurate 3D human body reconstruction with controllable semantics under unsupervised learning. It introduces Disentangled Human Body Representation (DHBR) with a skeleton-grouped, whole-aware encoder and a part-aware decoder, anchored to a body template and refined by residual latent learning. Through geometric reconstruction loss and unsupervised cross-consistency and self-consistency losses, the model achieves high-fidelity meshes while disentangling identity and pose factors across 24 bone groups. Empirical results on SPRING and DFAUST show state-of-the-art reconstruction accuracy with a compact parameter footprint, and the framework supports pose transfer and bilinear interpolation, highlighting practical value for graphics and vision tasks.

Abstract

In recent years, learning 3D human body representations has received increasing attention. However, the complexity of the many hand-defined human body constraints and the absence of supervision data limit how controllably and accurately existing works can represent the human body, in terms of both semantics and representation ability. In this paper, we propose a human body representation with controllable fine-grained semantics and high reconstruction precision in an unsupervised learning framework. In particular, we design a whole-aware skeleton-grouped disentanglement strategy to learn a correspondence between geometric semantic measurements of the body and latent codes, which makes it possible to control the shape and posture of the human body by modifying latent code parameters. With the help of a skeleton-grouped whole-aware encoder and unsupervised disentanglement losses, our representation model is learned in an unsupervised manner. In addition, a template-based residual learning scheme is injected into the encoder to ease the learning of human body latent parameters in complicated body shape and pose spaces. Because the latent codes are geometrically meaningful, the representation can be used in a wide range of applications, from human body pose transfer to bilinear latent code interpolation. Furthermore, a part-aware decoder is utilized to promote the learning of controllable fine-grained semantics. Experimental results on public 3D human datasets show that the method achieves precise reconstruction.
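
The bilinear latent code interpolation mentioned in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that each mesh is described by a flat shape code and a flat pose code, and that the two are interpolated independently with weights `s` and `p`. The names `lerp`, `bilinear_interpolate`, and `interpolation_grid` are hypothetical and not taken from the paper, and the paper's actual code layout (for example, its per-bone grouping) is not modeled here.

```python
from typing import List, Tuple

Code = List[float]  # assumed flat latent code; the paper's real layout may differ


def lerp(a: Code, b: Code, t: float) -> Code:
    """Linearly interpolate between two equal-length codes (t=0 gives a, t=1 gives b)."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def bilinear_interpolate(
    shape_a: Code, pose_a: Code, shape_b: Code, pose_b: Code, s: float, p: float
) -> Tuple[Code, Code]:
    """Blend shape codes with weight s and pose codes with weight p, independently."""
    return lerp(shape_a, shape_b, s), lerp(pose_a, pose_b, p)


def interpolation_grid(
    shape_a: Code, pose_a: Code, shape_b: Code, pose_b: Code, steps: int
) -> List[List[Tuple[Code, Code]]]:
    """Build a steps x steps grid: rows vary the shape weight, columns the pose weight."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    ts = [i / (steps - 1) for i in range(steps)]
    return [
        [bilinear_interpolate(shape_a, pose_a, shape_b, pose_b, s, p) for p in ts]
        for s in ts
    ]


if __name__ == "__main__":
    grid = interpolation_grid([0.0, 0.0], [1.0], [2.0, 4.0], [3.0], steps=3)
    # The centre cell mixes both factors halfway.
    print(grid[1][1])
```

In a full pipeline, each `(shape_code, pose_code)` pair in the grid would be passed to the decoder to produce a mesh; the corners of the grid correspond to the four combinations of the two source shapes and poses.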

Paper Structure

This paper contains 21 sections, 8 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: The whole-aware skeleton-grouped disentangle strategy: (a) human body template with anatomical components, (b) human body bones and joints, (c) overview of disentangle strategy.
  • Figure 2: The architecture of our proposed embedding learning network.
  • Figure 3: Overview of the unsupervised disentanglement losses: (a) the cross-consistency loss, where the shape code of one mesh is used to reconstruct the mesh itself after a cycle of decoding and encoding, (b) the self-consistency loss, where the pose code of one mesh is used to reconstruct the mesh itself after another cycle of decoding and encoding.
  • Figure 4: Qualitative reconstruction performance on SPRING [yang2014semantic] and DFAUST [bogo2017dynamic]. The per-vertex Euclidean distance error is encoded as color on the reconstructed meshes. Our human model is trained without the data constraints that are common in the compared approaches.
  • Figure 5: Pose transfer examples from pose source to shape source. In each example, the third new body mesh is decoded from the shape code given by the first mesh and the pose code given by the second mesh.
  • ...and 1 more figures
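
The pose transfer described in the Figure 5 caption (shape code from the first mesh, pose code from the second) can be sketched as a simple code swap. This is a minimal sketch under stated assumptions: `transfer_pose`, `toy_encode`, and `toy_decode` are hypothetical names, and the toy encoder and decoder are trivial stand-ins that split and rejoin a list, not the paper's skeleton-grouped encoder or part-aware decoder.

```python
from typing import Callable, List, Tuple

Mesh = List[float]  # toy stand-in for mesh vertex data
Code = List[float]
Encoder = Callable[[Mesh], Tuple[Code, Code]]  # mesh -> (shape_code, pose_code)
Decoder = Callable[[Code, Code], Mesh]         # (shape_code, pose_code) -> mesh


def transfer_pose(
    encode: Encoder, decode: Decoder, shape_source: Mesh, pose_source: Mesh
) -> Mesh:
    """Decode a new mesh from the shape code of one mesh and the pose code of another."""
    shape_code, _ = encode(shape_source)
    _, pose_code = encode(pose_source)
    return decode(shape_code, pose_code)


def toy_encode(mesh: Mesh) -> Tuple[Code, Code]:
    """Toy encoder (assumption): first half of the list is 'shape', second half 'pose'."""
    half = len(mesh) // 2
    return mesh[:half], mesh[half:]


def toy_decode(shape_code: Code, pose_code: Code) -> Mesh:
    """Toy decoder (assumption): concatenate the two codes back into a 'mesh'."""
    return shape_code + pose_code


if __name__ == "__main__":
    first = [1.0, 2.0, 10.0, 20.0]   # supplies the shape code
    second = [3.0, 4.0, 30.0, 40.0]  # supplies the pose code
    print(transfer_pose(toy_encode, toy_decode, first, second))
```

With a trained model, `encode` and `decode` would be replaced by the learned networks; the swap itself only works as intended to the extent that the latent codes are actually disentangled.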