Topology-aware Human Avatars with Semantically-guided Gaussian Splatting

Haoyu Zhao, Chen Yang, Hao Wang, Xingyue Zhao, Wei Shen

TL;DR

This work proposes SG-GS, which combines semantics-embedded 3D Gaussians, skeleton-driven rigid deformation, and non-rigid cloth dynamics deformation to create photo-realistic human avatars. It also introduces a Semantic Human-Body Annotator (SHA) that uses SMPL's semantic prior to label body parts efficiently.
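The summary does not spell out how SHA assigns labels. As a minimal sketch, assume SMPL comes with a per-vertex body-part segmentation (here a hypothetical `vertex_part_ids` array) and that each Gaussian inherits the part of its nearest canonical SMPL vertex. This is one illustrative reading of "utilizing SMPL's semantic prior", not the authors' SHA.

```python
import numpy as np

# Hypothetical nearest-vertex label transfer; NOT the SG-GS / SHA implementation.
def transfer_part_labels(gaussian_xyz: np.ndarray,
                         smpl_vertices: np.ndarray,
                         vertex_part_ids: np.ndarray,
                         num_parts: int) -> np.ndarray:
    """Give each Gaussian the body-part label of its nearest SMPL vertex.

    gaussian_xyz:    (G, 3) Gaussian centers in canonical space
    smpl_vertices:   (V, 3) canonical SMPL vertices (V = 6,890 for SMPL)
    vertex_part_ids: (V,)   integer part id per vertex (assumed segmentation)
    Returns a (G, num_parts) one-hot matrix usable as an initial semantic attribute.
    """
    # Brute-force pairwise squared distances, shape (G, V).
    diff = gaussian_xyz[:, None, :] - smpl_vertices[None, :, :]
    d2 = np.einsum("gvc,gvc->gv", diff, diff)
    nearest = np.argmin(d2, axis=1)      # index of the closest vertex per Gaussian
    labels = vertex_part_ids[nearest]    # (G,) part id per Gaussian
    return np.eye(num_parts)[labels]     # one-hot semantic attribute
```

Because the Gaussians are initialized from the 6,890 SMPL vertices, each initial Gaussian sits on a vertex and simply receives that vertex's label; the nearest-vertex search only matters after Gaussians move or are densified. The brute-force distance matrix is O(G·V) in memory, so a KD-tree (e.g. `scipy.spatial.cKDTree`) would scale better.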

Abstract

Reconstructing photo-realistic, topology-aware animatable human avatars from monocular videos remains challenging in computer vision and graphics. Recent methods that represent the human body with 3D Gaussians offer faster optimization and real-time rendering. However, they ignore human-body semantic information, which encodes the explicit topological and intrinsic structure of the body, and therefore fail to reconstruct fine details of human avatars. To address this issue, we propose SG-GS, which uses semantics-embedded 3D Gaussians, skeleton-driven rigid deformation, and non-rigid cloth dynamics deformation to create photo-realistic human avatars. We then design a Semantic Human-Body Annotator (SHA) that leverages SMPL's semantic prior for efficient body-part semantic labeling. The generated labels guide the optimization of the Gaussians' semantic attributes. To capture the explicit topological structure of the human body, we employ a 3D network that integrates both topological and geometric associations for human avatar deformation. We further implement three key strategies to improve the semantic accuracy of the 3D Gaussians and the rendering quality: semantic projection with 2D regularization, semantic-guided density regularization, and semantic-aware regularization with neighborhood consistency. Extensive experiments demonstrate that SG-GS achieves state-of-the-art geometry and appearance reconstruction performance.
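The abstract names semantic-aware regularization with neighborhood consistency but gives no formula. One plausible form, assumed here purely for illustration, penalizes the KL divergence between each Gaussian's softmax-normalized semantic distribution and those of its k nearest neighbors in 3D, pushing nearby Gaussians toward the same body-part label. The function name, `k`, and the choice of KL divergence are all assumptions, not the paper's definition.

```python
import numpy as np

# Hypothetical neighborhood-consistency term; the paper's exact loss may differ.
def neighborhood_consistency_loss(xyz: np.ndarray,
                                  sem_logits: np.ndarray,
                                  k: int = 4) -> float:
    """Mean KL(p_i || p_j) over each Gaussian i and its k nearest neighbors j.

    xyz:        (N, 3) Gaussian centers, N > k
    sem_logits: (N, C) unnormalized semantic scores per Gaussian
    """
    # Numerically stable softmax over the C body-part classes.
    z = sem_logits - sem_logits.max(axis=1, keepdims=True)
    p = np.exp(z)
    p /= p.sum(axis=1, keepdims=True)

    # k nearest neighbors by Euclidean distance, excluding the point itself.
    diff = xyz[:, None, :] - xyz[None, :, :]
    d2 = np.einsum("ijc,ijc->ij", diff, diff)
    np.fill_diagonal(d2, np.inf)
    nbrs = np.argsort(d2, axis=1)[:, :k]   # (N, k) neighbor indices

    eps = 1e-8
    p_i = p[:, None, :]                    # (N, 1, C)
    p_j = p[nbrs]                          # (N, k, C)
    kl = np.sum(p_i * (np.log(p_i + eps) - np.log(p_j + eps)), axis=-1)
    return float(kl.mean())
```

The loss is zero when every Gaussian agrees with its neighbors and grows when a Gaussian's label distribution disagrees with the ones around it, which is the behavior "neighborhood consistency" suggests.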

Paper Structure (15 sections, 14 equations, 7 figures, 3 tables)

Figures (7)

  • Figure 1: We propose an efficient method for creating topology-aware human avatars from videos alone, ensuring both photo-realistic appearance and accurate anatomical structure. Our method achieves better quality than the most recent state-of-the-art methods [wen2024gomavatar, hu2024gauhuman, qian20243dgs].
  • Figure 2: Our framework for creating photo-realistic animatable avatars from monocular videos. We initialize a set of 3D Gaussians in the canonical space by sampling 6,890 points from the SMPL model and assign a semantic attribute to the Gaussian at each point. We first integrate a skeleton-driven rigid deformation and a non-rigid cloth dynamics deformation to deform human avatars from canonical space $\mathcal{G}_c$ to observation space $\mathcal{G}_o$. Then, we introduce a Semantic Human-Body Annotator (SHA), which leverages SMPL’s human-body semantic prior for efficient semantic labeling. These labels guide the optimization of the 3D Gaussians’ semantic attribute $\mathcal{O}$. We also propose a 3D topology- and geometry-aware network that learns the body's topological and geometric associations and integrates them into learning the 3D deformation. To improve semantic accuracy and rendering quality, we implement semantic projection with 2D regularization, semantic-guided density regularization, and semantic-aware regularization with neighborhood consistency.
  • Figure 3: Qualitative Comparison on ZJU-MoCap [peng2020neural]. SG-GS produces realistic details in both rendered images and geometry, while other approaches struggle to generate smooth details.
  • Figure 4: Qualitative Comparison on H36M [ionescu2013human3]. By exploiting semantic information within the human body, SG-GS better preserves the body's anatomical structure and produces high-quality results.
  • Figure 5: Ablation Study on Geometric and Semantic Feature Learning. These components help remove artifacts and recover fine details such as cloth wrinkles and the human face under novel views.
  • ...and 2 more figures
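Figure 2 describes deforming Gaussians from canonical space $\mathcal{G}_c$ to observation space $\mathcal{G}_o$ with a skeleton-driven rigid deformation. Such deformations are commonly implemented with linear blend skinning (LBS), $x_o = \sum_k w_k (R_k x_c + t_k)$. The sketch below assumes SMPL-style skinning weights and per-bone rigid transforms are already given, moves only the Gaussian centers, and omits both the non-rigid cloth term and the covariance update; it illustrates standard LBS, not necessarily the paper's exact formulation.

```python
import numpy as np

# Standard LBS applied to Gaussian centers (cloth dynamics and covariances omitted).
def linear_blend_skinning(x_c: np.ndarray,
                          weights: np.ndarray,
                          rotations: np.ndarray,
                          translations: np.ndarray) -> np.ndarray:
    """Map canonical points to observation space.

    x_c:          (N, 3) canonical Gaussian centers
    weights:      (N, K) skinning weights, each row summing to 1
    rotations:    (K, 3, 3) per-bone rotation matrices R_k
    translations: (K, 3) per-bone translations t_k
    Returns (N, 3) posed points x_o = sum_k w_k (R_k x_c + t_k).
    """
    # Transform every point by every bone: shape (K, N, 3).
    per_bone = np.einsum("kij,nj->kni", rotations, x_c) + translations[:, None, :]
    # Blend the K candidate positions with the skinning weights: shape (N, 3).
    return np.einsum("nk,kni->ni", weights, per_bone)
```

For example, a point weighted equally between an identity bone and a bone rotated 90° about z lands halfway between the two transformed positions, which is why LBS can shrink volume near joints and why a learned non-rigid term is often added on top.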