Within the Dynamic Context: Inertia-aware 3D Human Modeling with Pose Sequence

Yutong Chen, Yifan Zhan, Zhihang Zhong, Wei Wang, Xiao Sun, Yu Qiao, Yinqiang Zheng

TL;DR

Dynamic 3D human avatars often fail to reproduce inertia-driven appearance changes when conditioned only on a single pose. The authors propose Dyco, which conditions non-rigid deformations and canonical radiance fields on a pose sequence, specifically the delta-pose sequence $S_{\Delta\bm{p}}$ together with the current pose $\bm{p}_i$, using a localized dynamic context encoder to mitigate overfitting. They introduce a tri-plane canonical space for efficient representation and present the I3D-Human dataset to study inertia-related clothing variations, along with a dynamic motion error (DME) metric based on optical flow to evaluate motion fidelity. Across I3D-Human and ZJU-MoCap, Dyco achieves state-of-the-art or competitive results and demonstrates the ability to simulate appearance changes caused by inertia at different velocities, indicating practical impact for realistic animated avatars.
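To make the delta-pose idea concrete, the sketch below builds a short history of frame-to-frame pose differences ending at the current frame. It is an illustration only, not the authors' implementation: the function name `delta_pose_sequence`, the `window` parameter, the flattened pose layout, and the use of plain element-wise subtraction between consecutive frames are all assumptions made for this example.

```python
from typing import List, Sequence

# Hypothetical layout: one flat list of pose parameters per frame.
Pose = List[float]


def delta_pose_sequence(poses: Sequence[Pose], i: int, window: int) -> List[Pose]:
    """Return up to `window` consecutive pose differences ending at frame i.

    Assumption: a delta is the element-wise difference between a frame
    and the frame before it. Frame 0 has no predecessor, so it yields
    an empty history.
    """
    if not 0 <= i < len(poses):
        raise IndexError("frame index out of range")
    start = max(1, i - window + 1)
    deltas: List[Pose] = []
    for j in range(start, i + 1):
        deltas.append([cur - prev for cur, prev in zip(poses[j], poses[j - 1])])
    return deltas


# Frames 2 and 3 share the same pose, but frame 3 follows a stop.
poses = [[0.0, 0.0], [0.1, 0.0], [0.3, 0.2], [0.3, 0.2]]
history_at_2 = delta_pose_sequence(poses, i=2, window=2)
history_at_3 = delta_pose_sequence(poses, i=3, window=2)
```

Here `poses[2]` and `poses[3]` are identical, yet `history_at_2` and `history_at_3` differ. That difference is the extra signal a single-pose condition cannot provide.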

Abstract

Neural rendering techniques have significantly advanced 3D human body modeling. However, previous approaches often overlook dynamics induced by factors such as motion inertia, leading to challenges in scenarios like abrupt stops after rotation, where the pose remains static while the appearance changes. This limitation arises from reliance on a single pose as conditional input, resulting in ambiguity in mapping one pose to multiple appearances. In this study, we show that variations in human appearance depend not only on the current frame's pose but also on past pose states. Therefore, we introduce Dyco, a novel method that conditions the non-rigid deformations and the canonical space on a delta pose sequence representation to model temporal appearance variations. To preserve the model's ability to generalize to novel poses, we further propose a low-dimensional global context to reduce unnecessary inter-body-part dependencies and a quantization operation to keep the model from overfitting to the delta pose sequence. To validate our approach, we collected a new dataset named I3D-Human, which focuses on capturing temporal changes in clothing appearance under similar poses. In extensive experiments on both I3D-Human and existing datasets, our approach demonstrates superior qualitative and quantitative performance. In addition, our inertia-aware 3D human method can, for the first time, simulate appearance changes caused by inertia at different velocities.
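The quantization operation mentioned above can be pictured with a minimal sketch. The abstract does not specify the scheme, so uniform rounding to integer bins is assumed here; the function name `quantize_delta` and the `step` parameter are hypothetical.

```python
from typing import List


def quantize_delta(delta: List[float], step: float) -> List[int]:
    """Map each delta-pose component to the index of its nearest bin.

    Assumption: bins are uniform with width `step`. Nearby delta values
    fall into the same bin, so the model sees a coarser motion signal.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    return [round(d / step) for d in delta]


# Two slightly different velocities land in the same bins.
bins_a = quantize_delta([0.26, -0.04, 0.0], step=0.1)
bins_b = quantize_delta([0.31, 0.02, 0.0], step=0.1)
```

With a coarser `step`, more distinct motions share a code. This trades motion detail for robustness to delta sequences not seen during training.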

Paper Structure (30 sections, 14 equations, 14 figures, 6 tables)

This paper contains 30 sections, 14 equations, 14 figures, 6 tables.

Figures (14)

  • Figure 1: We emphasize that appearance variations not only depend on different static poses, but can also be induced by inertia, such as the graceful hanging down of the dress drape after a sudden stop in motion. Compared with previous methods that rely solely on static poses, we encode the past pose trajectory with the pose sequence to accurately capture such dynamic effects. This improves both novel-view renderings and generalization to novel poses.
  • Figure 2: The overall pipeline of our method. The rigid and non-rigid transformation modules deform coordinates from the pose space into the canonical space; the deformed coordinates are then fed into the tri-plane volume to obtain the color and density in the canonical space. To capture the variation under similar poses within different dynamic contexts, we adopt a localized dynamic context encoder to embed pose sequences as additional conditional inputs into the transformation modules and the canonical volume.
  • Figure 3: Qualitative comparison on novel views of the I3D-Human dataset. Our method can render distinct appearance and deformation across various motion contexts. In contrast, HumanNeRF [weng2022humannerf] and 3DGS-Avatar [qian2023_3dgsavatar] struggle to capture the precise details. The other three baselines [peng2024animatable, peng2023implicit] exhibit noticeable artifacts.
  • Figure 4: Qualitative comparison on novel poses of the I3D-Human dataset.
  • Figure 5: The ZJU-MoCap dataset also exhibits ambiguous mapping from a static pose to various appearances. At two frames where the subject has similar poses, the clothing wrinkles can differ due to distinct past motion. Our method can reflect the variation, while HumanNeRF [weng2022humannerf] generates similar patterns.
  • ...and 9 more figures