ReliaAvatar: A Robust Real-Time Avatar Animator with Integrated Motion Prediction

Bo Qian, Zhenhuan Wei, Jiashuo Li, Xing Wei

TL;DR

ReliaAvatar tackles robust real-time full-body avatar pose estimation under low-quality signals by integrating an autoregressive motion-prediction pathway with a regression pathway. It introduces a dual-path architecture and a Joint-Relation Transformer to model inter-joint relationships, addressing standard, instantaneous, and prolonged data-loss scenarios. Empirical results on AMASS show state-of-the-art performance in standard conditions and strong robustness under data loss, with an inference speed of 109 fps. The approach reduces dependency on continuous tracker signals, enabling affordable, reliable avatar animation for AR/VR/MR applications and broader access to full-body avatar control.

Abstract

Efficiently estimating full-body pose with minimal wearable devices is a worthwhile research direction. Despite significant advances in this field, most current research neglects full-body avatar estimation under low-quality signal conditions, which are prevalent in practical usage. To bridge this gap, we summarize three scenarios that may be encountered in real-world applications (the standard scenario, the instantaneous data-loss scenario, and the prolonged data-loss scenario) and propose a new evaluation benchmark. Our solution to the data-loss scenarios is to integrate full-body avatar pose estimation with motion prediction. Specifically, we present ReliaAvatar, a real-time, reliable avatar animator with predictive modeling capabilities that employs a dual-path architecture. ReliaAvatar runs at 109 frames per second (fps). Extensive comparative evaluations on widely recognized benchmark datasets demonstrate ReliaAvatar's superior performance in both standard and low data-quality conditions. The code is available at https://github.com/MIV-XJTU/ReliaAvatar.
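
The three scenarios named in the abstract can be pictured as per-frame availability masks over a tracker sequence. The following is a minimal sketch under stated assumptions, not the benchmark's actual protocol: the function name `make_mask`, the independent per-frame drop model for the instantaneous case, and the default of losing the latter half of the sequence in the prolonged case (borrowed from the Figure 1 example) are all illustrative choices.

```python
import random


def make_mask(num_frames, scenario, drop_prob=0.1, loss_start=None, seed=0):
    """Return a list of booleans, one per frame (True = tracker signal available).

    Hypothetical helper: the drop model and default parameters are assumptions
    made for illustration, not values taken from the paper.
    """
    rng = random.Random(seed)
    if scenario == "standard":
        # Every frame carries a usable tracker signal.
        return [True] * num_frames
    if scenario == "instantaneous":
        # Assumed model: each frame is lost independently with probability drop_prob.
        return [rng.random() >= drop_prob for _ in range(num_frames)]
    if scenario == "prolonged":
        # Every frame from loss_start onward is lost; default is the latter half.
        start = num_frames // 2 if loss_start is None else loss_start
        return [i < start for i in range(num_frames)]
    raise ValueError(f"unknown scenario: {scenario!r}")


mask = make_mask(80, "prolonged")
print(sum(mask), "of", len(mask), "frames keep their signal")
```

A mask like this would be applied to the input signals before they reach the estimator, so that the same motion clip can be evaluated under all three conditions.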

Paper Structure (18 sections, 14 equations, 6 figures, 10 tables)

Figures (6)

  • Figure 1: Visualization of the prolonged data-loss scenario. We mask out the latter half of a sample that consists of 80 frames and depicts "crouching". The first row shows the ground truth, the second row shows the output of ReliaAvatar, and the third row shows the output of AvatarPoser. ReliaAvatar continues to operate effectively under prolonged data loss with only minor distortions, whereas AvatarPoser fails completely in this scenario.
  • Figure 2: Illustration of our dual-pathway, autoregressive framework. ReliaAvatar has two pathways: the regression pathway (Regression Encoder $\rightarrow$ Joint-Relation Transformer $\rightarrow$ Decoder) and the prediction pathway (Prediction Encoder $\rightarrow$ Joint-Relation Transformer $\rightarrow$ Decoder). The output of ReliaAvatar at time-step $t$ forms a part of the input at time-step $t+1$. The blocks with diagonal lines as foreground represent tokens that signify SMPL joints, e.g., pelvis, left wrist, right ankle.
  • Figure 4: Inference time comparison.
  • Figure : A) Instantaneous data-loss
  • Figure : A) Instantaneous data-loss
  • ...and 1 more figure
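
The autoregressive feedback described in the Figure 2 caption (the output at time-step t forming part of the input at time-step t+1) can be illustrated with a toy rollout. This is only a sketch under heavy assumptions: the pose is reduced to a single float, the prediction pathway is replaced by constant-velocity extrapolation, the regression pathway by the raw tracker reading, and the two are fused by a plain average. None of these stand-ins come from the paper; the function name `rollout` is hypothetical as well.

```python
from typing import List, Optional


def rollout(signals: List[Optional[float]], init_pose: float = 0.0) -> List[float]:
    """Autoregressive rollout over a 1-D toy 'pose'.

    signals[t] is the tracker reading at frame t, or None when the signal is lost.
    All estimators here are illustrative stand-ins, not the paper's networks.
    """
    history = [init_pose, init_pose]  # the two most recent outputs, fed back as input
    outputs = []
    for signal in signals:
        # Prediction stand-in: constant-velocity extrapolation from past outputs.
        predicted = history[-1] + (history[-1] - history[-2])
        if signal is None:
            # No tracker data for this frame: rely on the prediction alone.
            pose = predicted
        else:
            # Toy fusion of the tracker reading and the prediction (arbitrary average).
            pose = 0.5 * (signal + predicted)
        outputs.append(pose)
        # The output at time-step t becomes part of the input at time-step t+1.
        history = [history[-1], pose]
    return outputs


print(rollout([1.0, 2.0, None, None]))  # prints [0.5, 1.5, 2.5, 3.5]
```

The point of the sketch is the control flow: because past outputs are always fed back, the loop can keep producing poses when the tracker signal disappears, which is the behaviour Figure 1 attributes to ReliaAvatar in the prolonged data-loss case.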