WANDR: Intention-guided Human Motion Generation

Markos Diomataris; Nikos Athanasiou; Omid Taheri; Xi Wang; Otmar Hilliges; Michael J. Black

TL;DR

WANDR addresses the problem of synthesizing natural 3D human motion that starts from a given initial pose and reaches an arbitrary 3D goal. It introduces intention features that guide a frame-by-frame autoregressive conditional VAE (c-VAE). The model is trained on two complementary datasets, AMASS and CIRCLE; when a sequence has no goal annotation, a pseudo-goal is derived from a future wrist position, so the model can learn from both goal-free and goal-directed motion. Key contributions are the intention mechanism, a training scheme that fuses the two datasets, and a motion generator that produces long-horizon, goal-directed motion and generalizes zero-shot to unseen goals. Quantitative and qualitative results show improved goal-reaching accuracy and motion realism, and the authors release code to support further research on goal-conditioned motion synthesis.
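A minimal sketch of the pseudo-goal idea, assuming a goal-free sequence is given as a (T, 3) array of wrist positions: the goal for frame t is the wrist position at a uniformly sampled later frame. The function name `sample_pseudo_goal`, the `max_horizon` parameter, and the uniform sampling are illustrative assumptions, not WANDR's released code.

```python
import numpy as np

def sample_pseudo_goal(wrist_traj, t, max_horizon, rng):
    """Hypothetical helper: return (goal_xyz, goal_frame) for frame t of a goal-free sequence.

    wrist_traj: (T, 3) wrist positions; max_horizon: how far ahead a goal may lie.
    """
    last = min(t + max_horizon, len(wrist_traj) - 1)
    if last <= t:
        raise ValueError("no future frame available after t")
    # Uniformly pick a future frame in [t + 1, last]; Generator.integers excludes `high`.
    goal_frame = int(rng.integers(t + 1, last + 1))
    return wrist_traj[goal_frame].copy(), goal_frame

# Toy usage on a synthetic wrist path that drifts diagonally.
rng = np.random.default_rng(0)
wrist = np.cumsum(np.full((60, 3), 0.01), axis=0)
goal, goal_frame = sample_pseudo_goal(wrist, t=10, max_horizon=30, rng=rng)
```

Because the sampled goal is a place the wrist actually reaches later in the recording, the motion leading up to that frame is a valid example of goal-directed movement even though the sequence was never annotated with a goal.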

Abstract

Synthesizing natural human motions that enable a 3D human avatar to walk and reach for arbitrary goals in 3D space remains an unsolved problem with many applications. Existing methods (data-driven or using reinforcement learning) are limited in terms of generalization and motion naturalness. A primary obstacle is the scarcity of training data that combines locomotion with goal reaching. To address this, we introduce WANDR, a data-driven model that takes an avatar's initial pose and a goal's 3D position and generates natural human motions that place the end effector (wrist) on the goal location. To achieve this, we introduce novel intention features that drive rich goal-oriented movement. Intention guides the agent to the goal, and interactively adapts the generation to novel situations without needing to define sub-goals or the entire motion path. Crucially, intention allows training on datasets that have goal-oriented motions as well as those that do not. WANDR is a conditional Variational Auto-Encoder (c-VAE), which we train using the AMASS and CIRCLE datasets. We evaluate our method extensively and demonstrate its ability to generate natural and long-term motions that reach 3D goals and generalize to unseen goal locations. Our models and code are available for research purposes at wandr.is.tue.mpg.de.
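The abstract does not define the intention features beyond saying they steer the avatar toward the goal; Fig. 1 describes them as acting on orientation, position, and wrist. The sketch below is one plausible, hypothetical formulation, assuming z is the up axis and mapping $I^p$ to the ground-plane pelvis-to-goal offset, $I^r$ to the signed yaw angle from the body's facing direction to the goal, and $I^w$ to the 3D wrist-to-goal offset. The paper's exact definitions (scaling, clipping, angle encoding) may differ.

```python
import numpy as np

def intention_features(pelvis, forward, wrist, goal):
    """Hypothetical intention features for one frame (z is assumed to be the up axis)."""
    # Position intention: ground-plane offset from the pelvis to the goal.
    i_p = (goal - pelvis)[:2]
    # Orientation intention: signed yaw from the facing direction to the goal
    # direction, in radians (positive = counter-clockwise seen from above).
    f = forward[:2] / np.linalg.norm(forward[:2])
    g = i_p / max(np.linalg.norm(i_p), 1e-8)
    i_r = np.arctan2(f[0] * g[1] - f[1] * g[0], f @ g)
    # Wrist intention: full 3D offset from the wrist to the goal.
    i_w = goal - wrist
    return i_p, i_r, i_w

i_p, i_r, i_w = intention_features(
    pelvis=np.array([0.0, 0.0, 0.9]),
    forward=np.array([1.0, 0.0, 0.0]),   # avatar faces +x
    wrist=np.array([0.3, -0.2, 1.1]),
    goal=np.array([0.0, 2.0, 1.2]),      # goal lies to the avatar's left (+y)
)
print(np.degrees(i_r))  # 90.0
```

Since these quantities are recomputed from the current body state at every frame, they shrink as the avatar approaches the goal, which is one way an intention signal can adapt online without predefined sub-goals.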

Paper Structure (29 sections, 9 equations, 9 figures, 2 tables)

Introduction
Related Work
Reinforcement Learning for Motion Synthesis
Data-driven Approaches for Motion Synthesis
Method
Two Complementing Datasets
Intention Features
Motion Network (WANDR)
Training Losses
Motion Generation
Experiments
Datasets & Processing
Evaluation Strategy
Evaluation Metrics
Results
...and 14 more sections

Figures (9)

Figure 1: WANDR starts from an arbitrary body pose and generates precise and realistic human motions that reach a specified 3D goal (depicted as a red sphere). Following a purely data-driven approach, WANDR is a conditional Variational Autoencoder guided by intention features (depicted as arrows) that steer the human's orientation (yellow), position (cyan), and wrist (pink) towards the goal. WANDR can reach a wide range of goals, even ones that deviate significantly from the training data.
Figure 2: WANDR architecture. During training, our model conditions on the intention vectors $I^p$, $I^r$, and $I^w$, learning to associate them with actions that result in reaching goals realistically. When the training data has no defined goal, we create one from the wrist location in future frames; see the Intention Features section. The avatar state $p^{dyn}_i$ comprises the SMPL-X local pose parameters $p_i$ and the deltas $d_{i-1}$ of the body parameters at frame $i-1$. During inference, WANDR takes the intention features, the state, and random noise, and returns the change in pose $\hat{d}_i$. The next pose $\hat{p}_i$ is obtained by integrating $\hat{d}_i$ onto the previous pose $\hat{p}_{i-1}$.
Figure 3: During training, when goals are not specified, each goal is set to the wrist location at a randomly selected future timestep. This compensates for the lack of paired ground-truth goal data in AMASS, and the intention vectors computed from these goals direct the human motion. During inference, the target locations are used as goals, and the intention vectors are computed from these locations.
Figure 4: Diverse motions generated by WANDR from various initial poses towards arbitrary goals. Examples include navigating towards goals from initial orientations that do not face the goal (a, b, c, d), raising the right hand to reach higher targets (c), and bending down to reach goals near the floor (d), showcasing the model's ability to adapt to novel goal locations.
Figure 5: Success rates for reaching goals at various heights, angles, and distances from the initial human pose, highlighting how the goal position affects the model's ability to navigate to and reach the goal.
...and 4 more figures
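The inference loop described in the Fig. 2 caption (predict a pose change from the state, the intention features, and random noise, then integrate it onto the previous pose) can be sketched as below. `decoder` and `intention_fn` are hypothetical stand-ins for the trained c-VAE decoder and the intention computation; poses are treated as flat parameter vectors, the latent prior is assumed to be a standard normal, and integration is assumed to be plain addition (rotation parameters may need composition instead).

```python
import numpy as np

def rollout(decoder, intention_fn, p0, n_frames, latent_dim, rng):
    """Generate n_frames poses autoregressively from the initial pose p0."""
    p = p0.astype(float)                 # current pose parameters (a copy of p0)
    d_prev = np.zeros_like(p)            # no motion before the first frame
    poses = [p.copy()]
    for _ in range(n_frames):
        state = np.concatenate([p, d_prev])   # avatar state: pose + previous delta
        intention = intention_fn(p)           # recomputed from the current pose
        z = rng.standard_normal(latent_dim)   # sample from the assumed N(0, I) prior
        d = decoder(state, intention, z)      # predicted change in pose
        p = p + d                             # integrate the delta (additive assumption)
        d_prev = d
        poses.append(p.copy())
    return np.stack(poses)

# Toy usage: the "pose" is just a 3D root position, and the stand-in decoder
# moves 10% of the way toward the goal plus a little latent-driven noise.
goal = np.array([2.0, 1.0, 0.0])

def toy_decoder(state, intention, z):
    return 0.1 * intention + 0.001 * z

traj = rollout(toy_decoder, lambda p: goal - p, np.zeros(3),
               n_frames=50, latent_dim=3, rng=np.random.default_rng(0))
print(traj.shape)  # (51, 3)
```

Recomputing the intention from the newly integrated pose at every step lets the generation react to the avatar's actual progress instead of following a precomputed path.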
