AMP: Adversarial Motion Priors for Stylized Physics-Based Character Control

Xue Bin Peng, Ze Ma, Pieter Abbeel, Sergey Levine, Angjoo Kanazawa

TL;DR

The paper introduces Adversarial Motion Priors (AMP), a framework that learns a data-driven motion style from large, unstructured motion clips using adversarial imitation learning. By combining a goal-conditioned reinforcement learning objective with a discriminative motion prior, AMP enables physically simulated characters to perform complex tasks while adopting naturalistic, diverse behaviors without explicit clip selection or task-specific annotations. AMP scales to large datasets and automatically composes multiple skills, achieving high-fidelity motions comparable to tracking-based approaches while reducing manual reward engineering. Through ablation studies, the authors demonstrate the importance of a gradient penalty and velocity features for stable training and motion realism, and they validate the approach across humanoid and non-humanoid characters on varied tasks.

Abstract

Synthesizing graceful and life-like behaviors for physically simulated characters has been a fundamental challenge in computer animation. Data-driven methods that leverage motion tracking are a prominent class of techniques for producing high fidelity motions for a wide range of behaviors. However, the effectiveness of these tracking-based methods often hinges on carefully designed objective functions, and when applied to large and diverse motion datasets, these methods require significant additional machinery to select the appropriate motion for the character to track in a given scenario. In this work, we propose to obviate the need to manually design imitation objectives and mechanisms for motion selection by utilizing a fully automated approach based on adversarial imitation learning. High-level task objectives that the character should perform can be specified by relatively simple reward functions, while the low-level style of the character's behaviors can be specified by a dataset of unstructured motion clips, without any explicit clip selection or sequencing. These motion clips are used to train an adversarial motion prior, which specifies style-rewards for training the character through reinforcement learning (RL). The adversarial RL procedure automatically selects which motion to perform, dynamically interpolating and generalizing from the dataset. Our system produces high-quality motions that are comparable to those achieved by state-of-the-art tracking-based techniques, while also being able to easily accommodate large datasets of unstructured motion clips. Composition of disparate skills emerges automatically from the motion prior, without requiring a high-level motion planner or other task-specific annotations of the motion clips. We demonstrate the effectiveness of our framework on a diverse cast of complex simulated characters and a challenging suite of motor control tasks.

Paper Structure

This paper contains 40 sections, 19 equations, 9 figures, 6 tables, 1 algorithm.

Figures (9)

  • Figure 1: Schematic overview of the system. Given a motion dataset defining a desired motion style for the character, the system trains a motion prior that specifies style-rewards $r_t^S$ for the policy during training. These style-rewards are combined with task-rewards $r_t^G$ and used to train a policy that enables a simulated character to satisfy task-specific goals ${\mathbf{g}}$, while also adopting behaviors that resemble the reference motions in the dataset.
  • Figure 2: The motion prior can be trained with large datasets of diverse motions, enabling simulated characters to perform complex tasks by composing a wider range of skills. Each environment is denoted by "Character: Task (Dataset)".
  • Figure 3: Performance of Target Heading policies trained with different datasets. Left: Learning curves comparing the normalized task returns of policies trained with a large dataset of diverse locomotion clips to policies trained with only walking or running reference motions. Three models are trained using each dataset. Right: Comparison of the target speed with the average speed achieved by the different policies. Policies trained using the larger Locomotion dataset are able to follow the various target speeds more closely by imitating different gaits.
  • Figure 4: Learning curves comparing the task performance of AMP to latent space models (Latent Space) and policies trained from scratch without motion data (No Data). Our method achieves comparable performance across the various tasks, while also producing higher fidelity motions.
  • Figure 5: Snapshots of behaviors learned by the Humanoid on the single-clip imitation tasks. Top-to-bottom: back-flip, side-flip, cartwheel, spin, spin-kick, roll. AMP enables the character to closely imitate a diverse corpus of highly dynamic and acrobatic skills.
  • ...and 4 more figures
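
The reward combination described in the Figure 1 caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a least-squares discriminator trained toward +1 on dataset transitions and -1 on policy transitions, and the function names and the equal 0.5 weights are hypothetical choices made for this example.

```python
def style_reward(disc_score: float) -> float:
    """Map a discriminator score to a bounded style-reward r_t^S in [0, 1].

    Assumption: the discriminator is trained toward +1 on dataset
    transitions and -1 on policy transitions, so a score near +1 means
    the transition looks like the reference motions.
    """
    return max(0.0, 1.0 - 0.25 * (disc_score - 1.0) ** 2)


def combined_reward(
    task_reward: float,
    disc_score: float,
    w_task: float = 0.5,   # illustrative weight on r_t^G (assumption)
    w_style: float = 0.5,  # illustrative weight on r_t^S (assumption)
) -> float:
    """Weighted sum of the task-reward r_t^G and the style-reward r_t^S."""
    return w_task * task_reward + w_style * style_reward(disc_score)


if __name__ == "__main__":
    # A transition the discriminator scores as dataset-like (+1) earns the
    # full style-reward; one scored as policy-like (-1) earns none.
    for score in (1.0, 0.0, -1.0):
        print(score, style_reward(score), combined_reward(1.0, score))
```

Clipping at zero keeps the style-reward bounded, so a transition the discriminator rejects strongly contributes no style-reward instead of a large negative one, and the task-reward alone drives the update for that step.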