Integrating Controllable Motion Skills from Demonstrations

Honghao Liao, Zhiheng Li, Ziyu Meng, Ran Song, Yibin Li, Wei Zhang

TL;DR

CSI addresses the problem of flexibly integrating multiple motion skills into a single legged-robot controller without extensive reward engineering, and adds language-directed control through a high-level NLI module. It builds on a GAILfO-based learning framework, augmented with Conditional Imitation Learning, latent conditioning derived from skill labels, and a Condition-Aware Discriminator that keeps each skill controllable as the number of skills grows. The result is a scalable, demonstration-driven pipeline that supports smooth skill transitions and zero-shot language guidance, demonstrated on both humanoid and quadruped platforms. The approach reduces design complexity, expands the skill repertoire available to legged robots, and provides preliminary language-guided skill control.

Abstract

The expanding applications of legged robots require them to master versatile motion skills. Correspondingly, researchers must address the challenge of integrating multiple diverse motion skills into controllers. While existing reinforcement learning (RL)-based approaches have achieved notable success in multi-skill integration for legged robots, these methods often require intricate reward engineering or are restricted to integrating a predefined set of motion skills constrained by specific task objectives, resulting in limited flexibility. In this work, we introduce a flexible multi-skill integration framework named Controllable Skills Integration (CSI). CSI enables the integration of a diverse set of motion skills with varying styles into a single policy without the need for complex reward tuning. Furthermore, in a hierarchical control scheme, the trained low-level policy can be coupled with a high-level Natural Language Inference (NLI) module to enable preliminary language-directed skill control. Our experiments demonstrate that CSI integrates a diverse array of motion skills more comprehensively than the baseline methods and facilitates transitions between different skills. Additionally, CSI scales well as the number of motion skills to be integrated increases significantly.

Paper Structure (21 sections, 14 equations, 7 figures, 5 tables)


Figures (7)

  • Figure 1: Our method enables legged robots to flexibly integrate a range of different motion skills into a single controller, and can be further combined with a high-level NLI module to enable preliminary language-directed skill control. The language commands used here are (1) "Act as if you're a scary character", (2) "Return to normal walking style", (3) "Show me your jumping skills".
  • Figure 2: Overview diagram of CSI. Retargeting and skill labeling produce a set of reference motion clips with corresponding labels. During training, sampled motion skill labels $c_i$ are mapped to latent vectors $z$ through an encoder network, and the policy generates the corresponding motion skills conditioned on $z$. The discriminator indirectly regulates the motions generated by the policy by providing style rewards. After training, a single controller integrating multiple motion skills is available. These integrated skills can be controlled directly through user commands or externally via a high-level pre-trained NLI module for language-directed skill control.
  • Figure 3: Frequency distribution obtained by matching the controller-generated motion skills with the reference dataset under different tasks. Compared to the baselines (b) Baseline-I, (c) CAMP and (d) CALM, (a) CSI exhibits a more even distribution of motion skill coverage across different tasks.
  • Figure 4: Probability distributions of different motion skill transitions, where each row represents the probability of transferring from a source motion to each destination motion. Compared to the baselines (b) Baseline-I, (c) CAMP and (d) CALM, (a) CSI captures a more balanced distribution of motion skill transitions.
  • Figure 5: Snapshots of the motion sequences generated by CSI, where the sub-skills (first row) and corresponding text commands (second row) are also provided. Combined with a high-level NLI module, CSI enables language-directed skill control and transition.
  • ...and 2 more figures
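
The training-time data flow described in the Figure 2 caption (skill label $c_i$ to latent $z$, policy conditioned on $z$, discriminator supplying a style reward) can be illustrated with a minimal Python sketch. This is not the authors' implementation. Every name and dimension below is hypothetical, the lookup-table encoder and the toy discriminator are stand-ins for trained networks, the unit-norm latent is an assumption, and the reward form $-\log(1 - D)$ is one common GAIL-style choice assumed here rather than taken from the paper.

```python
import math
import random

NUM_SKILLS = 3   # hypothetical number of labelled skills
LATENT_DIM = 4   # hypothetical latent size
OBS_DIM = 5      # hypothetical observation size


def make_encoder(num_skills, latent_dim, seed=0):
    """Stand-in for the encoder network: one fixed random vector per label."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
            for _ in range(num_skills)]


def encode(table, skill_label):
    """Map a discrete skill label c_i to a latent z (unit norm is an assumption)."""
    raw = table[skill_label]
    norm = math.sqrt(sum(v * v for v in raw))
    return [v / norm for v in raw]


def policy_input(obs, z):
    """Condition the policy on z by concatenating it to the observation."""
    return list(obs) + list(z)


def toy_discriminator(transition, z):
    """Placeholder condition-aware discriminator: scores (s, s') together with z.

    A real discriminator would be a trained network; this only returns a logit.
    """
    s, s_next = transition
    features = list(s) + list(s_next) + list(z)
    return sum(features) / len(features)


def style_reward(disc_logit):
    """Assumed GAIL-style reward -log(1 - D), with D = sigmoid(logit)."""
    d = 1.0 / (1.0 + math.exp(-disc_logit))
    return -math.log(max(1.0 - d, 1e-8))


table = make_encoder(NUM_SKILLS, LATENT_DIM)
c_i = random.Random(1).randrange(NUM_SKILLS)  # sampled skill label
z = encode(table, c_i)
obs = [0.0] * OBS_DIM
next_obs = [0.1] * OBS_DIM
x = policy_input(obs, z)                      # what the policy would consume
r = style_reward(toy_discriminator((obs, next_obs), z))
```

Because the discriminator receives $z$ alongside the state transition, the style reward depends on which skill was requested, which is the property a condition-aware discriminator is meant to provide. Note that only observations (state pairs) are scored, not actions, in keeping with the "from observation" setting.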