Continuous Control of Diverse Skills in Quadruped Robots Without Complete Expert Datasets

Jiaxin Tu, Xiaoyi Wei, Yueqi Zhang, Taixian Hou, Xiaofei Gao, Zhiyan Dong, Peng Zhai, Lihua Zhang

TL;DR

The paper tackles the challenge of learning diverse quadruped skills and smooth transitions without relying on complete expert datasets. It introduces PASIST, a framework that uses introspective learning with Generative Adversarial Self-Imitation Learning to autonomously discover high-quality trajectories guided by target poses and a DTW-based trajectory quality metric. A skill selector mitigates mode collapse and balances learning across skills, enabling smooth transitions between behaviors. Experiments in simulation and on a real Solo 8 robot demonstrate effective multi-skill acquisition and zero-shot sim-to-real transfer, offering an efficient alternative to expert-driven imitation learning.
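To make the DTW-based trajectory quality metric concrete, here is a minimal sketch of classic dynamic time warping (DTW) used to rank candidate trajectories against a reference pose sequence. This summary does not say how PASIST builds its reference from the target poses or how it thresholds the score, so the function names `dtw_distance` and `select_trajectories`, the Euclidean per-step cost, and the top-`k` selection rule are all assumptions made for illustration, not the authors' implementation.

```python
import math


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW between two sequences of pose vectors.

    Assumption: per-step cost is the Euclidean distance between poses.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats its last pose
                cost[i][j - 1],      # b advances, a repeats its last pose
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def select_trajectories(trajectories, reference, k):
    """Hypothetical selector: keep the k trajectories closest to the reference."""
    return sorted(trajectories, key=lambda t: dtw_distance(t, reference))[:k]
```

Because DTW lets one pose align with several consecutive poses in the other sequence, a trajectory that reaches the same poses at a different speed still scores well, which is the usual reason to prefer it over a step-by-step distance.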

Abstract

Learning diverse skills for quadruped robots presents significant challenges, such as mastering complex transitions between different skills and handling tasks of varying difficulty. Existing imitation learning methods, while successful, rely on expensive datasets to reproduce expert behaviors. Inspired by introspective learning, we propose Progressive Adversarial Self-Imitation Skill Transition (PASIST), a novel method that eliminates the need for complete expert datasets. PASIST autonomously explores and selects high-quality trajectories based on predefined target poses instead of demonstrations, leveraging the Generative Adversarial Self-Imitation Learning (GASIL) framework. To further enhance learning, we develop a skill selection module to mitigate mode collapse by balancing the weights of skills with varying levels of difficulty. Through these methods, PASIST is able to reproduce skills corresponding to the target pose while achieving smooth and natural transitions between them. Evaluations on both simulation platforms and the Solo 8 robot confirm the effectiveness of PASIST, offering an efficient alternative to expert-driven learning.
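The abstract describes the skill selection module only as balancing the weights of skills of varying difficulty. One plausible way to picture that idea is to sample harder skills more often during training. The sketch below assumes a per-skill score in which higher means better mastered and turns it into sampling probabilities with a softmax over the negated scores. The function name `skill_sampling_weights`, the `temperature` parameter, and the example scores are hypothetical and are not taken from the paper.

```python
import math
import random


def skill_sampling_weights(skill_scores, temperature=1.0):
    """Map per-skill scores to sampling probabilities.

    Assumption: a lower score means the skill is less mastered, so it
    receives a larger weight (softmax over negated scores).
    """
    logits = [-score / temperature for score in skill_scores.values()]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return {name: e / total for name, e in zip(skill_scores, exps)}


# Made-up scores for illustration only; they are not results from the paper.
scores = {"walk": 0.9, "crawl": 0.7, "stilt": 0.4, "bipedalize": 0.2}
weights = skill_sampling_weights(scores, temperature=0.5)

# Draw the skill to train on in the next episode.
names = list(weights)
next_skill = random.choices(names, weights=[weights[n] for n in names], k=1)[0]
```

With this kind of weighting, an easy skill that already scores well is sampled less often, which is one way to keep it from dominating training. The actual selector in PASIST may use a different signal or update rule.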

Paper Structure

This paper contains 18 sections, 9 equations, 7 figures, 2 tables, 1 algorithm.

Figures (7)

  • Figure 1: System overview. PASIST selects high-quality trajectories from policy exploration using the Trajectory Selector and performs imitation learning with the SIL discriminator. The SIL reward, combined with task and regularization rewards, is used to train the policy. A skill selector prevents overtraining on easier sub-policies, facilitating natural skill transitions without the need for expert datasets.
  • Figure 2: Sequences of Solo 8 robot navigating real-world obstacles by flexibly switching between learned skills.
  • Figure 3: t-SNE results of the state space (left) and action space (right) extracted from 500 steps of state and action data.
  • Figure 4: Comparison of the four skills learned through PASIST with their respective target poses. Subfigures a), b), c), and d) correspond to walk, crawl, stilt, and bipedalize, respectively.
  • Figure 5: The variation curves for the total reward $r$ during training.
  • ...and 2 more figures