Expert Composer Policy: Scalable Skill Repertoire for Quadruped Robots

Guilherme Christmann, Ying-Sheng Luo, Wei-Chao Chen

TL;DR

The expert composer policy is a framework that reliably expands the skill repertoire of quadruped agents by linking pairs of experts via transitions to a sampled target state, so that experts can be composed sequentially.

Abstract

We propose the expert composer policy, a framework to reliably expand the skill repertoire of quadruped agents. The composer policy links pairs of experts via transitions to a sampled target state, allowing experts to be composed sequentially. Each expert specializes in a single skill, such as a locomotion gait or a jumping motion. Instead of a hierarchical or mixture-of-experts architecture, we train a single composer policy in an independent process that is not conditioned on the other expert policies. Because the same composer policy is reused, new experts can be added without affecting existing ones, which allows incremental repertoire expansion and preserves the original motion quality. We measured the transition success rate of 72 transition pairs and achieved an average success rate of 99.99%, which is over 10% higher than the baseline random approach and outperforms other state-of-the-art methods. Using domain randomization during training, we ensure a successful transfer to the real world, where we achieve an average transition success rate of 97.22% (N=360) in our experiments.
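The sequential composition described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the `Expert` class, `composer_step`, `transition`, the 2-D state vector, and the proportional update standing in for the learned policy are all hypothetical, and sampling the target state from the destination expert is an assumption made here for illustration. The sketch only shows the structural idea that the composer sees a current state and a target state, never the identity of either expert, so experts can be added to the library without retraining it.

```python
import random
from dataclasses import dataclass
from itertools import permutations
from typing import Callable, List, Tuple

State = List[float]  # hypothetical: a flat vector standing in for the robot state


@dataclass
class Expert:
    name: str
    # Hypothetical: samples a state from which this expert can take over.
    sample_target: Callable[[random.Random], State]
    # Hypothetical toy dynamics standing in for one control step of the expert.
    step: Callable[[State], State]


def distance(a: State, b: State) -> float:
    """Largest per-dimension gap between two states."""
    return max(abs(x - y) for x, y in zip(a, b))


def composer_step(state: State, target: State, gain: float = 0.5) -> State:
    """Toy stand-in for the learned composer policy.

    It is conditioned only on (state, target), not on any expert.
    """
    return [s + gain * (t - s) for s, t in zip(state, target)]


def transition(state: State, target: State,
               tol: float = 1e-3, max_steps: int = 100) -> Tuple[State, bool]:
    """Drive the state toward the target; report whether it got within tol."""
    for _ in range(max_steps):
        if distance(state, target) < tol:
            break
        state = composer_step(state, target)
    return state, distance(state, target) < tol


def run_sequence(state: State, experts: List[Expert], rng: random.Random,
                 expert_steps: int = 5) -> Tuple[State, List[bool]]:
    """Compose experts in order, bridging each hand-over with the composer."""
    successes = []
    for expert in experts:
        target = expert.sample_target(rng)      # assumed: target drawn from next expert
        state, ok = transition(state, target)   # composer bridges the gap
        successes.append(ok)
        for _ in range(expert_steps):           # then the expert runs unchanged
            state = expert.step(state)
    return state, successes


def make_expert(name: str, center: float) -> Expert:
    """Build a placeholder expert whose valid states cluster around `center`."""
    return Expert(
        name=name,
        sample_target=lambda rng: [center + rng.uniform(-0.1, 0.1) for _ in range(2)],
        step=lambda s: list(s),
    )


# Nine placeholder experts: 9 * 8 = 72 ordered (source, destination) pairs,
# which is consistent with the 72 transition pairs reported in the abstract.
library = [make_expert(f"expert_{i}", float(i)) for i in range(9)]
pairs = list(permutations(library, 2))
```

Adding a tenth expert in this sketch means appending to `library`; `composer_step` is untouched, which mirrors the claim that the repertoire can grow without affecting existing experts.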

Paper Structure (14 sections, 3 equations, 7 figures, 2 tables)

Figures (7)

  • Figure 1: Expert composer policy generates novel transitions between arbitrary agent states, enabling skill repertoire expansion while preserving the motion quality of the original experts.
  • Figure 2: Illustration of physical-property bounds on the agent's states.
  • Figure 3: A visualization of the composer policy with the real-world robot. Please watch the supplementary video for more extensive demonstrations.
  • Figure 4: Illustration of how the composer policy $\mathbf{P}$ enables the sequencing of any expert policy from the library over time.
  • Figure 5: Success rates of the composition strategies for 72 transition pairs. (F) and (B) indicate forward and backward for Trot and Pace; (F) and (S) indicate fast and slow for Side Step. Experts highlighted in blue were part of the training set (N=4), and purple indicates new experts added to the library after training (N=5). The (R) and (O) variants of our Composer Policy outperform the baselines and existing approaches in simulation by a significant margin. The policy is also successful in the real world, with just 10 failures out of 360 trials.
  • ...and 2 more figures