Contact-conditioned learning of multi-gait locomotion policies

Michal Ciebielski; Federico Burgio; Majid Khadiv

TL;DR

This work tackles multi-gait locomotion for legged robots by reframing policy learning as imitation from a contact-aware MPC expert. It introduces contact-switch based goal representations and shows, through extensive simulation on a biped (Bolt) and a quadruped (Go2), that conditioning on future contact switches yields superior performance and notably better out-of-distribution generalization compared with velocity- or gait-conditioned baselines. The method relies on behavior cloning from a contact-explicit MPC and evaluates several goal-encoding schemes, finding that contact-conditioned policies consistently outperform alternatives, especially when asked to follow unseen velocities or gaits. The results motivate a generalist, contact-driven learning framework that can pair with any contact planner and handle cyclic and acyclic motions, with potential extensions to manipulation and real-world deployment.

Abstract

In this paper, we examine the effects of goal representation on the performance and generalization in multi-gait policy learning settings for legged robots. To study this problem in isolation, we cast the policy learning problem as imitating model predictive controllers that can generate multiple gaits. We hypothesize that conditioning a learned policy on future contact switches is a suitable goal representation for learning a single policy that can generate a variety of gaits. Our rationale is that policies conditioned on contact information can leverage the shared structure between different gaits. Our extensive simulation results demonstrate the validity of our hypothesis for learning multiple gaits on a bipedal and a quadrupedal robot. Most interestingly, our results show that contact-conditioned policies generalize much better than other common goal representations in the literature, when the robot is tested outside the distribution of the training data.
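To make the idea of conditioning on future contact switches concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical planned contact schedule given as a binary table (time steps by feet) and extracts, for each foot, the time until each of its next contact changes and the resulting contact state. The function name `next_contact_switches`, the schedule layout, and the choice of two switches per foot are illustrative assumptions; the encoding used in the paper may include further quantities.

```python
from typing import List, Tuple


def next_contact_switches(
    schedule: List[List[int]], dt: float, n_switches: int = 2
) -> List[List[Tuple[float, int]]]:
    """Return, per foot, up to `n_switches` upcoming contact changes.

    schedule[k][f] is 1 if foot f is planned to be in contact at step k,
    else 0; step 0 is the current time (assumed layout, not from the paper).
    Each entry is (time_until_switch_in_seconds, new_contact_state).
    """
    n_feet = len(schedule[0])
    goals = []
    for f in range(n_feet):
        switches = []
        for k in range(1, len(schedule)):
            # A switch happens when the planned contact state changes.
            if schedule[k][f] != schedule[k - 1][f]:
                switches.append((k * dt, schedule[k][f]))
                if len(switches) == n_switches:
                    break
        goals.append(switches)
    return goals


# Hypothetical biped walking schedule: columns are (foot 0, foot 1).
schedule = [
    [1, 1],
    [1, 0],
    [1, 0],
    [1, 1],
    [0, 1],
    [0, 1],
    [1, 1],
]
goals = next_contact_switches(schedule, dt=0.125)
print(goals)
```

In this example foot 0 lifts off after 0.5 s and touches down after 0.75 s, while foot 1 lifts off after 0.125 s and touches down after 0.375 s. A goal of this form does not name a gait, so walking, trotting, and jumping schedules all map into the same input space, which is one way to read the abstract's remark about shared structure between gaits.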

Paper Structure (14 sections, 7 equations, 6 figures, 2 tables, 1 algorithm)

Introduction
Preliminaries
Contact Explicit MPC
Generating Diverse Initial Conditions
Behavior Cloning from MPC
Method
Goal Representation
Implementation
Expert controllers
Data Collection
Policy Parametrization
Results
Evaluation Metrics
Conclusion

Figures (6)

Figure 1: Example contact sequence captured on the biped Bolt. The left panel represents the current state of the robot and the following three panels depict two contact switches.
Figure 2: Policy structure for the velocity-conditioned policy and the contact-conditioned policy.
Figure 3: Biped in-distribution evaluations. Top: scaled error metric; bottom: time survived in the evaluation rollout (max 10 seconds). The contact-conditioned (Con.) policy has substantially smaller error and longer survival time than the other representations.
Figure 4: Biped out-of-distribution evaluations. The robot was commanded to walk at 0.5 m/s in a dense grid of planar directions, while it was trained only on forward and backward motions. The rays represent the velocity aligned with the lateral plane of the robot. The green color denotes the nominal commanded lateral velocity. The contact-conditioned policy clearly outperforms the other representations in tracking the nominal behavior in all directions.
Figure 5: Quadruped velocity tracking evaluations. Policies were tested on a dense grid of x and y velocities for jump (top) and trot (bottom). The red box marks the velocity range present in the training data. Top (blue-green): scaled error metric. Bottom (yellow-purple): time survived in the evaluation rollout (max 10 seconds). While all policies perform similarly inside the range of velocities they were trained on, the contact-conditioned policy outperforms the others for out-of-distribution velocities.
...and 1 more figure
