Contact-conditioned learning of multi-gait locomotion policies
Michal Ciebielski, Federico Burgio, Majid Khadiv
TL;DR
This work studies multi-gait locomotion for legged robots by framing policy learning as imitation of a contact-explicit MPC expert. It introduces goal representations based on future contact switches and compares them, via behavior cloning, with velocity- and gait-conditioned baselines in extensive simulation on a biped (Bolt) and a quadruped (Go2). Contact-conditioned policies consistently outperform the alternatives and generalize notably better out of distribution, especially when asked to follow unseen velocities or gaits. The results motivate a generalist, contact-driven learning framework that can pair with any contact planner and handle cyclic and acyclic motions, with potential extensions to manipulation and real-world deployment.
Abstract
In this paper, we examine the effect of goal representation on performance and generalization in multi-gait policy learning for legged robots. To study this problem in isolation, we cast policy learning as imitation of model predictive controllers that can generate multiple gaits. We hypothesize that future contact switches are a suitable goal representation for learning a single policy that can generate a variety of gaits. Our rationale is that policies conditioned on contact information can leverage the structure shared between different gaits. Our extensive simulation results support this hypothesis for learning multiple gaits on a bipedal and a quadrupedal robot. Most interestingly, contact-conditioned policies generalize much better than policies conditioned on other goal representations common in the literature when the robot is tested outside the distribution of the training data.
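To make the idea of conditioning on future contact switches concrete, the following is a minimal sketch of one possible goal encoding. It is an illustration under stated assumptions, not the encoding used in the paper: the function names `contact_switch_goal` and `alternating_schedule`, the boolean schedule layout, the time step `dt`, and the choice of features (time to the next switch and the contact state after it) are all hypothetical.

```python
import numpy as np


def contact_switch_goal(schedule, t, dt=0.01):
    """Encode the next contact switch of each foot (hypothetical encoding).

    schedule: bool array of shape (T, n_feet), True = foot in contact.
    t: current time index into the schedule.
    Returns an array of shape (n_feet, 2) holding, per foot,
    [time to next switch in seconds, contact state after the switch].
    If a foot never switches again, the time is the remaining schedule
    duration and the state is the current one.
    """
    schedule = np.asarray(schedule, dtype=bool)
    n_steps, n_feet = schedule.shape
    goal = np.zeros((n_feet, 2))
    for foot in range(n_feet):
        current = schedule[t, foot]
        future = schedule[t + 1:, foot]
        changed = np.flatnonzero(future != current)
        if changed.size:
            steps = changed[0] + 1      # steps until the state flips
            next_state = not current
        else:
            steps = n_steps - 1 - t     # no further switch in the plan
            next_state = current
        goal[foot] = (steps * dt, float(next_state))
    return goal


def alternating_schedule(n_steps, phase_len):
    """Toy biped plan (assumption): the two feet alternate stance phases."""
    phase = (np.arange(n_steps) // phase_len) % 2
    return np.stack([phase == 0, phase == 1], axis=1)


schedule = alternating_schedule(n_steps=20, phase_len=5)
goal = contact_switch_goal(schedule, t=3)
# The flattened goal would be concatenated with the robot state
# and fed to the policy as its conditioning input.
policy_goal = goal.ravel()
```

In this sketch the goal is written purely in terms of contact events, so walking at different speeds or with different gaits only changes the numbers in the vector and not its layout. This is one way a single policy could exploit structure shared across gaits, and the schedule could come from any contact planner.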
