Safe Learning of Locomotion Skills from MPC

Xun Pua; Majid Khadiv

TL;DR

This work tackles safe, data-efficient learning of quadruped locomotion by marrying MPC as an expert with iterative imitation learning. It introduces LocoDAGGER and, more importantly, LocoSafeDAGGER, which injects safety checks to minimize training-time failures without increasing expert queries. Through extensive simulation, the approach achieves rollout safety comparable to or better than SafeDAGGER while delivering robustness and velocity tracking on par with or exceeding behavioral cloning, and superior push-recovery performance. The findings highlight the practical benefit of safety-aware, MPC-guided learning for real-world legged robots and point to future extensions toward broader terrains and real hardware deployment.

Abstract

Safe learning of locomotion skills is still an open problem. The intrinsically unstable open-loop dynamics of locomotion systems make naive learning from scratch prone to catastrophic failures in the real world. In this work, we investigate the use of iterative algorithms to safely learn locomotion skills from model predictive control (MPC). In our framework, we use MPC as an expert and take inspiration from the safe data aggregation (SafeDAGGER) framework to minimize the number of failures during policy training. Through a comparison with standard approaches such as behavior cloning and vanilla DAGGER, we show that our approach not only incurs substantially fewer failures during training, but also yields a policy that is more robust to external disturbances.
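
The idea of gating data collection with a safety check can be illustrated on a toy problem. The sketch below is not the paper's LocoSafeDAGGER implementation: it assumes a scalar system with unstable open-loop dynamics, a linear feedback law standing in for the MPC expert, and a fixed state threshold standing in for the safety check (SafeDAGGER, as originally proposed, learns its safety check instead). All names and constants (`A_OPEN_LOOP`, `SAFE_BOUND`, `FAIL_BOUND`, `expert_action`, `run`) are hypothetical.

```python
import random

A_OPEN_LOOP = 1.2   # toy unstable open-loop dynamics: x' = 1.2 * x + u + noise
SAFE_BOUND = 1.0    # hypothetical safety threshold on |x|
FAIL_BOUND = 3.0    # beyond this the rollout counts as a failure


def expert_action(x):
    """Stand-in for the MPC expert: a stabilizing linear feedback."""
    return -0.8 * x


def fit_gain(dataset):
    """Least-squares fit of a linear policy u = gain * x to expert labels."""
    den = sum(x * x for x, _ in dataset)
    return sum(x * u for x, u in dataset) / den if den > 0 else 0.0


def collect_rollout(gain, rng, horizon=50):
    """Run the learner; the expert takes over (and labels) only unsafe states."""
    x = rng.choice([-1.0, 1.0]) * rng.uniform(0.3, 0.5)
    data, expert_steps, steps = [], 0, 0
    for _ in range(horizon):
        if abs(x) > FAIL_BOUND:
            return data, expert_steps, steps, True
        if abs(x) > SAFE_BOUND:          # safety check fails: query the expert
            u = expert_action(x)
            data.append((x, u))
            expert_steps += 1
        else:                            # safety check passes: learner acts alone
            u = gain * x
        x = A_OPEN_LOOP * x + u + rng.uniform(-0.05, 0.05)
        steps += 1
    return data, expert_steps, steps, False


def run(iterations=3, rollouts=5, seed=0):
    """Alternate gated data collection and retraining on the aggregated data."""
    rng = random.Random(seed)
    gain, dataset, history = 0.0, [], []
    for _ in range(iterations):
        n_expert = n_steps = n_fail = 0
        for _ in range(rollouts):
            data, e, s, failed = collect_rollout(gain, rng)
            dataset += data
            n_expert += e
            n_steps += s
            n_fail += failed
        gain = fit_gain(dataset)         # retrain on the aggregated dataset
        history.append({"gain": gain, "expert_rate": n_expert / n_steps,
                        "failures": n_fail})
    return history
```

Calling `run()` returns one record per iteration with the fitted gain, the fraction of steps on which the expert was queried, and the number of failed rollouts. In this toy setting the expert is queried only on states that trip the threshold, so its usage falls to zero once the fitted gain stabilizes the system; the numbers are illustrative only and say nothing about the paper's results.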

Paper Structure (20 sections, 6 figures, 1 table, 4 algorithms)

Introduction
Background
Behavioral cloning (BC)
Data Aggregation (DAGGER)
SafeDAGGER
Guided policy search (GPS)
Methodology
LocoDAGGER
LocoSafeDAGGER
Implementation
Policy Network
Robot Initial Condition Generation
The Expert
Iterative Algorithm Parameters
Evaluation Procedure
...and 5 more sections

Figures (6)

Figure 1: Rollout failure rates during data collection. LocoSafeDAGGER maintains a consistently low failure rate ($<5\%$), while LocoDAGGER shows higher failure rates ($>5\%$) in later iterations as expert influence decreases.
Figure 2: NMPC usage during data collection. LocoSafeDAGGER maintains a consistent expert query rate of around 10%, while LocoDAGGER's rate decreases monotonically from 100%. LocoSafeDAGGER achieves lower computational demands without compromising data collection safety.
Figure 3: Policy robustness. All approaches except LocoDAGGER with high expert influence achieve a similar evaluation rollout success rate of about 90%, demonstrating that iterative algorithms can match the policy robustness of BC.
Figure 4: Velocity tracking MSE. BC demonstrates the best overall velocity tracking performance, with LocoSafeDAGGER almost matching BC's tracking performance. LocoDAGGER with high expert interference leads to overly conservative data collection rollouts, resulting in lower tracking performance.
Figure 5: Average maximum impulse $[Ns]$ tolerated from all directions with $\geq80\%$ successful recovery rate. Iterative algorithms' data collection strategies produce datasets more representative of the trained policy's encountered distribution during rollout, enhancing policy robustness against external disturbances.
...and 1 more figure
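
The quantities named in the captions above can be computed from logged rollouts roughly as follows. This is a minimal sketch under assumed data layouts, not the paper's evaluation code: the function names and input formats are hypothetical, and the reading of "maximum impulse tolerated" as the largest tested impulse whose recovery rate meets the threshold is an assumption.

```python
def failure_rate(failed_flags):
    """Fraction of rollouts that ended in failure (one boolean per rollout)."""
    return sum(failed_flags) / len(failed_flags)


def expert_query_rate(expert_steps, total_steps):
    """Fraction of control steps on which the expert was queried."""
    return expert_steps / total_steps


def velocity_tracking_mse(commanded, measured):
    """Mean squared error between commanded and measured velocities."""
    errors = [(c - m) ** 2 for c, m in zip(commanded, measured)]
    return sum(errors) / len(errors)


def max_tolerated_impulse(recovery_by_impulse, threshold=0.8):
    """Largest tested impulse [Ns] whose recovery rate meets the threshold.

    `recovery_by_impulse` maps impulse magnitude to recovery rate in [0, 1]
    for one push direction (assumed layout). Returns 0.0 if none qualifies.
    """
    ok = [imp for imp, rate in recovery_by_impulse.items() if rate >= threshold]
    return max(ok) if ok else 0.0


def average_max_impulse(recovery_by_direction, threshold=0.8):
    """Average of the per-direction maximum tolerated impulses."""
    values = [max_tolerated_impulse(r, threshold)
              for r in recovery_by_direction.values()]
    return sum(values) / len(values)
```

For example, `max_tolerated_impulse({2.0: 1.0, 4.0: 0.85, 6.0: 0.5})` selects 4.0, the largest impulse that still reaches an 80% recovery rate in that hypothetical direction.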
