Learning Quadruped Locomotion Using Differentiable Simulation

Yunlong Song; Sangbae Kim; Davide Scaramuzza

TL;DR

This work proposes a new differentiable simulation framework that represents one of the first successful applications of differentiable simulation to real-world quadruped locomotion, offering a compelling alternative to traditional RL methods.

Abstract

This work explores the potential of using differentiable simulation for learning quadruped locomotion. Differentiable simulation promises fast convergence and stable training by computing low-variance first-order gradients using robot dynamics. However, its usage for legged robots is still limited to simulation. The main challenge lies in the complex optimization landscape of robotic tasks due to discontinuous dynamics. This work proposes a new differentiable simulation framework to overcome these challenges. Our approach combines a high-fidelity, non-differentiable simulator for forward dynamics with a simplified surrogate model for gradient backpropagation. This approach maintains simulation accuracy by aligning the robot states from the surrogate model with those of the precise, non-differentiable simulator. Our framework enables learning quadruped walking in simulation in minutes without parallelization. When augmented with GPU parallelization, our approach allows the quadruped robot to master diverse locomotion skills on challenging terrains in minutes. We demonstrate that differentiable simulation outperforms a reinforcement learning algorithm (PPO) by achieving significantly better sample efficiency while maintaining its effectiveness in handling large-scale environments. Our method represents one of the first successful applications of differentiable simulation to real-world quadruped locomotion, offering a compelling alternative to traditional RL methods.
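The pairing of a precise forward simulator with a simplified surrogate for gradients can be illustrated on a toy problem. The sketch below is not the paper's implementation: `surrogate_step`, `precise_step`, the `drag` term, the one-gain policy, and all constants are hypothetical stand-ins, and the exact alignment rule used in the paper is not given in this excerpt. It only shows the general idea that state values come from the precise model at every step, while derivatives are propagated through the surrogate's Jacobians.

```python
def surrogate_step(p, v, u, dt):
    """Simplified, differentiable model (assumed): a frictionless point mass."""
    return p + dt * v, v + dt * u


def precise_step(p, v, u, dt, drag=0.5):
    """Stand-in for a high-fidelity simulator, treated as a black box.

    The drag term is the (hypothetical) effect the surrogate does not model.
    """
    return p + dt * v, v + dt * (u - drag * v)


def aligned_rollout(k, target=1.0, steps=50, dt=0.05):
    """Return (loss, approximate dloss/dk) for the policy u = k*(target - p) - v."""
    p, v = 0.0, 0.0      # state values, always taken from precise_step
    dp, dv = 0.0, 0.0    # sensitivities d(state)/dk, propagated via the surrogate
    loss, dloss = 0.0, 0.0
    for _ in range(steps):
        u = k * (target - p) - v
        du = (target - p) - k * dp - dv   # chain rule through the policy
        # Gradient path: Jacobians of surrogate_step applied to the sensitivities.
        dp, dv = dp + dt * dv, dv + dt * du
        # Value path: align the state with the precise simulator's output.
        p, v = precise_step(p, v, u, dt)
        loss += dt * (p - target) ** 2
        dloss += dt * 2.0 * (p - target) * dp
    return loss, dloss


def train(k=1.0, iters=20, lr=0.5):
    """Plain gradient descent on the gain k using the surrogate gradient."""
    for _ in range(iters):
        _, grad = aligned_rollout(k)
        k -= lr * grad
    return k, aligned_rollout(k)[0]


if __name__ == "__main__":
    print("before:", aligned_rollout(1.0)[0])
    print("after: ", train()[1])
```

The gradient here is biased because the surrogate ignores drag, yet the loss it is driving is evaluated on the precise model's states, which is the trade-off the abstract describes.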

Paper Structure (22 sections, 11 equations, 7 figures, 4 tables)

Introduction
Methodology
Problem Formulation
Forward Simulation
Backpropagation Through Time
State Alignment with A Non-Differentiable Simulator
Short-Horizon Policy Training
Differentiable Loss Function
Experimental Results
A Toy Example: Control of A Double Integrator
Learning to Walk with One Robot
Learning Diverse Walking Skills on Challenging Terrains
Related Work
Limitations
Conclusion
...and 7 more sections

Figures (7)

Figure 1: Graphical model for policy learning using differentiable simulation.
Figure 2: System overview of learning quadruped locomotion using differentiable simulation. Our approach decouples the robot dynamics into two separate spaces: joint and floating base spaces. We leverage the differentiability and smoothness of a single rigid-body dynamics for the robot's main body, which takes the ground reaction force from its legs as the control inputs. We use the state from a non-differentiable simulator (IsaacGym) to align the state in the differentiable simulation.
Figure 3: Control of a double integrator using optimal control, reinforcement learning, and differentiable simulation. (left): Learning curves. (middle): Trajectories of different control policies. We initialize the system at the same states for all methods. (right): DiffSim achieves control commands close to optimal control.
Figure 4: Learning to walk with one simulated robot. We run 10 experiments with different random seeds. The plot is smoothed using a moving average.
Figure 5: Learning to walk on challenging terrains, reinforcement learning versus differentiable simulation. Differentiable simulation exhibits significant advantages over PPO in terms of sample efficiency and learning stability. After training in simulation, the policy can be transferred to the real world without fine-tuning.
...and 2 more figures
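The double-integrator toy problem in Figure 3 is small enough to show what learning through a differentiable simulation involves. The following is a minimal sketch under assumed settings, not the paper's experiment: the linear feedback policy `u = -k1*p - k2*v`, the quadratic cost with control weight `r`, the explicit Euler step, and every constant are illustrative choices. It rolls the system forward, then backpropagates through time by hand to obtain the exact gradient of the cost with respect to the two gains.

```python
def rollout(k1, k2, p0=1.0, v0=0.0, steps=40, dt=0.05):
    """Simulate p'' = u under u = -k1*p - k2*v; return states and controls."""
    states, controls = [(p0, v0)], []
    p, v = p0, v0
    for _ in range(steps):
        u = -k1 * p - k2 * v
        p, v = p + dt * v, v + dt * u
        states.append((p, v))
        controls.append(u)
    return states, controls


def loss_and_grad(k1, k2, r=0.1, dt=0.05):
    """Quadratic cost and its gradient w.r.t. (k1, k2) via reverse-mode BPTT."""
    states, controls = rollout(k1, k2, dt=dt)
    loss = dt * sum(p * p + v * v for p, v in states[1:])
    loss += dt * r * sum(u * u for u in controls)

    gp = gv = gk1 = gk2 = 0.0   # adjoints of the state and of the gains
    for t in range(len(controls) - 1, -1, -1):
        p_next, v_next = states[t + 1]
        gp += 2.0 * dt * p_next             # direct cost on the state at t+1
        gv += 2.0 * dt * v_next
        p, v = states[t]
        # dL/du_t: through v_next = v + dt*u, plus the control cost.
        gu = dt * gv + 2.0 * dt * r * controls[t]
        gk1 -= p * gu                       # u = -k1*p - k2*v
        gk2 -= v * gu
        # Pull the adjoints back to the state at t.
        gp, gv = gp - k1 * gu, dt * gp + gv - k2 * gu
    return loss, (gk1, gk2)


def train(k1=0.5, k2=0.5, iters=200, lr=0.05):
    """Gradient descent on the gains using the first-order gradients above."""
    for _ in range(iters):
        _, (g1, g2) = loss_and_grad(k1, k2)
        k1, k2 = k1 - lr * g1, k2 - lr * g2
    return k1, k2


if __name__ == "__main__":
    print("initial loss:", loss_and_grad(0.5, 0.5)[0])
    print("trained loss:", loss_and_grad(*train())[0])
```

Because the dynamics are smooth and linear, the analytic gradient can be checked against finite differences, which is a useful sanity test before moving to contact-rich systems where the abstract notes the optimization landscape becomes much harder.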
