SoloParkour: Constrained Reinforcement Learning for Visual Locomotion from Privileged Experience

Elliot Chane-Sane, Joseph Amigo, Thomas Flayols, Ludovic Righetti, Nicolas Mansard

TL;DR

SoloParkour addresses the challenge of learning agile, safe legged locomotion from depth vision by framing parkour as a constrained reinforcement learning problem. The method uses a two-stage pipeline. First, a privileged policy is trained on easy-to-compute terrain information, with constraints enforced through Constraints as Terminations (CaT). Second, experience collected by this privileged policy warm-starts an off-policy, end-to-end visual policy trained from depth images, which makes learning from pixels efficient. The approach transfers from simulation to a real Solo-12, which climbs a 1.5x height step and leaps over substantial gaps while satisfying its constraints. Combining constrained RL with privileged-experience bootstrapping reduces sample complexity and improves safety, offering a practical path to agile, vision-based locomotion on real robots.

Abstract

Parkour poses a significant challenge for legged robots, requiring navigation through complex environments with agility and precision based on limited sensory inputs. In this work, we introduce a novel method for training end-to-end visual policies, from depth pixels to robot control commands, to achieve agile and safe quadruped locomotion. We formulate robot parkour as a constrained reinforcement learning (RL) problem designed to maximize the emergence of agile skills within the robot's physical limits while ensuring safety. We first train a policy without vision using privileged information about the robot's surroundings. We then generate experience from this privileged policy to warm-start a sample-efficient off-policy RL algorithm from depth images. This allows the robot to adapt behaviors from this privileged experience to visual locomotion while circumventing the high computational costs of RL directly from pixels. We demonstrate the effectiveness of our method on a real Solo-12 robot, showcasing its capability to perform a variety of parkour skills such as walking, climbing, leaping, and crawling.
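The warm-start idea in the abstract can be illustrated with a replay buffer that is pre-filled with transitions collected by the privileged (non-visual) policy before the visual policy makes any updates. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the class name `WarmStartBuffer`, the 50/50 mix of privileged and online samples, and the assumption that every stored privileged transition already carries the depth image rendered in simulation are all hypothetical choices for illustration.

```python
import numpy as np


class WarmStartBuffer:
    """Hypothetical replay buffer with two pools of transitions.

    - privileged: a fixed dataset collected by the privileged (non-visual) policy,
      assumed to already store the depth observations the visual policy will see.
    - online: transitions collected later by the visual policy itself.
    """

    KEYS = ("obs", "action", "reward", "next_obs", "done")

    def __init__(self, privileged, capacity, rng=None):
        self.privileged = {k: np.asarray(privileged[k]) for k in self.KEYS}
        self.capacity = capacity
        self.online = {k: [] for k in self.KEYS}
        self.rng = rng if rng is not None else np.random.default_rng(0)

    def add(self, **transition):
        # Append one online transition; drop the oldest once capacity is exceeded.
        for k in self.KEYS:
            self.online[k].append(np.asarray(transition[k]))
            if len(self.online[k]) > self.capacity:
                self.online[k].pop(0)

    def sample(self, batch_size, privileged_fraction=0.5):
        # Assumed mixing rule: a fixed share of every batch is privileged data,
        # and the whole batch is privileged until online data exists.
        n_online = len(self.online["reward"])
        n_priv = batch_size if n_online == 0 else int(round(batch_size * privileged_fraction))
        n_onl = batch_size - n_priv

        priv_idx = self.rng.integers(0, len(self.privileged["reward"]), size=n_priv)
        batch = {k: self.privileged[k][priv_idx] for k in self.KEYS}
        if n_onl > 0:
            onl_idx = self.rng.integers(0, n_online, size=n_onl)
            for k in self.KEYS:
                fresh = np.stack([self.online[k][i] for i in onl_idx])
                batch[k] = np.concatenate([batch[k], fresh], axis=0)
        return batch
```

Drawing a fixed share of each batch from the privileged pool keeps early critic and actor updates anchored to competent behavior, while the online share exposes the learner to what the visual policy actually encounters. The 0.5 default is an illustrative value, not one reported here.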

Paper Structure (34 sections, 6 equations, 7 figures, 2 tables)

Figures (7)

  • Figure 1: The open-hardware quadruped robot Solo-12 performs agile skills that are reminiscent of parkour, such as walking, climbing high steps, leaping over gaps, and crawling under obstacles.
  • Figure 2: Terrains used to train SoloParkour in simulation: the crawl parkour contains floating objects the robot must crawl under, the step and hurdle parkour contain obstacles for the robot to climb up and down, and the leap parkour contains gaps over which the robot must leap.
  • Figure 3: SoloParkour leverages a two-stage RL approach to train visual locomotion policies in simulation. Stage 1: using PPO with Constraints as Terminations (CaT) [chane2024cat], we train a privileged policy that observes a heightmap scan of its surroundings and the height of nearby floating objects. Stage 2: we train a policy from depth pixels using a variant of DDPG with CaT that learns from a dataset of privileged experience collected with the Stage 1 policy.
  • Figure 4: Average terrain completion (over 4 training seeds) by obstacle dimension for each policy.
  • Figure 5: Constraint violations (in %) during successful traversals of each terrain level, by obstacle dimension, averaged across 4 training seeds.
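Figure 3 names Constraints as Terminations (CaT) as the constraint mechanism in both stages. The sketch below follows our reading of the original CaT formulation rather than SoloParkour's code: each constraint value c_i(s, a) (satisfied when <= 0) is clipped to its positive part, normalized by a running scale c_i^max, and mapped to a termination probability delta = max_i p_i^max * clip(c_i+ / c_i^max, 0, 1). That delta then shrinks the bootstrapped value in a DDPG-style critic target. The function names, the EMA decay, and applying delta only to the bootstrap term (the current reward is still collected) are assumptions made for illustration.

```python
import numpy as np


def cat_termination_prob(violations, c_max, p_max):
    """Per-step termination probability in the style of CaT.

    violations: (batch, n_constraints) raw constraint values; <= 0 means satisfied.
    c_max:      (n_constraints,) normalizing scale per constraint
                (assumed to be a running estimate of the largest recent violation).
    p_max:      (n_constraints,) maximum termination probability per constraint.
    """
    positive = np.maximum(violations, 0.0)  # only violations contribute
    ratio = np.clip(positive / np.maximum(c_max, 1e-8), 0.0, 1.0)
    return np.max(p_max * ratio, axis=-1)  # the worst constraint dominates


def update_c_max(c_max, violations, decay=0.99):
    # Exponential moving average of the per-constraint maximum violation in the
    # current batch; the decay value is a hypothetical choice.
    batch_max = np.maximum(violations, 0.0).max(axis=0)
    return decay * c_max + (1.0 - decay) * batch_max


def cat_td_target(reward, done, delta, next_q, gamma=0.99):
    # Critic target where the CaT termination probability scales down the
    # bootstrap term, i.e. gamma is effectively replaced by gamma * (1 - delta).
    return reward + gamma * (1.0 - done) * (1.0 - delta) * next_q
```

Because delta grows smoothly with the size of a violation instead of jumping to a hard episode end, small violations only slightly reduce the expected future return, while large ones cut it off almost entirely; this is what lets the same reward-maximizing update rule (PPO in Stage 1, a DDPG variant in Stage 2) also push the policy toward satisfying its constraints.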