CaT: Constraints as Terminations for Legged Locomotion Reinforcement Learning

Elliot Chane-Sane, Pierre-Alexandre Leziart, Thomas Flayols, Olivier Stasse, Philippe Souères, Nicolas Mansard

TL;DR

This work advocates for integrating constraints into robot learning and presents Constraints as Terminations (CaT), a novel constrained RL algorithm that leads to excellent constraint adherence without introducing undue complexity and computational overhead, thus mitigating barriers to broader adoption.

Abstract

Deep Reinforcement Learning (RL) has demonstrated impressive results in solving complex robotic tasks such as quadruped locomotion. Yet, current solvers fail to produce efficient policies respecting hard constraints. In this work, we advocate for integrating constraints into robot learning and present Constraints as Terminations (CaT), a novel constrained RL algorithm. Departing from classical constrained RL formulations, we reformulate constraints through stochastic terminations during policy learning: any violation of a constraint triggers a probability of terminating potential future rewards the RL agent could attain. We propose an algorithmic approach to this formulation, by minimally modifying widely used off-the-shelf RL algorithms in robot learning (such as Proximal Policy Optimization). Our approach leads to excellent constraint adherence without introducing undue complexity and computational overhead, thus mitigating barriers to broader adoption. Through empirical evaluation on the real quadruped robot Solo crossing challenging obstacles, we demonstrate that CaT provides a compelling solution for incorporating constraints into RL frameworks. Videos and code are available at https://constraints-as-terminations.github.io.
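
The termination mechanism described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the normalization of violations to [0, 1], the `p_max` parameter, and the rule of taking the largest violation are all assumptions made for illustration. The sketch shows only the general idea that a constraint violation yields a termination probability, and that this probability discounts whatever reward the agent could still collect afterwards.

```python
def termination_probability(violations, p_max=1.0):
    """Map constraint violations to a single termination probability.

    Assumption: each entry is a violation already normalized so that
    0 means "satisfied" and 1 means "maximally violated".
    Assumption: the most violated constraint determines the probability.
    """
    clipped = [min(max(v, 0.0), 1.0) for v in violations]
    return p_max * max(clipped, default=0.0)


def discounted_return_with_terminations(rewards, term_probs, gamma=0.99):
    """Expected discounted return under stochastic terminations.

    At step t the agent receives rewards[t]; the episode then ends with
    probability term_probs[t], in which case all later rewards are lost.
    """
    ret = 0.0
    # Backward recursion: G_t = r_t + gamma * (1 - delta_t) * G_{t+1}
    for r, delta in zip(reversed(rewards), reversed(term_probs)):
        ret = r + gamma * (1.0 - delta) * ret
    return ret


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    # No violation: the full return is kept.
    print(discounted_return_with_terminations(rewards, [0.0, 0.0, 0.0], gamma=1.0))
    # A violation at the first step halves the expected future rewards.
    print(discounted_return_with_terminations(rewards, [0.5, 0.0, 0.0], gamma=1.0))
```

Under these assumptions, a policy that never violates a constraint keeps its full return, while any violation shrinks the expected value of every later reward, so maximizing return also discourages violations.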

Paper Structure

This paper contains 18 sections, 7 equations, 4 figures, 7 tables, 1 algorithm.

Figures (4)

  • Figure 1: The open-hardware quadruped robot Solo-12 trained with CaT performing agile locomotion over challenging terrains while satisfying safety and style constraints. The robot can walk up stairs, traverse slopes, and climb over high obstacles.
  • Figure 2: (Left) The quadruped robot is trained with CaT in simulation using a height-map scan. (Right) The learned policy is directly deployed on the real robot. Knowing the obstacle course on which the robot is placed, we use external motion capture cameras to reconstruct the height-map of its surroundings based on its position and orientation in the world.
  • Figure 3: Joint torques and velocities during the climb of a 24 cm platform. For clarity, we only report data for the knee joints, which had the highest torque peaks.
  • Figure 4: CaT trained with a constraint that limits the height of the base learns crouching locomotion skills.