Curriculum-Based Reinforcement Learning for Quadrupedal Jumping: A Reference-free Design

Vassil Atanassov, Jiatao Ding, Jens Kober, Ioannis Havoutis, Cosimo Della Santina

TL;DR

This work aims to prove that, by leveraging a curriculum design, dynamic jumping can be learned without imitating a reference trajectory. It achieves a 90 cm forward jump, exceeding all previously reported records for similar robots.

Abstract

Deep reinforcement learning (DRL) has emerged as a promising solution to mastering explosive and versatile quadrupedal jumping skills. However, current DRL-based frameworks usually rely on pre-existing reference trajectories obtained by capturing animal motions or transferring experience from existing controllers. This work aims to prove that, by leveraging a curriculum design, dynamic jumping can be learned without imitating a reference trajectory. Starting from a vertical in-place jump, we generalize the learned policy to forward and diagonal jumps and, finally, learn to jump across obstacles. Conditioned on the desired landing location, orientation, and obstacle dimensions, the proposed approach yields a wide range of omnidirectional jumping motions in real-world experiments. In particular, we achieve a 90 cm forward jump, exceeding all previous records for similar robots reported in the existing literature. Additionally, the robot can reliably execute continuous jumping on soft grassy ground, which is especially remarkable as such conditions were not included in the training stage. A supplementary video can be found at: https://www.youtube.com/watch?v=nRaMCrwU5X8. The code associated with this work can be found at: https://github.com/Vassil17/Curriculum-Quadruped-Jumping-DRL.

Paper Structure

This paper contains 22 sections, 1 equation, 12 figures, and 3 tables.

Figures (12)

  • Figure 1: The Go1 robot jumps across grassland (top), jumps down onto grassland (middle) and jumps across a gap onto a lower box (bottom).
  • Figure 2: The curricula: jumping in place (left), long-distance jump (middle) and long-distance jump with obstacles (right). The latter two vary the jump distance/orientation and obstacle height, respectively.
  • Figure 3: Control diagram of the system. The observations $\mathbf{o}_t$ include user command (in green) and a history of system states (in yellow). The policy is parameterised by a neural network (shown in blue). The output actions $\mathbf{a}_{t+1}$ are added to the nominal joint angles $\mathbf{q}^{\mathrm{nom}}$. The desired joint angles are then tracked via a PD controller which computes torque commands.
  • Figure 4: The definition of observations. The command $\mathbf{g}$ and jump toggle $j$ are provided by the user, while the remaining observations are either directly read from the sensors, or estimated using sensory data.
  • Figure 5: The command vector $\mathbf{g}$ for a forward jump onto an obstacle. In the first two training stages ($\pi_I$ and $\pi_{II}$), where no obstacles are considered, the information of the obstacle is set to zero.
  • ...and 7 more figures
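
The control path in the Figure 3 caption and the command vector in the Figure 5 caption can be illustrated with a minimal sketch. It is not taken from the authors' code. The gain values, the action scale, the torque limit, the zero target velocity in the PD law, and the layout of the command vector (landing x/y, a single yaw angle, then obstacle dimensions) are all assumptions made for illustration. The function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class PDGains:
    kp: float = 20.0  # hypothetical proportional gain
    kd: float = 0.5   # hypothetical derivative gain


def desired_joint_angles(action: Sequence[float],
                         q_nom: Sequence[float],
                         action_scale: float = 1.0) -> List[float]:
    """Add the policy action to the nominal joint angles (Figure 3).

    action_scale is an assumed extra parameter; 1.0 means a plain sum.
    """
    return [qn + action_scale * a for qn, a in zip(q_nom, action)]


def pd_torques(q_des: Sequence[float],
               q: Sequence[float],
               dq: Sequence[float],
               gains: PDGains,
               torque_limit: Optional[float] = None) -> List[float]:
    """Track desired joint angles with a PD law (assumed zero target velocity)."""
    torques = []
    for qd, qi, dqi in zip(q_des, q, dq):
        tau = gains.kp * (qd - qi) - gains.kd * dqi
        if torque_limit is not None:
            # Optional clipping to an assumed actuator limit.
            tau = max(-torque_limit, min(torque_limit, tau))
        torques.append(tau)
    return torques


def build_command(landing_xy: Sequence[float],
                  yaw: float,
                  obstacle_dims: Sequence[float],
                  use_obstacle: bool) -> List[float]:
    """Assemble a hypothetical command vector g.

    When no obstacle is considered (as in the first two training stages
    described in Figure 5), the obstacle entries are set to zero.
    """
    if use_obstacle:
        obstacle = list(obstacle_dims)
    else:
        obstacle = [0.0] * len(obstacle_dims)
    return list(landing_xy) + [yaw] + obstacle
```

A call such as `build_command((0.9, 0.0), 0.0, (0.3, 0.2), use_obstacle=False)` would describe a 90 cm forward jump with the obstacle entries zeroed. The resulting command would be fed to the policy together with the state history, and the policy output would then go through `desired_joint_angles` and `pd_torques`.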