Efficient Reinforcement Learning for Jumping Monopods

Riccardo Bussola, Michele Focchi, Andrea Del Prete, Daniele Fontanelli, Luigi Palopoli

TL;DR

This work tackles the challenge of omni-directional jumping for a monopod on uneven terrain by injecting physical knowledge into reinforcement learning. It constrains the action space to a 5D Cartesian-parametrised thrust plan expressed as a 3rd-order Bezier curve, learned via the TD3 algorithm and mapped to joints through inverse kinematics, with a gravity-compensated low-level controller executing the motion. The learning is guided by a physics-informed reward that penalizes constraint violations and rewards landing accuracy, enabling real-time-like performance and compensation for tracking errors. Compared to nonlinear trajectory optimization and end-to-end RL, the proposed approach yields larger feasible regions, faster training, and comparable or superior front-jump accuracy, while drastically reducing online computation and enabling generalisation to unseen targets.
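As a rough illustration of the thrust parametrisation mentioned above, the sketch below evaluates a 3rd-order (cubic) Bezier curve and its time derivative with NumPy. The function names, the control points, and the thrust duration are hypothetical placeholders chosen for this example; the mapping from the paper's 5D action to the control points is not reproduced here.

```python
import numpy as np

def cubic_bezier(control_points, s):
    """Evaluate a cubic Bezier curve at normalised time s in [0, 1].

    control_points: array-like of shape (4, dim), hypothetical values.
    Returns an array of shape (len(s), dim).
    """
    p = np.asarray(control_points, dtype=float)
    s = np.atleast_1d(np.asarray(s, dtype=float))[:, None]
    # Bernstein basis polynomials of degree 3
    basis = np.hstack([(1 - s) ** 3,
                       3 * s * (1 - s) ** 2,
                       3 * s ** 2 * (1 - s),
                       s ** 3])
    return basis @ p

def cubic_bezier_velocity(control_points, s, duration):
    """Time derivative of the curve, assuming s = t / duration."""
    p = np.asarray(control_points, dtype=float)
    s = np.atleast_1d(np.asarray(s, dtype=float))[:, None]
    # The derivative of a cubic Bezier is a quadratic Bezier
    # whose control points are 3 * (P[i+1] - P[i]).
    d = 3.0 * np.diff(p, axis=0)
    basis = np.hstack([(1 - s) ** 2, 2 * s * (1 - s), s ** 2])
    return basis @ d / duration

# Hypothetical Cartesian control points (metres) and thrust duration (seconds)
points = [[0.0, 0.0, 0.25], [0.0, 0.0, 0.30], [0.1, 0.0, 0.40], [0.2, 0.0, 0.45]]
thrust_duration = 0.5
positions = cubic_bezier(points, np.linspace(0.0, 1.0, 5))
lift_off_velocity = cubic_bezier_velocity(points, 1.0, thrust_duration)[0]
```

In a pipeline like the one described, the sampled positions would then be mapped to joint references through inverse kinematics; that step depends on the robot model and is left out of this sketch.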

Abstract

In this work, we consider the complex control problem of making a monopod reach a target with a jump. The monopod can jump in any direction and the terrain underneath its foot can be uneven. This is a template of a much larger class of problems, which are extremely challenging and computationally expensive to solve using standard optimisation-based techniques. Reinforcement Learning (RL) could be an interesting alternative, but an end-to-end approach, in which the controller must learn everything from scratch, is impractical. The solution advocated in this paper is to guide the learning process within an RL framework by injecting physical knowledge. This expedient brings widespread benefits, such as a drastic reduction of the learning time and the ability to learn and compensate for possible errors in the low-level controller executing the motion. We demonstrate the advantage of our approach with respect to both optimisation-based and end-to-end RL approaches.

Paper Structure (16 sections, 10 equations, 4 figures, 1 table)

Figures (4)

  • Figure 1: Diagram of the framework. The framework is split into two levels: the RL agent and the planner. The RL agent produces an action for the planner, based on a desired target. The planner computes a Bezier reference curve that is mapped into joint motion via inverse kinematics and tracked by the controller, which provides the joint torques that drive the robot. During training, a reward is computed at the end of each episode and fed back to the agent.
  • Figure 2: Action parametrization and its bounds. On the left the top view, on the right the side view of the jumping plane.
  • Figure 3: Top view of the feasibility region: (1-5) for different numbers of training episodes (the number of reachable points is computed for each $X$, $Y$ pair) and (right) in the case of the baseline.
  • Figure 4: Plot of the average as a function of the number of training episodes.