
Learning control of underactuated double pendulum with Model-Based Reinforcement Learning

Niccolò Turcato, Alberto Dalla Libera, Giulio Giacomuzzo, Ruggero Carli, Diego Romeres

TL;DR

This work applies a model-based reinforcement learning (MB-RL) framework, MC-PILCO, to control two underactuated 2-DOF pendulum systems (Pendubot and Acrobot) for swing-up and stabilization. The method learns a Gaussian-process (GP) dynamics model, whose prior mean is derived from the system's forward dynamics, and optimizes the policy on Monte Carlo simulated rollouts of that model, achieving data-efficient control at 50 Hz. The study demonstrates swing-up performance comparable to prior results, while noting limitations with respect to the energy and torque-smoothness penalties and sensitivity to parameter changes; it also emphasizes the practical value of rapid retraining on hardware and the policy's robustness to actuation perturbations. The findings support MC-PILCO as a viable MB-RL strategy for underactuated robotics, particularly when data is scarce and retraining is feasible.
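The role of a physics-derived prior mean in a GP dynamics model can be illustrated with a small sketch. It assumes a scalar input, a squared-exponential kernel, and hypothetical helper names (`rbf`, `gp_posterior_mean`); it is not the authors' implementation, and the nominal model used as prior mean is a made-up stand-in for the forward dynamics. The GP only has to learn the residual between data and prior mean, and far from the data its prediction falls back to the physics model instead of to zero.

```python
import math

def rbf(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel for scalar inputs."""
    return variance * math.exp(-0.5 * ((a - b) / lengthscale) ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def cho_solve(L, b):
    """Solve (L L^T) x = b by forward, then backward substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior_mean(X, y, x_star, prior_mean, noise=1e-2):
    """GP posterior mean with a nonzero prior mean m(x):
    mu(x*) = m(x*) + k(x*, X) (K + noise * I)^{-1} (y - m(X))."""
    n = len(X)
    K = [[rbf(X[i], X[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    # The GP is fitted only to the residual left over by the prior mean.
    residual = [y[i] - prior_mean(X[i]) for i in range(n)]
    alpha = cho_solve(cholesky(K), residual)
    k_star = [rbf(x_star, X[i]) for i in range(n)]
    return prior_mean(x_star) + sum(k * a for k, a in zip(k_star, alpha))

if __name__ == "__main__":
    # Hypothetical example: the prior mean is a nominal gravity term, and the
    # "true" data contain an extra damping-like term the GP has to learn.
    X = [-1.0, -0.5, 0.0, 0.5, 1.0]
    prior = lambda x: -9.81 * math.sin(x)
    y = [prior(x) - 0.5 * x for x in X]
    print(gp_posterior_mean(X, y, 0.25, prior))   # near the data: prior + learned residual
    print(gp_posterior_mean(X, y, 50.0, prior))   # far from the data: falls back to prior
```

With a zero prior mean, the second query would instead revert to 0, which is why a physics-based prior mean helps extrapolation when data are scarce.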

Abstract

This report describes our proposed solution for the second AI Olympics competition held at IROS 2024. Our solution is based on a recent Model-Based Reinforcement Learning algorithm named MC-PILCO. Besides briefly reviewing the algorithm, we discuss the most critical aspects of the MC-PILCO implementation for the tasks at hand.
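The Monte Carlo part of MC-PILCO, estimating a policy's expected cumulative cost by propagating sampled particles through a stochastic learned model, can be sketched as follows. This is a minimal illustration under stated assumptions: a scalar state, a one-step model that returns a Gaussian mean and standard deviation (standing in for the GP posterior predictive), and hypothetical names (`mc_expected_cost`, `policy`, `model`, `cost`). The policy-gradient step that MC-PILCO performs on this estimate is omitted.

```python
import math
import random

def mc_expected_cost(policy, model, cost, x0_mean, x0_std, horizon,
                     n_particles=200, seed=0):
    """Monte Carlo estimate of sum_t E[c(x_t)], t = 1..horizon.

    Particles are drawn from the initial-state distribution and propagated by
    sampling x_{t+1} ~ N(mu(x_t, u_t), sigma(x_t, u_t)^2), with u_t = policy(x_t).
    """
    rng = random.Random(seed)  # fixed seed makes the estimate reproducible
    particles = [rng.gauss(x0_mean, x0_std) for _ in range(n_particles)]
    total = 0.0
    for _ in range(horizon):
        next_particles = []
        for x in particles:
            u = policy(x)
            mu, sigma = model(x, u)
            # Sample the next state instead of moment-matching the distribution.
            next_particles.append(rng.gauss(mu, sigma))
        particles = next_particles
        # Average cost over particles approximates E[c(x_t)].
        total += sum(cost(x) for x in particles) / n_particles
    return total

if __name__ == "__main__":
    # Hypothetical toy system x' = x + 0.1 u with small noise, a linear policy
    # u = -k x, and a saturating cost that is near zero around the target x = 0.
    model = lambda x, u: (x + 0.1 * u, 0.01)
    cost = lambda x: 1.0 - math.exp(-x * x / 0.5)
    for k in (0.0, 5.0):
        print(k, mc_expected_cost(lambda x: -k * x, model, cost, 1.0, 0.1, 20))
```

Sampling particles keeps the long-term state distribution unconstrained (it need not stay Gaussian), at the price of a noisy estimate whose variance shrinks as `n_particles` grows.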

Paper Structure (11 sections, 12 equations, 3 figures, 2 tables)

Figures (3)

  • Figure 1: Simulation of the Pendubot system (500 Hz) under control of the policy trained with MC-PILCO.
  • Figure 2: Simulation of the Acrobot system (500 Hz) under control of the policy trained with MC-PILCO.
  • Figure 3: Pendubot (left) and Acrobot (right) robustness bar charts.