Learning control of underactuated double pendulum with Model-Based Reinforcement Learning
Niccolò Turcato, Alberto Dalla Libera, Giulio Giacomuzzo, Ruggero Carli, Diego Romeres
TL;DR
This work applies MC-PILCO, a model-based reinforcement learning (MB-RL) algorithm, to the swing-up and stabilization of two underactuated 2-DOF pendulum systems, the Pendubot and the Acrobot. The method learns a Gaussian process (GP) model of the dynamics, with a prior mean derived from the forward dynamics, and optimizes the policy on Monte Carlo rollouts simulated with that model, giving data-efficient control at $50$ Hz. The study reports swing-up performance comparable to prior results. It notes limitations with respect to the energy and torque-smoothness penalties and a sensitivity to parameter changes, while emphasizing two practical strengths: the policy can be retrained rapidly on hardware, and it is robust to actuation perturbations. The findings support MC-PILCO as a viable MB-RL strategy for underactuated robotics, particularly when data is scarce and retraining is feasible.
Abstract
This report describes our proposed solution for the second AI Olympics competition held at IROS 2024. Our solution is based on a recent Model-Based Reinforcement Learning algorithm named MC-PILCO. Besides briefly reviewing the algorithm, we discuss the most critical aspects of the MC-PILCO implementation in the tasks at hand.
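To make the Monte Carlo idea behind the algorithm concrete, the following minimal sketch estimates the expected cost of a policy by propagating a batch of particles through a stochastic one-step model. It is an illustration under stated assumptions, not the implementation used in this work: `toy_model` is a hypothetical stand-in for the learned GP posterior, `linear_policy` is a hypothetical saturated policy, the saturating cost is assumed, and the 1-D state is far simpler than the Pendubot or Acrobot. The sketch only evaluates the cost; it does not optimize the policy parameters.

```python
import numpy as np


def toy_model(x, u):
    """Hypothetical stand-in for a learned GP: mean and std of the state change."""
    mean = -0.1 * x + 0.05 * u
    std = 0.01 * np.ones_like(x)
    return mean, std


def linear_policy(x, gain):
    """Hypothetical policy: linear feedback with the control saturated to [-1, 1]."""
    return np.clip(-gain * x, -1.0, 1.0)


def mc_expected_cost(gain, x0, n_particles=200, horizon=50, seed=0):
    """Estimate the average expected cost of a policy by particle rollouts."""
    rng = np.random.default_rng(seed)
    x = np.full(n_particles, x0, dtype=float)  # all particles start at x0
    total = 0.0
    for _ in range(horizon):
        u = linear_policy(x, gain)
        mean, std = toy_model(x, u)
        # Sample each particle's next state from the model's predictive distribution.
        x = x + mean + std * rng.standard_normal(n_particles)
        # Assumed saturating cost in [0, 1), averaged over particles.
        total += np.mean(1.0 - np.exp(-x ** 2))
    return total / horizon


if __name__ == "__main__":
    for gain in (0.0, 2.0):
        print(f"gain={gain}: estimated cost={mc_expected_cost(gain, x0=1.0):.4f}")
```

Because the cost estimate is an average over sampled trajectories, it can be computed for any policy and cost function without closed-form moment propagation; a policy optimizer would then adjust `gain` to reduce this estimate.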
