Learning global control of underactuated systems with Model-Based Reinforcement Learning

Niccolò Turcato, Marco Calì, Alberto Dalla Libera, Giulio Giacomuzzo, Ruggero Carli, Diego Romeres

TL;DR

The paper addresses learning a global controller for two underactuated 2-DoF pendulums (Pendubot and Acrobot) using an efficient Model-Based RL approach. It leverages MC-PILCO, which builds a probabilistic GP dynamics model, optimizes policies via Monte Carlo rollouts, and employs curriculum-based incremental initialization to achieve a global policy. The approach demonstrates improved data efficiency and robustness in simulation, outperforming standard training and a TVLQR baseline in swing-up tasks. This work has practical impact for deploying data-efficient, globally capable controllers on real-world underactuated systems.

Abstract

This short paper describes our proposed solution for the third edition of the "AI Olympics with RealAIGym" competition, held at ICRA 2025. We employed Monte-Carlo Probabilistic Inference for Learning Control (MC-PILCO), a Model-Based Reinforcement Learning (MBRL) algorithm recognized for its exceptional data efficiency across various low-dimensional robotic tasks, including cart-pole, ball & plate, and Furuta pendulum systems. MC-PILCO learns a model of the system dynamics from interaction data, so the policy can be refined on simulated rollouts of the model rather than optimized directly on data from the real system. This approach has proven highly effective in physical systems, offering greater data efficiency than Model-Free (MF) alternatives. Notably, MC-PILCO won the first two editions of this competition, demonstrating its robustness in both simulated and real-world environments. Besides briefly reviewing the algorithm, we discuss the most critical aspects of the MC-PILCO implementation in the tasks at hand: learning a global policy for the Pendubot and Acrobot systems.
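
To make the idea of policy refinement through simulated rollouts concrete, the following is a minimal sketch, not the authors' implementation. It estimates the expected cumulative cost of a policy by propagating a set of particles through a stochastic one-step model. Everything in it is an assumption chosen for illustration: the scalar state, the toy linear predictive mean and fixed predictive standard deviation (standing in for a learned GP posterior), the linear policy, the saturated cost, and the function name `rollout_cost`.

```python
import numpy as np


def rollout_cost(policy_gain, init_mean, init_std,
                 horizon=30, n_particles=200, seed=0):
    """Monte Carlo estimate of the expected cumulative cost of a policy.

    Hypothetical toy setup: scalar state, linear policy u = -gain * x,
    and a hand-written stochastic model in place of a learned GP.
    """
    rng = np.random.default_rng(seed)
    # Sample particles from the initial state distribution.
    x = rng.normal(init_mean, init_std, size=n_particles)
    total = 0.0
    for _ in range(horizon):
        u = -policy_gain * x                 # placeholder linear policy
        mean_next = 0.9 * x + 0.5 * u        # toy predictive mean (assumed)
        std_next = 0.05                      # toy predictive std (assumed)
        # Each particle samples its own next state from the model's
        # predictive distribution.
        x = mean_next + std_next * rng.standard_normal(n_particles)
        # Saturated cost in [0, 1), averaged over particles.
        total += np.mean(1.0 - np.exp(-x ** 2))
    return total


if __name__ == "__main__":
    # A stabilizing gain should yield a lower estimated cost than no control.
    print(rollout_cost(1.0, init_mean=1.0, init_std=0.1))
    print(rollout_cost(0.0, init_mean=1.0, init_std=0.1))
```

In a gradient-based setting the same estimate would be differentiated with respect to the policy parameters; here the function only evaluates a given gain, which is enough to compare candidate policies on the model without touching the real system.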

Paper Structure

This paper contains 11 sections, 12 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: $\gamma_k$ scheduling following Eq. \ref{eq:surrogate_init}, with $k_m=5$, $K=10$.
  • Figure 2: Total rollout costs in the policy optimization steps of the two MC-PILCO trainings, the first using the incremental initial distribution, the second using the nominal initial distribution in all trials.
  • Figure 3: 20 simulated trials of the Pendubot system (500Hz), under MC-PILCO's control policy (50Hz). The initial position for each joint is uniformly sampled from the interval $[-\pi,\pi]$.
  • Figure 4: 20 simulated trials of the Pendubot system (500Hz), under MC-PILCO's control policy (50Hz). The initial position for each joint is uniformly sampled from the interval $[-\pi,\pi]$.