Reinforcement Learning for Robust Athletic Intelligence: Lessons from the 2nd 'AI Olympics with RealAIGym' Competition
Felix Wiebe, Niccolò Turcato, Alberto Dalla Libera, Jean Seong Bjorn Choe, Bumkyu Choi, Tim Lukas Faust, Habib Maraqten, Erfan Aghadavoodi, Marco Cali, Alberto Sinigaglia, Giulio Giacomuzzo, Diego Romeres, Jong-kook Kim, Gian Antonio Susto, Shubham Vyas, Dennis Mronga, Boris Belousov, Jan Peters, Frank Kirchner, Shivesh Kumar
TL;DR
This paper benchmarks four RL approaches on a real, underactuated double pendulum via the RealAIGym platform, across simulation and hardware phases. It contrasts the model-based MC-PILCO with three model-free methods (AR-EAPO, EvolSAC, HistorySAC), emphasizing robustness to disturbances and sim-to-real transfer. MC-PILCO achieves high robustness on hardware, and AR-EAPO delivers strong, versatile performance on both the Acrobot and the Pendubot, while history-based encoding (HistorySAC) and evolutionary fine-tuning (EvolSAC) show mixed results. The work demonstrates that RL controllers can attain reliable swing-up and stabilization on real hardware and argues that RealAIGym is a valuable tool for standardized, reproducible benchmarking in dynamic robotics.
Abstract
In the field of robotics, many different approaches, ranging from classical planning and optimal control to reinforcement learning (RL), are developed and borrowed from other fields to achieve reliable control in diverse tasks. To get a clear understanding of their individual strengths and weaknesses and their applicability in real-world robotic scenarios, it is important to benchmark and compare their performance not only in simulation but also on real hardware. The '2nd AI Olympics with RealAIGym' competition was held at the IROS 2024 conference to contribute to this cause and to evaluate different controllers according to their ability to solve a dynamic control problem on an underactuated double pendulum system with chaotic dynamics. This paper describes the four different RL methods submitted by the participating teams, presents their performance in the swing-up task on a real double pendulum, measured against various criteria, and discusses their transferability from simulation to real hardware and their robustness to external disturbances.
