Learning to Play Air Hockey with Model-Based Deep Reinforcement Learning

Andrej Orsula

TL;DR

This work applies model-based deep reinforcement learning (DreamerV3) to autonomously play air hockey with a robotic manipulator, using sparse rewards and self-play to improve generalization against unseen opponents. The approach combines a world model, actor, and critic with low-dimensional, stacked observations and high-level Cartesian actions mapped via an inverse Jacobian, plus three playstyles and a multi-strategy ensemble. Key findings show that self-play is essential to avoid overfitting to a baseline opponent, and that longer imagination horizons stabilize learning and boost performance, though real-world latency remains a challenge in sim-to-real transfer. Overall, the study demonstrates the viability of model-based RL for contact-rich robot manipulation and highlights directions for safer, more adaptable policies in competitive settings.
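
The mapping from a high-level Cartesian action to joint motion can be illustrated with a small example. This is a minimal sketch under stated assumptions, not the implementation used in this work: it uses a hypothetical two-link planar arm with made-up link lengths, and it swaps the plain inverse Jacobian for a damped least-squares inverse, which stays finite near singular poses and approaches the plain inverse as the damping goes to zero. All function names and parameters are illustrative.

```python
import math


def forward_kinematics(q, l1=0.5, l2=0.5):
    """End-effector position (x, y) of a hypothetical two-link planar arm."""
    x = l1 * math.cos(q[0]) + l2 * math.cos(q[0] + q[1])
    y = l1 * math.sin(q[0]) + l2 * math.sin(q[0] + q[1])
    return (x, y)


def jacobian(q, l1=0.5, l2=0.5):
    """2x2 Jacobian d(x, y) / d(q0, q1) of the arm above."""
    s1, c1 = math.sin(q[0]), math.cos(q[0])
    s12, c12 = math.sin(q[0] + q[1]), math.cos(q[0] + q[1])
    return (
        (-l1 * s1 - l2 * s12, -l2 * s12),
        (l1 * c1 + l2 * c12, l2 * c12),
    )


def cartesian_to_joint_step(q, dx, damping=1e-6):
    """Map a small Cartesian displacement dx to a joint displacement dq.

    Computes dq = J^T (J J^T + damping * I)^-1 dx (damped least squares).
    The damping value is an assumption chosen for illustration.
    """
    (a, b), (c, d) = jacobian(q)
    # Entries of the symmetric 2x2 matrix A = J J^T + damping * I.
    a11 = a * a + b * b + damping
    a12 = a * c + b * d
    a22 = c * c + d * d + damping
    det = a11 * a22 - a12 * a12  # strictly positive whenever damping > 0
    # Solve A y = dx with the closed-form 2x2 inverse.
    y0 = (a22 * dx[0] - a12 * dx[1]) / det
    y1 = (-a12 * dx[0] + a11 * dx[1]) / det
    # dq = J^T y
    return (a * y0 + c * y1, b * y0 + d * y1)


if __name__ == "__main__":
    q = (0.3, 1.2)        # current joint angles in radians (illustrative)
    dx = (0.01, -0.005)   # desired planar displacement in metres (illustrative)
    dq = cartesian_to_joint_step(q, dx)
    moved = forward_kinematics((q[0] + dq[0], q[1] + dq[1]))
    print("joint step:", dq)
    print("new end-effector position:", moved)
```

Because the Jacobian is a local linearization, the resulting joint step reproduces the requested Cartesian displacement only approximately, so such a mapping is typically re-evaluated at every control step.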

Abstract

For the Robot Air Hockey Challenge 2023, we investigate the applicability of model-based deep reinforcement learning to acquiring a policy that plays air hockey autonomously. Our agents learn solely from sparse rewards while incorporating self-play to iteratively refine their behaviour over time. The agents control the robotic manipulator through continuous high-level actions for position-based control in the Cartesian plane, while observing the environment only partially and under stochastic transitions. We demonstrate that agents are prone to overfitting when trained solely against a single playstyle, highlighting the importance of self-play for generalization to novel strategies of unseen opponents. Furthermore, we explore the impact of the imagination horizon in the competitive setting of the highly dynamic game of air hockey, with longer horizons resulting in more stable learning and better overall performance.
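
To make the role of the imagination horizon concrete, the sketch below computes bootstrapped λ-returns over an imagined trajectory of length H, in the form commonly used by Dreamer-style agents. It is an illustration under assumptions rather than the code used in this work: the function name, the discount `gamma`, and the mixing weight `lam` are hypothetical choices, and episode-continuation flags are omitted for brevity.

```python
def lambda_returns(rewards, values, gamma=0.997, lam=0.95):
    """Compute lambda-returns for an imagined rollout of horizon H.

    rewards: H imagined rewards, rewards[t] for the transition t -> t + 1.
    values:  H + 1 critic estimates, values[t] for imagined step t.
    The recursion is
        R_t = r_t + gamma * ((1 - lam) * v_{t+1} + lam * R_{t+1}),
    bootstrapped with R_H = v_H.
    """
    horizon = len(rewards)
    if len(values) != horizon + 1:
        raise ValueError("values must contain one more entry than rewards")
    returns = [0.0] * horizon
    next_return = values[-1]  # bootstrap from the critic at the last step
    for t in reversed(range(horizon)):
        next_return = rewards[t] + gamma * (
            (1.0 - lam) * values[t + 1] + lam * next_return
        )
        returns[t] = next_return
    return returns


if __name__ == "__main__":
    # A sparse reward that only appears at the third imagined step.
    sparse_rewards = [0.0, 0.0, 1.0]
    untrained_values = [0.0, 0.0, 0.0, 0.0]
    print(lambda_returns(sparse_rewards, untrained_values, gamma=0.5, lam=1.0))
```

With an uninformative critic, a sparse reward reaches the return at the first imagined step only if it occurs within the horizon, which is one plausible reason why the horizon length matters when rewards are sparse.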

Paper Structure

This paper contains 15 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: The simulation environment used in the Robot Air Hockey Challenge 2023 that features an air hockey table, puck, and two robotic manipulators controlled by the participating teams.
  • Figure 2: The learning curve of an agent following the balanced strategy.
  • Figure 3: Learning curves of agents trained with and without self-play.
  • Figure 4: Learning curves of agents trained with different imagination horizon lengths (H).