Robot Air Hockey: A Manipulation Testbed for Robot Learning with Reinforcement Learning
Caleb Chuck, Carl Qi, Michael J. Munje, Shuozhe Li, Max Rudolph, Chang Shi, Siddhant Agarwal, Harshit Sikchi, Abhinav Peri, Sarthak Dayal, Evan Kuo, Kavan Mehta, Anthony Wang, Peter Stone, Amy Zhang, Scott Niekum
TL;DR
Robot Air Hockey addresses the challenge of evaluating reinforcement learning in fast, dynamic, interactive manipulation by providing a multi-domain testbed that spans two simulators and a real UR5e system, all behind a unified interface and accompanied by an offline demonstration dataset. The authors compare Behavior Cloning, online PPO, and offline IQL across Box2D, Robosuite, and real-world tasks ranging from reaching to juggling, finding that online RL excels in simulation while offline data supports learning in the real world, where exploration is costly. The key contributions are the ten-task suite, the teleoperation dataset (350 mouse-teleop and 50 shadow-teleop trajectories from eight participants), and two simulators of increasing fidelity, which together enable sim-to-real transfer, offline RL evaluation, and goal-conditioned RL. The testbed supports diverse RL paradigms and multi-domain transfer, and could also serve skill learning, providing a practical platform for advancing dynamic manipulation with RL and for informing future research in real-world robotics.
Abstract
Reinforcement Learning is a promising tool for learning complex policies even in fast-moving and object-interactive domains where human teleoperation or hard-coded policies might fail. To effectively reflect this challenging category of tasks, we introduce a dynamic, interactive RL testbed based on robot air hockey. By augmenting air hockey with a large family of tasks, ranging from easy ones like reaching to challenging ones like pushing a block by hitting it with a puck, as well as goal-based and human-interactive tasks, our testbed allows a varied assessment of RL capabilities. The robot air hockey testbed also supports sim-to-real transfer with three domains: two simulators of increasing fidelity and a real robot system. Using demonstration data gathered through two teleoperation systems, a virtualized control environment and human shadowing, we assess the testbed with behavior cloning, offline RL, and RL from scratch.
