Robot Air Hockey: A Manipulation Testbed for Robot Learning with Reinforcement Learning

Caleb Chuck, Carl Qi, Michael J. Munje, Shuozhe Li, Max Rudolph, Chang Shi, Siddhant Agarwal, Harshit Sikchi, Abhinav Peri, Sarthak Dayal, Evan Kuo, Kavan Mehta, Anthony Wang, Peter Stone, Amy Zhang, Scott Niekum

TL;DR

Robot Air Hockey addresses the challenge of evaluating reinforcement learning (RL) in fast, dynamic, interactive manipulation. It provides a multi-domain testbed that spans two simulators and a real UR5e system, all behind a unified interface and paired with an offline demonstration dataset. The authors compare Behavior Cloning (BC), online PPO, and offline IQL across Box2D, Robosuite, and real-world tasks ranging from reaching to juggling. They find that online RL excels in simulation, while offline data supports learning in the real world, where exploration is costly. The key contributions are the ten-task suite, the teleoperation dataset (350 mouse-teleop and 50 shadow-teleop trajectories from eight participants), and two simulators of increasing fidelity that enable sim-to-real transfer, offline RL evaluation, and goal-conditioned RL. The testbed supports diverse RL paradigms and multi-domain transfer, and it opens a path to skill learning. It provides a practical platform for advancing dynamic manipulation with RL and for informing future research in real-world robotics.
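The TL;DR names offline IQL (Implicit Q-Learning) as the method that benefits from demonstration data. In the standard IQL formulation, V(s) is fitted by expectile regression on u = Q(s, a) - V(s), using the asymmetric loss |tau - 1(u < 0)| * u^2. With tau > 0.5, V(s) approaches the value of the better actions seen in the dataset without ever evaluating Q on unseen actions. The sketch below shows only that core loss, in pure Python, for a single state. It is the textbook IQL value objective, not the authors' training code. The toy Q-values, learning rate, and step count are arbitrary assumptions.

```python
from typing import List


def expectile_weight(diff: float, tau: float) -> float:
    """|tau - 1(u < 0)|: weight tau for u >= 0, weight (1 - tau) for u < 0."""
    return tau if diff >= 0 else 1.0 - tau


def expectile_loss(diff: float, tau: float) -> float:
    """IQL's asymmetric squared loss L_2^tau(u), with u = Q(s, a) - V(s)."""
    return expectile_weight(diff, tau) * diff * diff


def fit_value(q_values: List[float], tau: float, lr: float = 0.1, steps: int = 2000) -> float:
    """Fit a scalar V(s) for one state by gradient descent on the mean expectile
    loss over the Q-values of the actions that appear in the dataset at that state."""
    v = 0.0
    for _ in range(steps):
        grad = 0.0
        for q in q_values:
            diff = q - v
            # d/dv of weight * (q - v)^2 is -2 * weight * (q - v)
            grad += -2.0 * expectile_weight(diff, tau) * diff
        v -= lr * grad / len(q_values)
    return v


# Made-up Q-values for four dataset actions at one state.
q_vals = [0.0, 1.0, 2.0, 10.0]
print(round(fit_value(q_vals, tau=0.5), 4))  # tau = 0.5 recovers the mean → 3.25
print(round(fit_value(q_vals, tau=0.9), 4))  # tau = 0.9 shifts toward the best action → 7.75
```

At tau = 0.5 the loss is symmetric and V(s) is just the mean of the dataset Q-values. Raising tau trades that average for an optimistic but still in-distribution estimate. This is one reason IQL is a natural fit when new real-world exploration is expensive.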

Abstract

Reinforcement Learning is a promising tool for learning complex policies even in fast-moving and object-interactive domains where human teleoperation or hard-coded policies might fail. To reflect this challenging category of tasks, we introduce a dynamic, interactive RL testbed based on robot air hockey. Our testbed allows a varied assessment of RL capabilities by augmenting air hockey with a large family of tasks. These range from easy ones like reaching to challenging ones like pushing a block by hitting it with a puck, and also include goal-based and human-interactive tasks. The robot air hockey testbed also supports sim-to-real transfer across three domains: two simulators of increasing fidelity and a real robot system. Using demonstration data gathered through two teleoperation systems (a virtualized control environment and human shadowing), we assess the testbed with behavior cloning, offline RL, and RL from scratch.
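Behavior cloning, the first baseline named above, treats the teleoperation demonstrations as a supervised dataset of (observation, action) pairs and regresses actions on observations. The minimal sketch below makes several assumptions for illustration. It uses a hypothetical one-dimensional setting in which the observation is the puck's x-position and the action is the paddle's target x-position. The policy is linear and is fitted by closed-form least squares. The helper names `fit_linear_bc` and `make_policy` are hypothetical, and the demonstration pairs are made up. Real observations and policies in the testbed are higher-dimensional.

```python
from typing import Callable, List, Tuple


def fit_linear_bc(demos: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of action = w * obs + b over (obs, action) demonstration pairs."""
    n = len(demos)
    mean_obs = sum(o for o, _ in demos) / n
    mean_act = sum(a for _, a in demos) / n
    cov = sum((o - mean_obs) * (a - mean_act) for o, a in demos)
    var = sum((o - mean_obs) ** 2 for o, _ in demos)
    if var == 0.0:
        raise ValueError("observations must vary to fit a slope")
    w = cov / var
    return w, mean_act - w * mean_obs


def make_policy(w: float, b: float) -> Callable[[float], float]:
    """Cloned policy: maps an observation to the regressed action."""
    return lambda obs: w * obs + b


# Hypothetical demos: the demonstrator keeps the paddle offset by 0.05 from the puck's x-position.
demos = [(0.0, 0.05), (0.2, 0.25), (0.4, 0.45), (0.6, 0.65)]
w, b = fit_linear_bc(demos)
policy = make_policy(w, b)
print(round(w, 3), round(b, 3), round(policy(0.3), 3))  # → 1.0 0.05 0.35
```

Because regression averages over every demonstration, a cloned policy inherits the spread in demonstrator skill. This is a general property of behavior cloning and is relevant to the erratic human demonstrations described in Figure 5.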

Paper Structure

This paper contains 28 sections, 1 equation, 10 figures, and 1 table.

Figures (10)

  • Figure 1: (a) Robot Air Hockey is a testbed that contains a large number of dynamic, interactive air hockey tasks in multiple distinct simulators as well as the real world. It is suitable for evaluating a variety of frameworks, ranging from vanilla RL (both online and offline) and learning from demonstrations to skill transfer and goal-conditioned RL. (b) Overview of our control pipeline. We use object detection to get the state of interactive objects, and the UR5 RTDE controller to transform task actions into joint forces for the robot.
  • Figure 2: Robot Air Hockey supports two types of teleoperation. Mouse-Teleop (top): the user moves the mouse to control the robot. Shadow-Teleop (bottom): the user moves a paddle to control the robot.
  • Figure 3: Robot Air Hockey real-world setup. We use a top-down camera to provide observations and a UR5e robot to actuate the paddle. Our real-world setup can facilitate many air hockey tasks, including but not limited to reaching, touching, and hitting.
  • Figure 4: Execution rollouts in Box2D for various tasks. In the first frame of each rollout, the puck is moving downward. Top row: the policy tries to hit the puck so that it reaches a minimum upward velocity. Middle row: the policy hits the puck into a crowd of blocks, causing them to spread. Bottom row: the policy moves a puck into a goal region, shown as a green circle.
  • Figure 5: Execution rollouts on the UR5 air hockey setup for policies trained with behavior cloning and with IQL using the touching loss, alongside human teleoperation demonstrations. Although the human demonstrators try to achieve multiple bounces, they often hit the puck too erratically to return it effectively. As a result, demonstrations vary significantly in skill, even after human failure modes are cleaned from the data.
  • ...and 5 more figures
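The top and bottom rows of Figure 4 describe tasks whose success can be read as simple predicates on the puck's state. The sketch below makes several assumptions. It uses a hypothetical `PuckState` record and a table frame in which +y is "upward" (away from the paddle). The rewards are sparse 0/1 success signals, and the thresholds (`min_vy`, `radius`) are arbitrary. The testbed's actual reward definitions, units, and any shaping terms may differ.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class PuckState:
    # Hypothetical table frame: meters and m/s, with +y pointing away from the paddle,
    # so "upward" in the rendered frames corresponds to vy > 0.
    x: float
    y: float
    vx: float
    vy: float


def upward_hit_success(puck: PuckState, min_vy: float = 1.0) -> float:
    """Top-row task (assumed sparse): success once the upward velocity reaches min_vy."""
    return 1.0 if puck.vy >= min_vy else 0.0


def goal_region_success(puck: PuckState, goal: Tuple[float, float], radius: float = 0.1) -> float:
    """Bottom-row task (assumed sparse): success when the puck center is inside the goal circle."""
    dist = math.hypot(puck.x - goal[0], puck.y - goal[1])
    return 1.0 if dist <= radius else 0.0


incoming = PuckState(x=0.0, y=0.3, vx=0.0, vy=-0.8)  # first frame: puck moving downward
returned = PuckState(x=0.1, y=0.2, vx=0.2, vy=1.4)   # after a successful hit
print(upward_hit_success(incoming), upward_hit_success(returned))  # → 0.0 1.0
print(goal_region_success(PuckState(0.52, 0.48, 0.0, 0.0), goal=(0.5, 0.5)))  # → 1.0
```

Sparse predicates like these are easy to specify but hard to explore toward. This is why the distinction drawn above, between online RL in simulation and demonstration data in the real world, matters for tasks of this kind.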