AiRLIHockey: Highly Reactive Contact Control and Stochastic Optimal Shooting

Julius Jankowski, Ante Marić, Sylvain Calinon

TL;DR

AiRLIHockey targets robust, reactive air-hockey control under stochastic contact dynamics with a hierarchical framework that first plans a shooting interaction and then executes constrained mallet trajectories. It combines a stochastic shooting-angle policy, learned offline with an energy-based model, with online sampling-based model-predictive control (MPC) running at 50 Hz to produce mallet motions that respect table constraints. The key contributions are piecewise-local-linear puck dynamics with an extended Kalman filter (EKF) observer, a two-phase shooting planner that optimizes the final puck state, and a trajectory-level MPC that uses basis functions computed offline for fast online computation. The agent was evaluated in simulation and on a physical setup as part of the NeurIPS 2023 Robot Air-Hockey challenge.

Abstract

Air hockey is a highly reactive game which requires the player to quickly reason over stochastic puck and contact dynamics. We implement a hierarchical framework which combines stochastic optimal control for planning shooting angles and sampling-based model-predictive control for continuously generating constrained mallet trajectories. Our agent was deployed and evaluated in simulation and on a physical setup as part of the Robot Air-Hockey challenge competition at NeurIPS 2023.

Paper Structure (8 sections, 3 equations, 3 figures)

Figures (3)

  • Figure 1: Diagram overview of the AiRLIHockey agent. Starting from noisy observations, we estimate the puck state subject to estimated model parameters. Depending on the state, a mode (e.g. shooting or defending) is triggered by a heuristic state machine. The contact state planner and the subsequent mallet controller generate control actions for the mallet. For joint-level control, we use constrained quadratic programming to find the next joint state that tracks the mallet trajectory planned by the higher layers, while staying close to a reference configuration (e.g. a high-manipulability configuration for shooting), with constraints on the joint position, joint velocity, and the z-coordinate of the mallet in order to stay in contact with the table at all times.
  • Figure 2: Overview of the interplay between the puck and the mallet for the subtask of scoring a goal. Given the mallet position and the estimated puck position, our framework generates a motion plan for the mallet such that the score probability is maximized.
  • Figure 3: Sampling the action space for an example scenario. Dark blue points show samples at the initial timestep. Iterative recentering and variance reduction lead to the final samples shown in cyan. The red vertical line denotes the selected action.
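
The iterative recentering and variance reduction described in the Figure 3 caption can be illustrated with a minimal sketch. It is not the authors' implementation: the one-dimensional action, the function `sample_action`, the toy cost `toy_cost`, and all parameter values (sample count, iteration count, shrink factor) are assumptions chosen only for illustration. The sketch draws Gaussian samples around a current center, recenters on the best sample found so far, and shrinks the spread before the next round.

```python
import random


def sample_action(cost, mean, std, n_samples=32, n_iters=5, shrink=0.5, seed=0):
    """Sample a 1-D action, recentering on the best sample and shrinking the spread.

    All arguments are hypothetical; the paper's actual sampler may differ.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    best_action, best_cost = mean, cost(mean)
    for _ in range(n_iters):
        # Draw candidate actions around the current center.
        samples = [rng.gauss(mean, std) for _ in range(n_samples)]
        for action in samples:
            c = cost(action)
            if c < best_cost:
                best_action, best_cost = action, c
        mean = best_action  # recenter on the best action seen so far
        std *= shrink       # reduce the variance for the next round
    return best_action, best_cost


def toy_cost(angle):
    # Hypothetical stand-in for a planning cost: squared distance to 0.3 rad.
    return (angle - 0.3) ** 2


action, value = sample_action(toy_cost, mean=0.0, std=1.0)
```

Because the best action is tracked across rounds, the returned cost can never be worse than the cost of the initial center. In a real controller the cost would come from the planner's objective rather than a closed-form toy function.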