Distilling Contact Planning for Fast Trajectory Optimization in Robot Air Hockey

Julius Jankowski, Ante Marić, Puze Liu, Davide Tateo, Jan Peters, Sylvain Calinon

TL;DR

This work tackles the challenge of fast, contact-rich planning in dynamic robot tasks by coupling a learned stochastic contact model with a distilled, implicit policy and a model-predictive controller. The puck dynamics are modeled as a mixture of linear-Gaussian modes (Floating, Puck-Wall, Puck-Mallet), learned from data and updated via a piecewise-linear Kalman filter to enable online state estimation and goal probability assessment. Shooting planning is cast as a chance-constrained stochastic optimal control problem, with a reduced action space via a shooting angle and an energy-based model that enables real-time, multimodal decision making through offline data and online sampling with warm-starting. The approach demonstrates superior performance over control-based and learning-based baselines in both simulation and real-world air hockey, with robust sim-to-real transfer and controllable behavior through objective weights and constraints. Still, limitations include the need for low-dimensional task spaces and reliance on priors, suggesting future work on higher-dimensional spaces and integrating additional priors for data-efficient learning.
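The state-estimation idea in the summary above can be illustrated with a single linear-Gaussian mode. The following is a minimal sketch only: it assumes a hypothetical near-constant-velocity "floating" mode with hand-picked noise values and a position-only measurement, and it does not reproduce the learned parameters or the switching between Floating, Puck-Wall, and Puck-Mallet modes. All function names and numbers are illustrative assumptions.

```python
import numpy as np

def make_floating_mode(dt, damping=0.0, process_std=1e-3):
    """Hypothetical 'floating' mode for the state [x, y, vx, vy].

    Assumes near constant velocity with optional linear damping;
    in the paper these parameters are learned from data.
    """
    A = np.eye(4)
    A[0, 2] = dt                    # x  += vx * dt
    A[1, 3] = dt                    # y  += vy * dt
    A[2, 2] = 1.0 - damping * dt    # vx decays with damping
    A[3, 3] = 1.0 - damping * dt    # vy decays with damping
    Q = (process_std ** 2) * np.eye(4)  # assumed process noise
    return A, Q

def kf_predict(mean, cov, A, Q):
    """Propagate the Gaussian belief through the linear mode."""
    return A @ mean, A @ cov @ A.T + Q

def kf_update(mean, cov, z, H, R):
    """Correct the belief with a measurement z = H x + noise."""
    S = H @ cov @ H.T + R               # innovation covariance
    K = cov @ H.T @ np.linalg.inv(S)    # Kalman gain
    mean_new = mean + K @ (z - H @ mean)
    cov_new = (np.eye(len(mean)) - K @ H) @ cov
    return mean_new, cov_new

# Usage with made-up values: one step at a 50 Hz rate (dt = 0.02 s).
A, Q = make_floating_mode(dt=0.02)
mean0 = np.array([0.0, 0.0, 1.0, 0.0])   # puck moving at 1 m/s along x
cov0 = 0.01 * np.eye(4)
mean_pred, cov_pred = kf_predict(mean0, cov0, A, Q)

H = np.eye(2, 4)                          # assume only position is measured
R = 1e-4 * np.eye(2)                      # assumed measurement noise
mean_post, cov_post = kf_update(mean_pred, cov_pred, np.array([0.02, 0.0]), H, R)
```

In a piecewise-linear filter, the matrices `A` and `Q` would be swapped depending on which contact mode is active at a given step; how the active mode is selected is not specified in this summary and is left out here.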

Abstract

Robot control through contact is challenging as it requires reasoning over long horizons and discontinuous system dynamics. Highly dynamic tasks such as Air Hockey additionally require agile behavior, making the corresponding optimal control problems intractable for planning in real time. Learning-based approaches address this issue by shifting computationally expensive reasoning through contacts to an offline learning phase. However, learning low-level motor policies subject to kinematic and dynamic constraints can be challenging if operating in proximity to such constraints is desired. This paper explores the combination of distilling a stochastic optimal control policy for high-level contact planning and online model-predictive control for low-level constrained motion planning. Our system learns to balance shooting accuracy and resulting puck speed by leveraging bank shots and the robot's kinematic structure. We show that the proposed framework outperforms purely control-based and purely learning-based techniques in both simulated and real-world games of Robot Air Hockey.

Paper Structure

This paper contains 24 sections, 16 equations, 7 figures, 3 tables, and 1 algorithm.

Figures (7)

  • Figure 1: The proposed control framework enables our robot to autonomously play matches of air hockey. The dynamic game requires the robot to predict puck trajectories, plan the best contact, and coordinate its joints to generate high velocities without hitting a wall or lifting the mallet from the table.
  • Figure 2: Overview of the interplay between puck state estimation and robot control for closed-loop agile robot air hockey. The contact planner uses the estimated puck state to predict the puck trajectory based on the learned model. It subsequently plans a shooting angle that is used to construct an optimal control objective solved within a model-predictive controller. Robot trajectories are computed at a control rate of 50 Hz.
  • Figure 3: Illustration of three modes of the puck dynamics that are parameterized as linear-Gaussian modes. Mode 1) captures the dynamics of the puck when floating on the surface of the table. Mode 2) captures collisions between puck and walls. Mode 3) models collisions between puck and the mallet in a contact-aligned frame $\mathcal{C}$. The parameters for the nominal dynamics and the corresponding uncertainty are learned from data.
  • Figure 4: A qualitative comparison of the probability of hitting the goal $\hat{\mathrm{G}}$ for different shooting angles and shooting speeds. The shooting angles are indicated by the mallet position relative to the puck position at the time of contact. The shooting speed, i.e., the speed of the mallet at the time of contact, is $1.2\,\mathrm{m/s}$ for a) and c) and $2\,\mathrm{m/s}$ for b).
  • Figure 5: Examples of differently tuned shooting plans and corresponding energy landscapes. Shooting direction and mallet speed are displayed for varying initial puck positions. Instance a) evaluates only the scoring probability ($\lambda_1=1,~\lambda_2=0,~\beta=0.5$); b) adds weight on the expected puck speed at the goal line ($\lambda_1=1,~\lambda_2=0.2,~\beta=0.5$); c) evaluates only the expected puck speed at the goal line ($\lambda_1=0,~\lambda_2=1,~\beta=0.5$). The energy landscape and sampling process at timesteps $j\in\{1,12,25\}$ are visualized for an example shot shown in red. All pucks are static at $j=0$.
  • ...and 2 more figures
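
The trade-off between scoring probability and puck speed described in the Figure 4 and Figure 5 captions can be illustrated with a toy example. This is a rough sketch under stated assumptions, not the paper's method: it assumes the puck's lateral position at the goal line is Gaussian, scores each candidate as a plain weighted sum $\lambda_1 \cdot P(\text{goal}) + \lambda_2 \cdot \mathbb{E}[\text{speed}]$, and picks the best candidate from a small grid instead of sampling from a learned energy-based model. The parameter $\beta$ is omitted because the captions do not define it. All candidate values and the goal half-width are made up.

```python
from math import erf, sqrt
import numpy as np

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def goal_probability(mu_y, sigma_y, half_width):
    """P(|y| <= half_width) for y ~ N(mu_y, sigma_y^2) at the goal line."""
    upper = normal_cdf((half_width - mu_y) / sigma_y)
    lower = normal_cdf((-half_width - mu_y) / sigma_y)
    return upper - lower

def pick_shot(angles, p_goal, exp_speed, lam1, lam2):
    """Return the candidate angle with the highest weighted score."""
    scores = lam1 * np.asarray(p_goal) + lam2 * np.asarray(exp_speed)
    return float(angles[int(np.argmax(scores))])

# Made-up candidates: shooting angle [rad], predicted lateral mean and
# standard deviation at the goal line [m], expected puck speed [m/s].
angles = np.array([-0.4, 0.0, 0.4])
mu_y = [0.05, 0.0, 0.15]
sigma_y = [0.08, 0.07, 0.12]
exp_speed = [1.8, 1.0, 2.5]
half_width = 0.125  # hypothetical goal half-width [m]

p_goal = [goal_probability(m, s, half_width) for m, s in zip(mu_y, sigma_y)]

# The three weightings mirror settings a), b), c) in the Figure 5 caption.
choice_a = pick_shot(angles, p_goal, exp_speed, lam1=1.0, lam2=0.0)
choice_b = pick_shot(angles, p_goal, exp_speed, lam1=1.0, lam2=0.2)
choice_c = pick_shot(angles, p_goal, exp_speed, lam1=0.0, lam2=1.0)
```

With these toy numbers, the three weightings select three different candidates, which mirrors how changing $\lambda_1$ and $\lambda_2$ shifts the plan from the most accurate shot toward the fastest one.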