Learning Wheelchair Tennis Navigation from Broadcast Videos with Domain Knowledge Transfer and Diffusion Motion Planning

Zixuan Wu, Zulfiqar Zaidi, Adithya Patil, Qingyu Xiao, Matthew Gombolay

TL;DR

This work tackles the challenge of transferring expert navigation strategies from monocular broadcast videos to agile robotic systems playing wheelchair tennis on real courts. It introduces a zero-shot framework that reconstructs 3D task-space trajectories from multiple views, projects them into 2D image space where a diffusion-based imitation policy plans, and maps the planned wheelchair motion back to 3D for real-time control. Key contributions include a broadcast-video data extraction pipeline, a knowledge-transfer framework bridging the 2D expert space and the 3D task space, and a diffusion-imitation policy paired with a real-time PD controller. The pipeline reaches a 97.67% navigation success rate on recorded real-world ball trajectories with a physical robot wheelchair, and 68.49% in a real-time experiment on a full-sized tennis court. The approach is intended to generalize to navigation tasks in which the controlled motion is planar, with practical implications for deploying learned athletic robots from readily available web videos.

Abstract

In this paper, we propose a novel and generalizable zero-shot knowledge transfer framework that distills expert sports navigation strategies from web videos into robotic systems with adversarial constraints and out-of-distribution image trajectories. Our pipeline enables diffusion-based imitation learning by reconstructing the full 3D task space from multiple partial views, warping it into 2D image space, closing the planning loop within this 2D space, and transferring the constrained motion of interest back to task space. Additionally, we demonstrate that the learned policy can serve as a local planner in conjunction with position control. We apply this framework to the wheelchair tennis navigation problem to guide the wheelchair into the ball-hitting region. Our pipeline achieves a navigation success rate of 97.67% in reaching real-world recorded tennis ball trajectories with a physical robot wheelchair, and achieves a success rate of 68.49% in a real-world, real-time experiment on a full-sized tennis court.
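To make the "local planner in conjunction with position control" idea concrete, the following is a minimal sketch of a PD position controller tracking planner waypoints. It is not the authors' controller: the point-mass model, the gains `kp` and `kd`, the time step, and the names `PDController` and `track` are all assumptions chosen for illustration, and a real wheelchair would additionally need differential-drive kinematics.

```python
from dataclasses import dataclass


@dataclass
class PDController:
    # Illustrative gains (assumed); kp=25, kd=10 is critically damped for a unit point mass.
    kp: float = 25.0
    kd: float = 10.0

    def command(self, pos, vel, target):
        """Return a 2D acceleration command pulling pos toward target."""
        return [self.kp * (t - p) - self.kd * v for p, v, t in zip(pos, vel, target)]


def track(waypoints, start, dt=0.01, steps_per_waypoint=300):
    """Follow planner waypoints with a unit point mass (a simplification of the wheelchair)."""
    ctrl = PDController()
    pos, vel = list(start), [0.0, 0.0]
    for target in waypoints:
        for _ in range(steps_per_waypoint):
            acc = ctrl.command(pos, vel, target)
            # Semi-implicit Euler: update velocity first, then position.
            vel = [v + a * dt for v, a in zip(vel, acc)]
            pos = [p + v * dt for p, v in zip(pos, vel)]
    return tuple(pos)


# Hypothetical waypoints in court coordinates (metres), standing in for a planner's output.
final_position = track([(1.0, 2.0), (3.0, -1.0)], start=(0.0, 0.0))
```

In this arrangement the learned policy only has to propose where to go next, while the low-level loop handles how to get there at a high control rate.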

Paper Structure (16 sections, 1 equation, 10 figures, 2 tables, 2 algorithms)

This paper contains 16 sections, 1 equation, 10 figures, 2 tables, 2 algorithms.

Figures (10)

  • Figure 1: Knowledge cannot be transferred directly from the expert videos to our task space (red arrows), because the expert image space lacks ball depth and its camera angles differ from those of the task space. As such, we propose a three-step approach (green arrows): 1) reconstruct the full 3D motion in the task space (ball and wheelchair), 2) project all motions into 2D image space to apply the imitation policy, and 3) transfer the planar wheelchair motion back to the 3D task space.
  • Figure 2: Overview of our knowledge transfer framework: The wheelchair motion is the motion of interest (MOI, indicated by red) since we need to control it in our task space. The ball motion is the environment motion (Env-M, indicated by green) on which the wheelchair should condition. The knowledge transfer is valid under the sole assumption that the MOI is 2D motion; the Env-M can be 3D.
  • Figure 3: Various steps involved in creating the dataset from broadcast footage of the 2020 Paralympic Games.
  • Figure 4: The pipeline filters out rest and replay frames (top) while extracting relevant gameplay data (bottom).
  • Figure 5: Diffusion Trajectory
  • ...and 5 more figures
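
Steps 2 and 3 of the approach in Figure 1 can be sketched with a pinhole camera model. This is only an illustration under stated assumptions: the court is taken to be the plane z = 0, the camera matrix `P` below is a made-up example (focal length 1000 px, camera 5 m high and 10 m behind the origin), and the function names are hypothetical. The paper's actual warping procedure may differ.

```python
def project(P, point3d):
    """Project a 3D world point to pixel coordinates with a 3x4 camera matrix P."""
    x, y, z = point3d
    u, v, w = (row[0] * x + row[1] * y + row[2] * z + row[3] for row in P)
    return (u / w, v / w)


def ground_homography(P):
    """Homography from the court plane z = 0 to the image: columns 0, 1 and 3 of P."""
    return [[row[0], row[1], row[3]] for row in P]


def invert3(M):
    """Invert a 3x3 matrix via its adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = M
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("homography is singular")
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[value / det for value in row] for row in adj]


def image_to_court(H_inv, pixel):
    """Map a pixel back to (x, y) on the court plane; valid only for planar motion."""
    u, v = pixel
    x, y, w = (row[0] * u + row[1] * v + row[2] for row in H_inv)
    return (x / w, y / w)


# Assumed example camera matrix P = K [R | t]; the values are not from the paper.
P = [[1000.0, 640.0, 0.0, 6400.0],
     [0.0, 360.0, -1000.0, 8600.0],
     [0.0, 1.0, 0.0, 10.0]]

H_inv = invert3(ground_homography(P))
ball_pixel = project(P, (2.0, 3.0, 1.0))    # 3D environment motion (ball), step 2
chair_pixel = project(P, (2.0, 3.0, 0.0))   # planar motion of interest (wheelchair), step 2
chair_court = image_to_court(H_inv, chair_pixel)  # back to the task space, step 3
```

The round trip is exact only for points on the court plane, which is why the framework requires the motion of interest to be 2D, while the ball only ever needs the forward projection.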