Learning Wheelchair Tennis Navigation from Broadcast Videos with Domain Knowledge Transfer and Diffusion Motion Planning
Zixuan Wu, Zulfiqar Zaidi, Adithya Patil, Qingyu Xiao, Matthew Gombolay
TL;DR
This work tackles the challenge of transferring expert navigation strategies from monocular broadcast videos to agile robotic systems that play wheelchair tennis on real courts. It introduces a zero-shot framework that performs diffusion-based imitation learning in 2D image space. 3D task-space trajectories are reconstructed from multiple views, plans are generated in the image plane, and those plans are mapped back to 3D for real-time control. Key contributions include a data-extraction pipeline for broadcast video, a knowledge-transfer framework bridging the 2D expert space and the 3D task space, and a diffusion-based imitation policy paired with a real-time PD controller. Together these yield strong results in hardware-in-the-loop and full-court experiments. The approach shows high robustness and generality for 2D constrained navigation tasks, with practical implications for rapidly deploying athletic robot skills learned from readily available web videos.
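The TL;DR pairs the learned policy with a real-time PD controller. The following is a minimal sketch of a generic PD tracking law only, under loudly stated assumptions: the robot is modeled as a 2D point mass (a double integrator) on the court plane, not as the paper's wheelchair dynamics, and the gains `kp`/`kd`, the time step, and the names `PDController` and `simulate` are hypothetical. The controller pulls the position toward a single waypoint, such as one emitted by a local planner.

```python
from dataclasses import dataclass
from typing import Tuple

Vec = Tuple[float, float]


@dataclass
class PDController:
    kp: float  # proportional gain (hypothetical value)
    kd: float  # derivative gain (hypothetical value)

    def command(self, pos: Vec, vel: Vec, target: Vec) -> Vec:
        """PD law: accelerate toward the target while damping the current velocity."""
        return (
            self.kp * (target[0] - pos[0]) - self.kd * vel[0],
            self.kp * (target[1] - pos[1]) - self.kd * vel[1],
        )


def simulate(ctrl: PDController, start: Vec, target: Vec,
             dt: float = 0.01, steps: int = 1000) -> Tuple[Vec, Vec]:
    """Roll out a 2D point mass (assumed double-integrator model) under the PD law.

    Uses semi-implicit Euler: velocity is updated first, then position uses the new velocity.
    """
    px, py = start
    vx, vy = 0.0, 0.0
    for _ in range(steps):
        ax, ay = ctrl.command((px, py), (vx, vy), target)
        vx += ax * dt
        vy += ay * dt
        px += vx * dt
        py += vy * dt
    return (px, py), (vx, vy)


# kp = kd = 4: continuous-time natural frequency 2 rad/s, damping ratio 1 (critically damped).
final_pos, final_vel = simulate(PDController(kp=4.0, kd=4.0), start=(0.0, 0.0), target=(3.0, -2.0))
```

With these gains the closed loop x'' = kp(target - x) - kd x' has characteristic polynomial s² + 4s + 4, so it settles on the waypoint without oscillation. A real wheelchair, which is typically differential-drive, would additionally need actuator limits and non-holonomic kinematics.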
Abstract
In this paper, we propose a novel and generalizable zero-shot knowledge transfer framework that distills expert sports navigation strategies from web videos into robotic systems with adversarial constraints and out-of-distribution image trajectories. Our pipeline enables diffusion-based imitation learning by reconstructing the full 3D task space from multiple partial views, warping it into 2D image space, closing the planning loop within this 2D space, and transferring the constrained motion of interest back to the task space. Additionally, we demonstrate that the learned policy can serve as a local planner in conjunction with position control. We apply this framework to the wheelchair tennis navigation problem, guiding the wheelchair into the ball-hitting region. Our pipeline achieves a navigation success rate of 97.67% in reaching real-world recorded tennis ball trajectories with a physical robot wheelchair, and a success rate of 68.49% in a real-world, real-time experiment on a full-sized tennis court.
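The loop described in the abstract (warp task space into 2D image space, plan there, transfer the plan back) can be illustrated in its simplest form. This sketch assumes the court is a flat plane, so a single 3x3 homography maps court coordinates in meters to pixel coordinates. That assumption sidesteps the paper's multi-view 3D reconstruction. The matrix is a hypothetical hand-picked example, not a calibrated one. Straight-line interpolation in pixels stands in for the diffusion planner. The names `apply_homography`, `invert_3x3`, and `plan_in_image_then_lift` are illustrative only.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Point = Tuple[float, float]


def apply_homography(H: Matrix, p: Point) -> Point:
    """Map a 2D point through a 3x3 homography using homogeneous coordinates."""
    x, y = p
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)  # perspective divide


def invert_3x3(H: Matrix) -> Matrix:
    """Invert a 3x3 matrix as adjugate / determinant."""
    (a, b, c), (d, e, f), (g, h, i) = H
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("homography is singular")
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[val / det for val in row] for row in adj]


def plan_in_image_then_lift(H_court_to_img: Matrix, start: Point, goal: Point, n: int) -> List[Point]:
    """Plan n >= 2 waypoints in image space, then map them back to court coordinates.

    Linear interpolation in pixels is a placeholder for a learned (e.g. diffusion) planner.
    """
    H_img_to_court = invert_3x3(H_court_to_img)
    s_img = apply_homography(H_court_to_img, start)
    g_img = apply_homography(H_court_to_img, goal)
    path_img = [
        (s_img[0] + (g_img[0] - s_img[0]) * k / (n - 1),
         s_img[1] + (g_img[1] - s_img[1]) * k / (n - 1))
        for k in range(n)
    ]
    return [apply_homography(H_img_to_court, q) for q in path_img]


# Hypothetical court (meters) -> image (pixels) homography with mild perspective.
H = [[50.0, 10.0, 320.0], [0.0, 30.0, 100.0], [0.0, 0.02, 1.0]]
path = plan_in_image_then_lift(H, (0.0, 0.0), (4.0, 8.0), 5)
```

A homography maps lines to lines, so the lifted waypoints stay on the court segment between start and goal. Their spacing is non-uniform, though: equal steps in pixels cover different distances on the court because the perspective divide depends on position.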
