Learning Vision-Driven Reactive Soccer Skills for Humanoid Robots
Yushi Wang, Changsheng Luo, Penghui Chen, Jianran Liu, Weijian Sun, Tong Guo, Kechang Yang, Biao Hu, Yangang Zhang, Mingguo Zhao
TL;DR
This work tackles real-time, vision-driven, reactive control for humanoid soccer by unifying perception and motion in a single reinforcement learning framework. It extends Adversarial Motion Priors to perceptual settings. It also adds an encoder–decoder latent representation and a virtual perception system that narrow the sim-to-real gap, enabling active perception and robust ball tracking. The approach yields a single, versatile policy that walks, chases, and kicks with agility across varied environments, transfers zero-shot from simulation, and performs strongly in real RoboCup matches. The work advances embodied intelligence in unstructured domains and points to multi-agent extensions that could support team-based strategies.
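The Adversarial Motion Priors idea can be illustrated with the style reward from the original AMP formulation (Peng et al., 2021). In that formulation, a discriminator is trained with a least-squares objective. It learns to score reference-motion transitions near +1 and policy transitions near -1, and its score is then turned into a bounded reward. The following is a minimal sketch under the assumption that this work keeps that form. The summary does not state the exact reward, and the weights `w_task` and `w_style` are hypothetical.

```python
def amp_style_reward(d_score: float) -> float:
    """Style reward as defined in the original AMP paper (Peng et al., 2021).

    d_score is the discriminator output D(s, s') for one state transition.
    A score of +1 ("looks like reference motion") earns the full reward of 1.
    Scores at or below -1 ("looks like the policy") earn nothing.
    """
    return max(0.0, 1.0 - 0.25 * (d_score - 1.0) ** 2)


def combined_reward(task_reward: float, d_score: float,
                    w_task: float = 0.5, w_style: float = 0.5) -> float:
    """Weighted sum of a task reward (e.g. progress toward the ball) and the
    AMP style reward. The default weights are hypothetical placeholders."""
    return w_task * task_reward + w_style * amp_style_reward(d_score)


if __name__ == "__main__":
    print(amp_style_reward(0.0))       # 0.75: halfway between policy and reference
    print(combined_reward(1.0, 1.0))   # 1.0: full task and full style reward
```

Because the reward is clipped at zero, transitions the discriminator confidently flags as unnatural are not penalized beyond losing the style bonus. This keeps the task term dominant when motion quality is poor.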
Abstract
Humanoid soccer poses a representative challenge for embodied intelligence, requiring robots to operate within a tightly coupled perception-action loop. However, existing systems typically rely on decoupled modules, resulting in delayed responses and incoherent behaviors in dynamic environments, while real-world perceptual limitations further exacerbate these issues. In this work, we present a unified reinforcement learning-based controller that enables humanoid robots to acquire reactive soccer skills through the direct integration of visual perception and motion control. Our approach extends Adversarial Motion Priors to perceptual settings in real-world dynamic environments, bridging motion imitation and visually grounded dynamic control. We introduce an encoder-decoder architecture combined with a virtual perception system that models real-world visual characteristics, allowing the policy to recover privileged states from imperfect observations and establish active coordination between perception and action. The resulting controller demonstrates strong reactivity, consistently executing coherent and robust soccer behaviors across various scenarios, including real RoboCup matches.
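To make the idea of a virtual perception system concrete, the sketch below produces ball observations degraded the way a head-mounted camera might degrade them. It models four effects: a limited field of view that depends on head yaw, a maximum detection range, distance-dependent noise, and random missed detections. The abstract does not say which visual characteristics the authors model or how. The function name, every parameter, and every default value here are hypothetical illustrations, not the paper's system.

```python
import math
import random
from typing import Optional, Tuple


def virtual_ball_observation(
    ball_xy: Tuple[float, float],           # true ball position in the robot frame (m)
    head_yaw: float,                        # camera yaw relative to the torso (rad)
    rng: random.Random,
    half_fov: float = math.radians(35.0),   # hypothetical horizontal half field of view
    max_range: float = 6.0,                 # hypothetical maximum detection range (m)
    noise_per_m: float = 0.02,              # hypothetical noise std per metre of distance
    drop_prob: float = 0.1,                 # hypothetical missed-detection probability
) -> Optional[Tuple[float, float]]:
    """Return a noisy ball position, or None when the (simulated) camera
    would not detect the ball. Hypothetical sketch, not the authors' model."""
    x, y = ball_xy
    dist = math.hypot(x, y)
    bearing = math.atan2(y, x)
    # Angle between the ball bearing and the camera axis, wrapped to [-pi, pi].
    offset = math.atan2(math.sin(bearing - head_yaw), math.cos(bearing - head_yaw))
    if dist > max_range or abs(offset) > half_fov:
        return None  # ball outside the view cone or too far away
    if rng.random() < drop_prob:
        return None  # simulated detector miss
    sigma = noise_per_m * dist  # farther balls are localized less precisely
    return (x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma))


if __name__ == "__main__":
    rng = random.Random(0)
    print(virtual_ball_observation((2.0, 0.0), 0.0, rng))            # noisy, near (2, 0)
    print(virtual_ball_observation((2.0, 0.0), math.pi / 2, rng))    # None: looking away
```

Because the observation disappears when the head points away, a policy trained on it has a reason to move its head to keep the ball in view. This is one plausible route to the "active coordination between perception and action" described above.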
