Learning Robot Soccer from Egocentric Vision with Deep Reinforcement Learning
Dhruva Tirumala, Markus Wulfmeier, Ben Moran, Sandy Huang, Jan Humplik, Guy Lever, Tuomas Haarnoja, Leonard Hasenclever, Arunkumar Byravan, Nathan Batchelor, Neil Sreendra, Kushal Patel, Marlon Gwira, Francesco Nori, Martin Riedmiller, Nicolas Heess
TL;DR
The paper addresses end-to-end, vision-based multi-agent robot soccer with onboard sensing and partial observability. It proposes a two-stage RL framework and achieves zero-shot sim-to-real transfer using NeRF-based rendering. The method combines memory-enabled policy learning, Replay across Experiments for data efficiency, and adaptive KL regularization to distill expert skills into a single robust agent. Empirical results show emergent active-perception behaviors, robust ball tracking, and agility on par with state-based policies in simulation. Real-world transfer is demonstrated on humanoid robots, albeit with some performance drop due to real-world noise. The work highlights NeRF-based realistic rendering and diverse visual domain randomization as essential for sim-to-real success, and analyzes which data sources best support vision-based learning on complex, long-horizon tasks.
Abstract
We apply multi-agent deep reinforcement learning (RL) to train end-to-end robot soccer policies with fully onboard computation and sensing via egocentric RGB vision. This setting reflects many challenges of real-world robotics, including active perception, agile full-body control, and long-horizon planning in a dynamic, partially observable, multi-agent domain. We rely on large-scale, simulation-based data generation to obtain complex behaviors from egocentric vision that can be successfully transferred to physical robots using low-cost sensors. To achieve adequate visual realism, our simulation combines rigid-body physics with learned, realistic rendering via multiple Neural Radiance Fields (NeRFs). We combine teacher-based multi-agent RL and cross-experiment data reuse to enable the discovery of sophisticated soccer strategies. We analyze active-perception behaviors, including object tracking and ball seeking, that emerge when simply optimizing perception-agnostic soccer play. The agents display levels of performance and agility equivalent to those of policies with access to privileged, ground-truth state. To our knowledge, this paper constitutes a first demonstration of end-to-end training for multi-agent robot soccer, mapping raw pixel observations to joint-level actions, that can be deployed in the real world. Videos of the gameplay and analyses are available on our website: https://sites.google.com/view/vision-soccer
