Multi-Agent Reinforcement Learning and Real-Time Decision-Making in Robotic Soccer for Virtual Environments
Aya Taourirte, Md Sohag Mia
TL;DR
The paper tackles real-time decision-making and multi-granularity coordination in robotic soccer with a unified MARL framework. The framework combines a PPO-based action scheduler, a hierarchical RL architecture that uses options for high-level trajectory planning, and a mean-field approach that scales coordination to many agents. In Webots simulations of 4v4 NAO soccer, PPO outperforms DQN, HRL improves strategic play, and the Mean-Field Actor-Critic method delivers the best overall performance (5.93 average goals, 89.1% ball control, 92.3% passing accuracy) along with improved training stability. The work demonstrates scalable, cooperative multi-agent decision-making, with potential applications to autonomous fleets and complex cooperative robotics tasks.
Abstract
Deploying multi-agent systems in dynamic, adversarial environments such as robotic soccer requires real-time decision-making, sophisticated cooperation, and scalable algorithms that avoid the curse of dimensionality. While Reinforcement Learning (RL) offers a promising framework, existing methods often struggle with the multi-granularity of tasks (long-term strategy vs. instant actions) and with the complexity of large-scale agent interactions. This paper presents a unified Multi-Agent Reinforcement Learning (MARL) framework that addresses these challenges. First, we establish a baseline using Proximal Policy Optimization (PPO) within a client-server architecture for real-time action scheduling; PPO outperforms DQN in this setting (4.32 avg. goals, 82.9% ball control). Second, we introduce a Hierarchical RL (HRL) structure based on the options framework that decomposes the problem into a high-level trajectory planning layer (modeled as a Semi-Markov Decision Process) and a low-level action execution layer, improving global strategy (avg. goals increase to 5.26). Finally, to ensure scalability, we integrate mean-field theory into the HRL framework, reducing many-agent interactions to the interaction between a single agent and the population average. Our mean-field actor-critic method achieves the best results (5.93 avg. goals, 89.1% ball control, 92.3% passing accuracy) and more stable training. Extensive simulations of 4v4 matches in the Webots environment validate our approach, demonstrating its potential for robust, scalable, and cooperative behavior in complex multi-agent domains.
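To make the mean-field reduction concrete, the following minimal sketch shows how the actions of all other agents can be collapsed into one average action vector before being passed to a critic. It is an illustration under stated assumptions, not the authors' implementation: it assumes discrete actions encoded as one-hot vectors and treats every other agent as a neighbor, and the helper names (`one_hot`, `mean_field_action`, `critic_input`) are hypothetical.

```python
from typing import List, Sequence


def one_hot(action: int, num_actions: int) -> List[float]:
    """Encode a discrete action index as a one-hot vector (assumed encoding)."""
    return [1.0 if i == action else 0.0 for i in range(num_actions)]


def mean_field_action(actions: Sequence[int], agent_idx: int,
                      num_actions: int) -> List[float]:
    """Average the one-hot actions of every agent except agent_idx."""
    others = [a for j, a in enumerate(actions) if j != agent_idx]
    counts = [0.0] * num_actions
    for a in others:
        counts[a] += 1.0
    return [c / len(others) for c in counts]


def critic_input(state: Sequence[float], own_action: int,
                 mean_action: Sequence[float]) -> List[float]:
    """Build the critic input (state, own action, mean action).

    Its length does not depend on the number of agents, which is the
    point of the mean-field approximation.
    """
    return list(state) + one_hot(own_action, len(mean_action)) + list(mean_action)


# Hypothetical example: four agents, three discrete actions, 5-dim state.
actions = [0, 2, 2, 1]
mean_a = mean_field_action(actions, agent_idx=0, num_actions=3)
x = critic_input([0.0] * 5, actions[0], mean_a)
```

With this encoding, adding more agents changes the values in `mean_a` but not the size of `x`, so the same critic architecture can be reused as the population grows.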
