Reinforcement Learning Within the Classical Robotics Stack: A Case Study in Robot Soccer

Adam Labiosa, Zhihan Wang, Siddhant Agarwal, William Cong, Geethika Hemkumar, Abhinav Narayan Harish, Benjamin Hong, Josh Kelle, Chen Li, Yuhao Li, Zisen Shao, Peter Stone, Josiah P. Hanna

TL;DR

This work tackles high-level decision-making in a dynamic, multi-agent robot soccer domain by integrating model-free reinforcement learning into a classical robotics stack. It employs a multi-fidelity sim2real training regime and decomposes high-level behavior into four sub-policies, selected at deployment via a heuristic policy selector. Using PPO, the authors train distinct policies with varying action and observation spaces in simulators of differing fidelity, then validate the approach through empirical studies and real-robot competition results, winning 7 of 8 games with a 39-7 score in the 2024 RoboCup SPL Challenge Shield Division. The results demonstrate the viability of RL for complete robot behavior in complex, partially observable, and adversarial tasks, and provide design guidance on multi-fidelity training, behavior decomposition, and heuristic policy integration for real-world robotics.

Abstract

Robot decision-making in partially observable, real-time, dynamic, and multi-agent environments remains a difficult and unsolved challenge. Model-free reinforcement learning (RL) is a promising approach to learning decision-making in such domains; however, end-to-end RL in complex environments is often intractable. To address this challenge in the RoboCup Standard Platform League (SPL) domain, we developed a novel architecture integrating RL within a classical robotics stack, while employing a multi-fidelity sim2real approach and decomposing behavior into learned sub-behaviors with heuristic selection. Our architecture led to victory in the 2024 RoboCup SPL Challenge Shield Division. In this work, we fully describe our system's architecture and empirically analyze key design decisions that contributed to its success. Our approach demonstrates how RL-based behaviors can be integrated into complete robot behavior architectures.
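
To make the idea of "learned sub-behaviors with heuristic selection" concrete, here is a minimal sketch of how a hand-written selector might route each control step to one of several learned policies. It is an illustration only, not the authors' implementation: the behavior names (`duel`, `score`, `approach`, `dribble`), the `WorldState` fields, and the distance thresholds are all hypothetical, and the learned policies are stood in for by plain callables.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple

Observation = Sequence[float]
Action = Sequence[float]
Policy = Callable[[Observation], Action]  # stand-in for a trained network


@dataclass(frozen=True)
class WorldState:
    """Hypothetical summary of the state estimate used only for selection."""
    robot_xy: Tuple[float, float]
    ball_xy: Tuple[float, float]
    goal_xy: Tuple[float, float]
    opponent_near_ball: bool


def select_behavior(state: WorldState,
                    near_goal_m: float = 2.0,
                    far_ball_m: float = 3.0) -> str:
    """Hand-written rules that pick a sub-behavior name (thresholds assumed)."""
    ball_to_goal = math.dist(state.ball_xy, state.goal_xy)
    robot_to_ball = math.dist(state.robot_xy, state.ball_xy)
    if state.opponent_near_ball:
        return "duel"
    if ball_to_goal < near_goal_m:
        return "score"
    if robot_to_ball > far_ball_m:
        return "approach"
    return "dribble"


def act(state: WorldState,
        observation: Observation,
        policies: Dict[str, Policy]) -> Tuple[str, Action]:
    """Choose a sub-policy heuristically, then query it for an action."""
    name = select_behavior(state)
    return name, policies[name](observation)


# Usage with dummy policies that return fixed actions.
dummy_policies: Dict[str, Policy] = {
    "duel": lambda obs: (0.0, 0.0, 0.0),
    "score": lambda obs: (1.0, 0.0, 0.0),
    "approach": lambda obs: (0.5, 0.0, 0.0),
    "dribble": lambda obs: (0.3, 0.0, 0.1),
}
state = WorldState(robot_xy=(0.0, 0.0), ball_xy=(0.5, 0.0),
                   goal_xy=(4.5, 0.0), opponent_near_ball=False)
chosen, action = act(state, observation=(0.0,) * 4, policies=dummy_policies)
```

One appeal of this kind of design is that adding a sub-behavior only requires a new dictionary entry and a new rule, with no retraining of the existing policies.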

Paper Structure

This paper contains 23 sections, 6 figures, 1 table.

Figures (6)

  • Figure 1: Architecture of our training and deployment system for robot soccer. The left side illustrates our training setup, utilizing both high-fidelity (SimRobot) and low-fidelity (AbstractSim) simulators to train policies with different action spaces under various scenarios. The right side depicts our deployment architecture for real-world 5v5 games, built upon the B-Human team's classical robotics framework. It includes a Perception Module processing sensor data, a State-Estimation Module computing robot and ball positioning, and our RL decision module. The RL module, receiving processed observations, uses a heuristic Behavior Selection Policy to choose appropriate sub-behavior policies, which determine actions executed by the low-level controller. Our heuristic approach allows for dynamic play style adjustments and easy integration of new policies, and facilitates continuous improvement at deployment time.
  • Figure 2: High-fidelity simulation SimRobot developed by the B-Human RoboCup Team. Physics are based on the Open Dynamics Engine.
  • Figure 3: Custom low-fidelity simulation AbstractSim, in which robots are modeled as rectangles and joint movements are abstracted.
  • Figure 4: Evaluation of policy decomposition on success rate against a defender robot. Success is a goal; failure is going out of bounds or a one-minute timeout. Higher is better. Confidence intervals are 95% bootstrap confidence intervals.
  • Figure 5: Results from training simulation fidelity experiments. We compare low-fidelity AbstractSim trained policies to high-fidelity SimRobot trained policies. We test against a setup with only a goalie and against a setup with a goalie and defender. Success is a goal. Failure is a timeout of a minute or out of bounds. Confidence intervals are 95% bootstrap confidence intervals.
  • ...and 1 more figures
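
The captions for Figures 4 and 5 report 95% bootstrap confidence intervals on success rates. The exact procedure is not stated here, so the following is a sketch under the assumption of a simple percentile bootstrap over per-trial binary outcomes (1 for a goal, 0 for a timeout or out of bounds). The function name, the resample count, and the example data are illustrative, not taken from the paper.

```python
import random
from typing import List, Sequence, Tuple


def bootstrap_ci(outcomes: Sequence[int],
                 n_resamples: int = 2000,
                 level: float = 0.95,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile-bootstrap confidence interval for a success rate."""
    n = len(outcomes)
    if n == 0:
        raise ValueError("need at least one trial outcome")
    rng = random.Random(seed)
    # Resample trials with replacement and record each resample's success rate.
    rates: List[float] = sorted(
        sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples)
    )
    alpha = (1.0 - level) / 2.0
    lo_idx = int(alpha * n_resamples)
    hi_idx = min(n_resamples - 1, int((1.0 - alpha) * n_resamples))
    return rates[lo_idx], rates[hi_idx]


# Made-up example: 30 goals in 50 trials (observed success rate 0.6).
trials = [1] * 30 + [0] * 20
low, high = bootstrap_ci(trials)
```

The interval `(low, high)` brackets the observed rate of 0.6 and narrows as the number of trials grows.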