Multi-Robot Collaboration through Reinforcement Learning and Abstract Simulation

Adam Labiosa, Josiah P. Hanna

TL;DR

This work investigates whether cooperative policies learned in a low-fidelity abstract simulator can transfer to physical robot teams. Policies are trained in an abstract simulator (AbstractSim) and then deployed on NAO humanoid robots. Three categories of simulator modification are identified for closing the reality gap: simulation fidelity enhancements, training optimizations, and simulation stochasticity. An extensive ablation study on robot soccer tasks shows that policies trained in the abstract simulator can transfer to real robots, matching and sometimes outperforming hand-tuned RoboCup baselines, and it highlights the trade-offs between realism and trainability. Ball-robot contact noise and carefully chosen training optimizations emerge as the most impactful factors for successful sim2real transfer. Collectively, the results support the viability of using abstract simulators to develop transferable multi-robot policies, potentially reducing the need for costly real-world training in multi-agent robotics.

Abstract

Teams of people coordinate to perform complex tasks by forming abstract mental models of world and agent dynamics. The use of abstract models contrasts with much recent work in robot learning that uses a high-fidelity simulator and reinforcement learning (RL) to obtain policies for physical robots. Motivated by this difference, we investigate the extent to which so-called abstract simulators can be used for multi-agent reinforcement learning (MARL) and the resulting policies successfully deployed on teams of physical robots. An abstract simulator models the robot's target task at a high level of abstraction and discards many details of the world that could impact optimal decision-making. Policies are trained in an abstract simulator and then transferred to the physical robot by making use of separately obtained low-level perception and motion control modules. We identify three key categories of modifications to the abstract simulator that enable policy transfer to physical robots: simulation fidelity enhancements, training optimizations, and simulation stochasticity. We then run an empirical study with extensive ablations to determine the value of each modification category for enabling policy transfer in cooperative robot soccer tasks. We also compare the performance of policies produced by our method with a well-tuned non-learning-based behavior architecture from the annual RoboCup competition and find that our approach leads to a similar level of performance. Broadly, we show that MARL can be used to train cooperative physical robot behaviors using highly abstract models of the world.

Paper Structure

This paper contains 20 sections, 6 figures.

Figures (6)

  • Figure 1: Comparison of the physical NAOv6 used in experiments to the abstract simulation we use to train the policies to control high-level cooperative behaviors.
  • Figure 2: Visualization of our training and deployment interaction models. Exact world state is given during training and world state is estimated by the robot architecture modules during deployment. Similarly, in simulation the policy exactly controls high-level actions while during deployment they are passed to a motion controller which controls motor outputs.
  • Figure 3: High-fidelity simulation developed by the B-Human RoboCup Team, used for the simulation experiments. Physics are based on the Open Dynamics Engine [smith2005open].
  • Figure 4: Physical robot experiment results. Label prefixes: F = full methods, E = fidelity enhancements, T = training optimizations, N = noise. Our full method is Full MARL. Each column provides an experiment location for either the Basic Soccer (BS) task or the Defender (D) task. Each result uses 10 trials. 95% confidence intervals are computed using the Student t-distribution.
  • Figure 5: Simulation experiment results. Label prefixes: F = full methods, E = fidelity enhancements, T = training optimizations, N = noise. Our full method is Full MARL. Each column provides an experiment location for either the Basic Soccer (BS) task or the Defender (D) task. Each result uses 100 trials. 95% confidence intervals are computed using the Student t-distribution.
  • ...and 1 more figure
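
The captions of Figures 4 and 5 state that 95% confidence intervals are computed using the Student t-distribution over 10 and 100 trials respectively. A minimal sketch of that computation is given below, assuming a two-sided interval around the sample mean. The function name `t_confidence_interval`, the hard-coded critical values (taken from a standard t-table), and the example scores are all illustrative assumptions; none of them come from the paper.

```python
import math
import statistics

# Two-sided 95% critical values t_{0.975, n-1} from a standard t-table,
# tabulated only for the trial counts named in the captions (n = 10, n = 100).
T_CRIT_95 = {10: 2.262, 100: 1.984}


def t_confidence_interval(samples):
    """Return (mean, half_width) of a 95% Student-t interval for the mean."""
    n = len(samples)
    if n not in T_CRIT_95:
        raise ValueError(f"no tabulated critical value for n={n}")
    mean = statistics.fmean(samples)
    # Standard error of the mean, using the sample standard deviation
    # (n - 1 in the denominator).
    sem = statistics.stdev(samples) / math.sqrt(n)
    return mean, T_CRIT_95[n] * sem


# Hypothetical per-trial outcomes for one condition (NOT data from the paper).
scores = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
mean, half_width = t_confidence_interval(scores)
print(f"mean = {mean:.2f}, 95% CI = [{mean - half_width:.2f}, {mean + half_width:.2f}]")
```

With only 10 trials the critical value (about 2.26) is noticeably larger than the normal-approximation value of 1.96, which is one reason the physical-robot intervals would be expected to be wider than the simulation intervals at 100 trials.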