Multi-Robot Collaboration through Reinforcement Learning and Abstract Simulation
Adam Labiosa, Josiah P. Hanna
TL;DR
This work investigates whether cooperative policies learned in a low-fidelity abstract simulator can transfer to physical robot teams. Policies are trained in an abstract simulator (AbstractSim) and then deployed on NAO humanoid robots, with three categories of simulator modification identified to close the reality gap: simulation fidelity enhancements, training optimizations, and simulation stochasticity. An extensive ablation study on robot soccer tasks shows that policies trained at this level of abstraction can transfer to real robots, matching and sometimes outperforming hand-tuned RoboCup baselines, while highlighting the trade-offs between realism and trainability. Ball-robot contact noise and carefully chosen training optimizations emerge as the most impactful factors for successful sim2real transfer. Collectively, the results support the viability of using abstract simulators to develop transferable multi-robot policies, potentially reducing the need for costly real-world training in multi-agent robotics.
Abstract
Teams of people coordinate to perform complex tasks by forming abstract mental models of world and agent dynamics. The use of abstract models contrasts with much recent work in robot learning that uses a high-fidelity simulator and reinforcement learning (RL) to obtain policies for physical robots. Motivated by this difference, we investigate the extent to which so-called abstract simulators can be used for multi-agent reinforcement learning (MARL) and the resulting policies successfully deployed on teams of physical robots. An abstract simulator models the robot's target task at a high level of abstraction and discards many details of the world that could impact optimal decision-making. Policies are trained in an abstract simulator and then transferred to the physical robots by making use of separately obtained low-level perception and motion control modules. We identify three key categories of modifications to the abstract simulator that enable policy transfer to physical robots: simulation fidelity enhancements, training optimizations, and simulation stochasticity. We then run an empirical study with extensive ablations to determine the value of each modification category for enabling policy transfer in cooperative robot soccer tasks. We also compare the performance of policies produced by our method with a well-tuned non-learning-based behavior architecture from the annual RoboCup competition and find that our approach leads to a similar level of performance. Broadly, we show that MARL can be used to train cooperative physical robot behaviors using highly abstract models of the world.
