Learning Complex Teamwork Tasks Using a Given Sub-task Decomposition
Elliot Fosong, Arrasy Rahman, Ignacio Carlucho, Stefano V. Albrecht
TL;DR
The paper tackles the challenge of efficiently learning complex cooperative tasks in multi-agent reinforcement learning by using expert-provided sub-task decompositions (STD) to form curricula: sub-teams are first trained on simpler sub-tasks, then merged and fine-tuned on the target task. It identifies two problems with naive fine-tuning, namely miscoordinated exploration and forgetting of sub-task skills, and proposes MEDoE, a modular extension that uses a Domain of Expertise (DoE) classifier to adapt exploration and regularization dynamically during fine-tuning. Empirical results across Chainball, Overcooked, and VMAS Football show that STD can reduce total training timesteps in some cases, and that MEDoE substantially improves performance in several domains by mitigating the identified issues. The approach is compatible with decentralised actor-critic methods and offers a scalable way to leverage expert task decompositions in diverse multi-agent settings. Future directions include automatic curriculum discovery and online DoE updates for partial observability.
Abstract
Training a team to complete a complex task via multi-agent reinforcement learning can be difficult due to challenges such as policy search in a large joint policy space and non-stationarity caused by mutually adapting agents. To facilitate efficient learning of complex multi-agent tasks, we propose an approach that uses an expert-provided decomposition of a task into simpler multi-agent sub-tasks. In each sub-task, a subset of the entire team is trained to acquire sub-task-specific policies. The sub-teams are then merged and transferred to the target task, where their policies are collectively fine-tuned to solve the more complex target task. We show empirically that such approaches can greatly reduce the number of timesteps required to solve a complex target task relative to training from scratch. However, we also identify and investigate two problems with naive implementations of approaches based on sub-task decomposition, and propose a simple and scalable method to address these problems which augments existing actor-critic algorithms. We demonstrate the empirical benefits of our proposed method, enabling sub-task decomposition approaches to be deployed in diverse multi-agent tasks.
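To make the idea of adapting exploration and regularization with a Domain of Expertise classifier more concrete, the following is a minimal sketch and not the authors' implementation. It assumes the classifier outputs a probability `doe_prob` that the current situation resembles the agent's sub-task, and that this probability linearly interpolates two loss coefficients: an entropy bonus (exploration) and a penalty for drifting from the sub-task policy (regularization). The names `ModulationRange`, `ENTROPY_COEF`, `REG_COEF`, and `modulated_coefficients`, the numeric values, and the linear interpolation itself are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ModulationRange:
    """Hypothetical coefficient values inside and outside the domain of expertise."""
    inside: float
    outside: float

    def at(self, doe_prob: float) -> float:
        # Linear interpolation (an assumption): doe_prob = 1 means the
        # situation looks like the sub-task the agent was trained on.
        if not 0.0 <= doe_prob <= 1.0:
            raise ValueError("doe_prob must lie in [0, 1]")
        return doe_prob * self.inside + (1.0 - doe_prob) * self.outside


# Illustrative values only; they are not taken from the paper.
# Inside the domain of expertise: explore little, stay close to the sub-task policy.
# Outside it: explore more, and relax the pull towards the sub-task policy.
ENTROPY_COEF = ModulationRange(inside=0.01, outside=0.1)
REG_COEF = ModulationRange(inside=1.0, outside=0.0)


def modulated_coefficients(doe_prob: float) -> Tuple[float, float]:
    """Return (entropy_coefficient, regularization_coefficient) for one agent."""
    return ENTROPY_COEF.at(doe_prob), REG_COEF.at(doe_prob)
```

In an actor-critic update, these two values would scale the entropy term and the regularization term of each agent's policy loss. Under these assumptions, an agent that finds itself in a familiar sub-task situation keeps its learned skill, while an agent in an unfamiliar situation is free to explore.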
