Learning Complex Teamwork Tasks Using a Given Sub-task Decomposition

Elliot Fosong, Arrasy Rahman, Ignacio Carlucho, Stefano V. Albrecht

TL;DR

The paper addresses efficient learning of complex cooperative tasks in multi-agent reinforcement learning. It uses an expert-provided sub-task decomposition (STD) as a curriculum: sub-teams are first trained on simpler sub-tasks, then merged and fine-tuned on the target task. The authors identify two problems with naive fine-tuning, namely miscoordinated exploration and forgetting of sub-task skills, and propose MEDoE, a modular extension that uses a Domain of Expertise (DoE) classifier to adapt exploration and regularisation dynamically during fine-tuning. Experiments on Chainball, Overcooked, and VMAS Football show that STD can reduce total training timesteps in some cases, and that MEDoE substantially improves performance in several domains by mitigating both problems. The approach is compatible with decentralised actor-critic methods and scales to diverse multi-agent settings. Suggested future directions include automatic curriculum discovery and online DoE updates for partial observability.
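The idea of letting a DoE classifier adapt exploration and regularisation can be illustrated with a small sketch. It is not the authors' implementation. The coefficient names (temperature $T$, entropy coefficient $\alpha$, KL coefficient $\kappa$) follow the Figure 5 caption, but the numeric values, the linear interpolation, the KL direction, and every function name below are assumptions made for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Coefficients:
    temperature: float   # T: softmax temperature used when sampling actions
    entropy_coef: float  # alpha: weight of the entropy bonus
    kl_coef: float       # kappa: weight of the KL penalty towards the sub-task policy


# Hypothetical settings, chosen only for illustration (not taken from the paper).
IN_DOMAIN = Coefficients(temperature=0.5, entropy_coef=0.001, kl_coef=1.0)
OUT_OF_DOMAIN = Coefficients(temperature=2.0, entropy_coef=0.05, kl_coef=0.0)


def modulate(p_in_domain: float) -> Coefficients:
    """Interpolate between the two settings using the classifier output.

    p_in_domain is the (assumed) probability that the current observation
    lies inside the agent's domain of expertise.
    """
    if not 0.0 <= p_in_domain <= 1.0:
        raise ValueError("p_in_domain must lie in [0, 1]")
    w = p_in_domain
    return Coefficients(
        temperature=w * IN_DOMAIN.temperature + (1 - w) * OUT_OF_DOMAIN.temperature,
        entropy_coef=w * IN_DOMAIN.entropy_coef + (1 - w) * OUT_OF_DOMAIN.entropy_coef,
        kl_coef=w * IN_DOMAIN.kl_coef + (1 - w) * OUT_OF_DOMAIN.kl_coef,
    )


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)


def kl_divergence(p, q):
    """KL(p || q); q is assumed to be strictly positive."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def regularisation_terms(logits, reference_probs, p_in_domain):
    """Return (behaviour policy, penalty to add to an actor loss)."""
    coef = modulate(p_in_domain)
    behaviour = softmax(logits, coef.temperature)  # tempered policy used to explore
    policy = softmax(logits)                       # untempered policy being trained
    # Assumed direction: penalise divergence of the policy from the sub-task policy.
    penalty = (coef.kl_coef * kl_divergence(policy, reference_probs)
               - coef.entropy_coef * entropy(policy))
    return behaviour, penalty
```

Under these assumed settings, an agent inside its domain of expertise acts almost greedily and is held close to its sub-task policy, which targets forgetting. An agent outside its domain explores more and is regularised less, which targets miscoordinated exploration.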

Abstract

Training a team to complete a complex task via multi-agent reinforcement learning can be difficult due to challenges such as policy search in a large joint policy space and non-stationarity caused by mutually adapting agents. To facilitate efficient learning of complex multi-agent tasks, we propose an approach which uses an expert-provided decomposition of a task into simpler multi-agent sub-tasks. In each sub-task, a subset of the entire team is trained to acquire sub-task-specific policies. The sub-teams are then merged and transferred to the target task, where their policies are collectively fine-tuned to solve the more complex target task. We show empirically that such approaches can greatly reduce the number of timesteps required to solve a complex target task relative to training from scratch. However, we also identify and investigate two problems with naive implementations of approaches based on sub-task decomposition, and propose a simple and scalable method to address these problems which augments existing actor-critic algorithms. We demonstrate the empirical benefits of our proposed method, enabling sub-task decomposition approaches to be deployed in diverse multi-agent tasks.

Paper Structure

This paper contains 35 sections, 10 equations, 8 figures, 7 tables, and 1 algorithm.

Figures (8)

  • Figure 1: STD for 5-a-side football. We control the red team, and play against the grey team. We train defender agents in defensive drills (top-left) and attacker agents in attack drills (bottom-left). As represented by the arrows, we transfer these agents to the 5-a-side target task. We then fine-tune our combined team in the target task.
  • Figure 2: Environments used in our experiments.
  • Figure 3: Target task training returns for each environment. Mean episodic returns (100 episodes), averaged over 16 runs (16 different team combinations). The "from-scratch" baselines are averaged over 16 runs (16 seeds; 8 seeds for QMIX on VMAS Football). Shaded area shows the 95% confidence interval of the mean over the runs. Naive STD and MEDoE are shifted on the training step axis to account for the total training steps required across all the sub-tasks, shown by the dashed line.
  • Figure 4: Forgetting curve in Overcooked.
  • Figure 5: Ablations of MEDoE. Returns are averaged over 16 runs (4 runs in VMAS Football). Error bars are omitted to improve clarity. In the legend, labels show a letter if that coefficient is modulated, or a dash otherwise. E.g., "$T\hbox{--}\alpha$" modulates temperature coefficient ($T$) and entropy coefficient ($\alpha$) but not KL coefficient ($\kappa$).
  • ...and 3 more figures
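
The aggregation described in the Figure 3 caption (a mean over runs, a 95% confidence interval of that mean, and a shift along the training-step axis for the sub-task budget) can be sketched as follows. The caption does not state how the interval was computed, so the normal approximation here is an assumption, as are the function names and the choice to sum the sub-task steps into a single offset.

```python
from statistics import NormalDist, mean, stdev


def mean_and_ci(run_returns, confidence=0.95):
    """Return (mean, lower, upper) for the mean return over independent runs.

    Uses a normal approximation to the sampling distribution of the mean
    (an assumption; a t-interval would be slightly wider for 16 runs).
    """
    n = len(run_returns)
    if n < 2:
        raise ValueError("need at least two runs to estimate a standard error")
    centre = mean(run_returns)
    sem = stdev(run_returns) / n ** 0.5          # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return centre, centre - z * sem, centre + z * sem


def shift_steps(target_steps, subtask_steps):
    """Offset target-task step counts by the total steps spent on sub-tasks.

    Assumes the offset is the sum of the steps used in every sub-task.
    """
    offset = sum(subtask_steps)
    return [offset + step for step in target_steps]
```

A curve for a curriculum-based method would then be plotted against `shift_steps(...)` rather than the raw target-task steps, so that it is compared with the from-scratch baseline at equal total training cost.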