Deep Decentralized Multi-task Multi-Agent Reinforcement Learning under Partial Observability
Shayegan Omidshafiei, Jason Pazis, Christopher Amato, Jonathan P. How, John Vian
TL;DR
The paper tackles multi-task multi-agent reinforcement learning under partial observability, where task identities are not observable during execution. It introduces a two-phase approach. Phase I learns a specialized decentralized policy for each task using Decentralized Hysteretic Deep Recurrent Q-Networks (Dec-HDRQNs) and Concurrent Experience Replay Trajectories (CERTs), which stabilize training under the non-stationarity caused by concurrently learning teammates. Phase II distills these specialized policies into a unified multi-task network that performs well across related tasks without being given task identities. The key contributions are Dec-HDRQNs, CERTs, and this distillation framework, which together yield a task-agnostic policy that coordinates effectively in sparse-reward decentralized partially observable Markov decision processes (Dec-POMDPs). The work demonstrates decentralized coordination and robust generalization across tasks, offering a practical methodology for real-world multi-agent systems with partial observability and limited communication.
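The two Phase I ingredients can be illustrated with a deliberately simplified, tabular sketch. It assumes a Q-table in place of the Dec-HDRQN's recurrent network, and the names CERTMemory, sample_concurrent, hysteretic_update, the rates alpha and beta, and the toy episode are all hypothetical. It shows two ideas: CERT-style memories index each agent's experience by episode and timestep so that every agent trains on the same time-aligned trajectory segment, and hysteretic Q-learning applies a smaller learning rate to negative TD errors, so an agent is slow to devalue an action that may have failed only because a teammate was exploring.

```python
import random
from collections import defaultdict

N_ACTIONS = 2  # toy action-space size (hypothetical)


class CERTMemory:
    """One agent's replay memory, organized by episode and timestep (simplified)."""

    def __init__(self):
        self.episodes = []  # each episode is a list of transitions

    def start_episode(self):
        self.episodes.append([])

    def add(self, transition):
        # transition = (obs, action, reward, next_obs, done)
        self.episodes[-1].append(transition)


def sample_concurrent(memories, seg_len, rng):
    """Draw one (episode, start) index and return that trajectory segment
    from every agent's memory, so all agents train on time-aligned data.

    Assumes every agent logged the same episodes with equal lengths,
    each at least seg_len steps long.
    """
    e = rng.randrange(len(memories[0].episodes))
    start = rng.randrange(len(memories[0].episodes[e]) - seg_len + 1)
    return [m.episodes[e][start:start + seg_len] for m in memories]


def hysteretic_update(q, transition, alpha=0.1, beta=0.01, gamma=0.95):
    """Tabular hysteretic Q-learning step (stand-in for the recurrent Q-network).

    Positive TD errors use the larger rate alpha; negative ones use the
    smaller rate beta. Returns the TD error.
    """
    obs, action, reward, next_obs, done = transition
    target = reward if done else reward + gamma * max(q[next_obs])
    delta = target - q[obs][action]
    q[obs][action] += (alpha if delta >= 0 else beta) * delta
    return delta


# Usage: two agents record one shared 4-step toy episode, then each trains
# on the same concurrently sampled 2-step segment of its own memory.
rng = random.Random(0)
memories = [CERTMemory(), CERTMemory()]
for m in memories:
    m.start_episode()
for t in range(4):
    done = t == 3
    for agent, m in enumerate(memories):
        # Observations are (agent, t) pairs purely for illustration.
        m.add(((agent, t), t % N_ACTIONS, 1.0 if done else 0.0, (agent, t + 1), done))

q_tables = [defaultdict(lambda: [0.0] * N_ACTIONS) for _ in memories]
segments = sample_concurrent(memories, seg_len=2, rng=rng)
for q, segment in zip(q_tables, segments):
    for transition in segment:
        hysteretic_update(q, transition)
```

The segment-level sampling matters for the recurrent network in the actual method, which needs contiguous histories; this tabular version updates transition by transition only to keep the example short.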
Abstract
Many real-world tasks involve multiple agents with partial observability and limited communication. Learning is challenging in these settings because each agent has only a local viewpoint and perceives the world as non-stationary due to concurrently exploring teammates. Approaches that learn specialized policies for individual tasks face problems when applied to the real world: not only do agents have to learn and store distinct policies for each task, but in practice identities of tasks are often non-observable, making these approaches inapplicable. This paper formalizes and addresses the problem of multi-task multi-agent reinforcement learning under partial observability. We introduce a decentralized single-task learning approach that is robust to concurrent interactions of teammates, and present an approach for distilling single-task policies into a unified policy that performs well across multiple related tasks, without explicit provision of task identity.
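The distillation step can be sketched as supervised learning in which a single student network is trained to match the outputs of several single-task teachers. This sketch assumes a KL-divergence loss between a temperature-sharpened softmax of each teacher's Q-values and the student's softmax, in the style of policy distillation by Rusu et al.; the paper's exact loss, temperature, and network details may differ. The functions softmax, distillation_loss, and multitask_loss and the Q-values below are hypothetical. The point it illustrates is that the student receives only observations, so the resulting policy needs no task identity at execution time.

```python
import math


def softmax(values, temperature=1.0):
    """Numerically stable softmax over a list of Q-values."""
    scaled = [v / temperature for v in values]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_q, student_q, temperature=0.01):
    """KL(teacher || student) between the temperature-sharpened teacher
    distribution and the student's softmax output (assumed loss form)."""
    p = softmax(teacher_q, temperature)
    q = softmax(student_q)
    # Terms with p == 0 contribute nothing to the KL divergence.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def multitask_loss(pairs):
    """Average distillation loss over (teacher_q, student_q) pairs drawn
    from several tasks. Each teacher labels data from its own task; the
    student's outputs are assumed to come from a network that never
    receives the task identity as input."""
    return sum(distillation_loss(t, s) for t, s in pairs) / len(pairs)


# Usage: hypothetical Q-values from two single-task teachers on observations
# from their own tasks, paired with the multi-task student's current outputs.
pairs = [
    ([1.0, 0.2, 0.1], [0.5, 0.4, 0.3]),  # teacher for task A
    ([0.0, 0.9, 0.3], [0.5, 0.4, 0.3]),  # teacher for task B
]
loss = multitask_loss(pairs)
```

In a decentralized setting, each agent would minimize a loss of this kind for its own student network, using the teachers it learned in Phase I.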
