Learning Symbolic Task Decompositions for Multi-Agent Teams
Ameesh Shah, Niklas Lauffer, Thomas Chen, Nikhil Pitta, Sanjit A. Seshia
TL;DR
The paper tackles the credit assignment problem in cooperative multi-agent reinforcement learning by automatically learning how to decompose a complex task into sub-tasks using reward machines. It introduces LOTaD, which simultaneously searches over a set of candidate task decompositions and trains task-conditioned policies for each sub-task, guided by an upper confidence bound strategy to balance exploration and exploitation. The approach relaxes the assumption of independent agent dynamics by providing a global view of the overall task and incentivizing coordination, enabling effective learning even in environments with codependent dynamics. Experimental results across Repairs, Buttons, and Overcooked domains show that LOTaD outperforms baselines, improves sample efficiency, and demonstrates the practicality of automated symbolic task decomposition for multi-agent teams.
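The upper confidence bound selection over candidate decompositions mentioned above can be illustrated with a minimal sketch. It assumes the standard UCB1 rule and a running mean of episode returns per candidate; the names `ucb_select` and `update` and the exploration constant `c` are hypothetical, and the exact bound and return estimate used by LOTaD are not specified here.

```python
import math


def ucb_select(mean_returns, counts, c=1.0):
    """Pick the index of the candidate decomposition to train on next.

    Assumption: standard UCB1 scoring, not necessarily the paper's exact rule.
    """
    # Try every candidate once before the bound is well defined.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    total = sum(counts)
    scores = [
        m + c * math.sqrt(2.0 * math.log(total) / n)
        for m, n in zip(mean_returns, counts)
    ]
    # Highest optimistic estimate wins; ties go to the lowest index.
    return max(range(len(scores)), key=scores.__getitem__)


def update(mean_returns, counts, i, episode_return):
    """Fold one observed episode return into candidate i's running mean."""
    counts[i] += 1
    mean_returns[i] += (episode_return - mean_returns[i]) / counts[i]
```

In a training loop, one would call `ucb_select`, train the sub-task policies under the chosen decomposition for an episode, and then call `update` with the observed team return. Rarely tried candidates keep a large bonus, so they are revisited until their estimates become reliable.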
Abstract
One approach for improving sample efficiency in cooperative multi-agent learning is to decompose overall tasks into sub-tasks that can be assigned to individual agents. We study this problem in the context of reward machines: symbolic tasks that can be formally decomposed into sub-tasks. To handle settings without a priori knowledge of the environment, we introduce a framework that learns the optimal decomposition from model-free interactions with the environment. Our method uses a task-conditioned architecture to simultaneously learn an optimal decomposition and the corresponding agents' policies for each sub-task. In doing so, we remove the need for a human to manually design the optimal decomposition while maintaining the sample-efficiency benefits of improved credit assignment. We provide experimental results in several deep reinforcement learning settings that demonstrate the efficacy of our approach. Our results indicate that our approach succeeds even in environments with codependent agent dynamics, enabling synchronous multi-agent learning that previous work could not achieve.
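For readers new to reward machines, the sketch below treats one as a finite-state machine that consumes high-level events and emits rewards. The two-button team task, the event names, and the hand-written per-agent sub-tasks are hypothetical illustrations; they are not taken from the paper's domains, and they do not reproduce how the framework generates or learns candidate decompositions.

```python
class RewardMachine:
    """Finite-state machine that maps high-level events to rewards."""

    def __init__(self, initial, transitions, accepting):
        self.initial = initial
        # transitions: {(state, event): (next_state, reward)}
        self.transitions = transitions
        self.accepting = set(accepting)
        self.state = initial

    def step(self, event):
        # Events with no listed transition leave the state unchanged.
        next_state, reward = self.transitions.get(
            (self.state, event), (self.state, 0.0)
        )
        self.state = next_state
        return reward

    def done(self):
        return self.state in self.accepting


def make_team_task():
    # Hypothetical team task: button A must be pressed before button B.
    return RewardMachine(
        initial="start",
        transitions={
            ("start", "a_pressed"): ("a_done", 0.0),
            ("a_done", "b_pressed"): ("goal", 1.0),
        },
        accepting={"goal"},
    )


def make_sub_tasks():
    # One hand-written candidate decomposition: each agent owns one button.
    press_a = RewardMachine("start", {("start", "a_pressed"): ("goal", 1.0)}, {"goal"})
    press_b = RewardMachine("start", {("start", "b_pressed"): ("goal", 1.0)}, {"goal"})
    return {"agent_1": press_a, "agent_2": press_b}
```

Each sub-task machine gives its agent a reward signal tied to its own events, which is the credit assignment benefit the abstract refers to. The team machine still defines overall success, including the ordering constraint that the individual sub-tasks do not capture on their own.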
