Attention-Augmented Inverse Reinforcement Learning with Graph Convolutions for Multi-Agent Task Allocation
Huilin Yin, Zhikun Yang, Linchuan Zhang, Daniel Watzenig
TL;DR
This paper tackles multi-agent task allocation (MATA) in dynamic environments, where handcrafted reward functions limit adaptability. It presents Attention-Augmented Inverse Reinforcement Learning with Graph Convolutions, which combines trajectory encoding based on multi-head self-attention (MHSA) with a global-information graph attention module to infer reward densities from expert demonstrations within a GAIL-style framework. The learned reward, expressed as $r_i^{IRL} = \alpha r_i + \beta$, guides policy learning, while global-state features improve coordination across agents and tasks. In experiments against MASAC and MAPPO, the method achieves higher cumulative rewards, faster convergence, and better scalability across varying agent and task counts, suggesting its potential for robust, scalable MATA in logistics and robotic coordination.
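To make the reward expression $r_i^{IRL} = \alpha r_i + \beta$ concrete, the following is a minimal sketch, not the authors' implementation. It assumes that $r_i$ is a per-agent reward derived from a GAIL-style discriminator output via the common surrogate $-\log(1 - D)$, and that $\alpha$ and $\beta$ are scalar coefficients; this summary does not define any of these, and the function names are hypothetical.

```python
import math
from typing import List


def gail_style_reward(d_prob: float, eps: float = 1e-8) -> float:
    """Assumed surrogate reward r_i = -log(1 - D) for a discriminator output D.

    D is clamped to [eps, 1 - eps] so the logarithm stays finite.
    """
    d = min(max(d_prob, eps), 1.0 - eps)
    return -math.log(1.0 - d)


def irl_reward(r_i: float, alpha: float, beta: float) -> float:
    """Affine form from the text: r_i^IRL = alpha * r_i + beta."""
    return alpha * r_i + beta


def per_agent_irl_rewards(d_probs: List[float], alpha: float, beta: float) -> List[float]:
    """Apply the assumed surrogate and the affine map to each agent's discriminator output."""
    return [irl_reward(gail_style_reward(d), alpha, beta) for d in d_probs]


if __name__ == "__main__":
    # Hypothetical discriminator outputs for three agents.
    print(per_agent_irl_rewards([0.2, 0.5, 0.9], alpha=0.5, beta=1.0))
```

Under these assumptions, a positive $\alpha$ preserves the ordering of the underlying rewards, so agents whose behavior the discriminator rates as more expert-like still receive larger learned rewards, while $\beta$ only shifts the baseline.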
Abstract
This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. Multi-agent task allocation (MATA) plays a vital role in cooperative multi-agent systems, with significant implications for applications such as logistics, search and rescue, and robotic coordination. Although traditional deep reinforcement learning (DRL) methods have shown promise, their effectiveness is limited by their reliance on manually designed reward functions and by inefficiencies in dynamic environments. This paper proposes an inverse reinforcement learning (IRL)-based framework that incorporates multi-head self-attention (MHSA) and graph attention mechanisms to improve reward function learning and task execution efficiency. Expert demonstrations are used to infer optimal reward densities, which reduces dependence on handcrafted designs and improves adaptability. Extensive experiments show that the proposed method outperforms widely used multi-agent reinforcement learning (MARL) algorithms in both cumulative rewards and task execution efficiency.
