CoDe: Communication Delay-Tolerant Multi-Agent Collaboration via Dual Alignment of Intent and Timeliness
Shoucheng Song, Youfang Lin, Sheng Han, Chang Yao, Hao Wu, Shuo Wang, Kai Lv
TL;DR
This work models multi-agent communication over asynchronous, delayed channels as a Delay-Tolerant Dec-POMDP (DT-Dec-POMDP) and introduces CoDe, which mitigates delays by fusing messages according to intent and timeliness. CoDe learns intents through future action inference using an encoder–predictor setup, with losses that enforce future expressiveness ($L_{inf}$) and short-term stability ($L_c$), plus a diversity term ($L_k$). It then fuses messages through dual alignment, combining attention with a temporal discount. The total objective adds the intent loss $L_{int}$ and the alignment loss $L_e$ to the RL loss, $L_{tot} = L_{RL} + L_{int} + L_e$. Experiments on SMAC, GRF, and Hallway show robustness to fixed and time-varying delays, and CoDe also outperforms baselines in delay-free settings. These results point to practical value for MARL deployed over real-world networks, where communication delays are unavoidable and reliable coordination depends on tolerating them.
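To make "attention with a temporal discount" concrete, the following is a minimal sketch, not the authors' implementation. It assumes that intent alignment is a scaled dot-product score between the receiver's own intent and each received intent, and that timeliness alignment multiplies each attention weight by `gamma ** delay` before renormalizing. The function name `fuse_messages`, the discount `gamma`, and the exact weighting form are hypothetical choices for illustration.

```python
import math
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def fuse_messages(
    own_intent: Sequence[float],
    msg_intents: List[Sequence[float]],
    delays: List[int],
    gamma: float = 0.9,
) -> Tuple[List[float], List[float]]:
    """Hypothetical dual-alignment fusion of delayed intent messages.

    Intent alignment: scaled dot-product score between own intent and each message.
    Timeliness alignment (assumed form): weight each message by gamma**delay.
    Adding delay * log(gamma) to the logits before the softmax is equivalent
    to multiplying exp(score) by gamma**delay and renormalizing.
    """
    assert 0.0 < gamma <= 1.0, "gamma must lie in (0, 1]"
    d = len(own_intent)
    scores = [dot(own_intent, m) / math.sqrt(d) for m in msg_intents]
    logits = [s + k * math.log(gamma) for s, k in zip(scores, delays)]
    # Numerically stable softmax over the discounted logits.
    mx = max(logits)
    exps = [math.exp(v - mx) for v in logits]
    z = sum(exps)
    weights = [e / z for e in exps]
    # Fused message: attention-weighted sum of the received intents.
    fused = [sum(w * m[i] for w, m in zip(weights, msg_intents)) for i in range(d)]
    return fused, weights


if __name__ == "__main__":
    # Two identical intents; the second one is three steps stale.
    fused, weights = fuse_messages([1.0, 0.0], [[1.0, 0.0], [1.0, 0.0]], delays=[0, 3])
    print(weights)  # the fresher message receives the larger weight
```

Because both messages carry the same intent here, the only difference in their weights comes from the assumed discount: the fresh message gets weight proportional to 1 and the stale one proportional to 0.9³ = 0.729. A learned variant would typically replace the raw dot product with learned query and key projections.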
Abstract
Communication has been widely employed to enhance multi-agent collaboration. Previous research has typically assumed delay-free communication, a strong assumption that is difficult to satisfy in practice. In reality, agents suffer from channel delays and receive messages that were sent at different time points, termed Asynchronous Communication, which leads to cognitive biases and breakdowns in collaboration. This paper begins by defining two communication delay settings in MARL and highlights their harm to collaboration. To handle these delays, this paper proposes a novel framework, Communication Delay-tolerant Multi-Agent Collaboration (CoDe). First, CoDe learns an intent representation, used as the message content, through future action inference; this representation reflects the stable future behavioral trends of the agents. Then, CoDe devises a dual alignment mechanism of intent and timeliness to strengthen the fusion of asynchronous messages. In this way, agents can extract the long-term intent of others even from delayed messages, and selectively utilize the most recent messages that are relevant to their own intent. Experimental results demonstrate that CoDe outperforms baseline algorithms on three MARL benchmarks without delay and remains robust under fixed and time-varying delays.
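The two delay settings can be illustrated with a small channel simulator. This is a hypothetical sketch, not the paper's environment wrapper: it assumes that under a fixed delay every message arrives exactly `max_delay` steps after it is sent, that under a time-varying delay each message draws an integer delay uniformly from `0..max_delay`, and that a receiver keeps only the freshest arrived message from each sender together with its age. The class `DelayedChannel` and its methods are names introduced for illustration only.

```python
import random
from typing import Dict, List, Tuple


class DelayedChannel:
    """Hypothetical asynchronous channel supporting fixed or time-varying delays."""

    def __init__(self, max_delay: int, fixed: bool = True, seed: int = 0) -> None:
        self.max_delay = max_delay
        self.fixed = fixed
        self.rng = random.Random(seed)
        # Messages still in transit: (arrival_step, sender, sent_at, payload).
        self.in_flight: List[Tuple[int, int, int, object]] = []
        # Freshest delivered message per sender: sender -> (sent_at, payload).
        self.latest: Dict[int, Tuple[int, object]] = {}

    def send(self, sender: int, t: int, payload: object) -> None:
        # Fixed setting: constant delay. Time-varying setting: random delay per message.
        delay = self.max_delay if self.fixed else self.rng.randint(0, self.max_delay)
        self.in_flight.append((t + delay, sender, t, payload))

    def receive(self, t: int) -> Dict[int, Tuple[object, int]]:
        """Deliver all messages due by step t; return sender -> (payload, age)."""
        pending = []
        for arrival, sender, sent_at, payload in self.in_flight:
            if arrival <= t:
                # Under time-varying delays messages can arrive out of order,
                # so an older message must not overwrite a newer one.
                if sender not in self.latest or sent_at > self.latest[sender][0]:
                    self.latest[sender] = (sent_at, payload)
            else:
                pending.append((arrival, sender, sent_at, payload))
        self.in_flight = pending
        return {s: (p, t - sent_at) for s, (sent_at, p) in self.latest.items()}


if __name__ == "__main__":
    ch = DelayedChannel(max_delay=2, fixed=True)
    for t in range(4):
        ch.send(sender=0, t=t, payload=f"m{t}")
    print(ch.receive(3))  # only the messages sent at t=0 and t=1 have arrived
```

The returned age is exactly the quantity a timeliness-aware fusion step would need: under a fixed delay it stays at `max_delay` once messages start arriving, while under a time-varying delay it fluctuates, and stale messages can arrive after fresher ones.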
