CoDe: Communication Delay-Tolerant Multi-Agent Collaboration via Dual Alignment of Intent and Timeliness

Shoucheng Song, Youfang Lin, Sheng Han, Chang Yao, Hao Wu, Shuo Wang, Kai Lv

TL;DR

This work models multi-agent communication over asynchronous, delayed channels as a Delay-Tolerant Dec-POMDP (DT-Dec-POMDP) and introduces CoDe, which mitigates delays by fusing messages according to intent and timeliness. CoDe learns intents through future action inference using an encoder–predictor setup, with losses that enforce future expressiveness ($L_{inf}$) and short-term stability ($L_c$), plus a diversity term ($L_k$), and fuses messages through dual alignment with attention and a temporal discount. The total objective combines the RL loss with the intent and alignment losses, $L_{tot} = L_{RL} + L_{int} + L_e$. CoDe is validated on SMAC, GRF, and Hallway, where it is robust to fixed and time-varying delays and outperforms baselines in delay-free settings. The findings suggest practical value for MARL in real-world networks where communication delays are unavoidable.

Abstract

Communication has been widely employed to enhance multi-agent collaboration. Previous research has typically assumed delay-free communication, a strong assumption that is challenging to meet in practice. Real-world agents suffer from channel delays and receive messages sent at different time points, termed Asynchronous Communication, which leads to cognitive biases and breakdowns in collaboration. This paper first defines two communication delay settings in MARL and emphasizes their harm to collaboration. To handle these delays, this paper proposes a novel framework, Communication Delay-tolerant Multi-Agent Collaboration (CoDe). First, CoDe learns an intent representation to use as messages through future action inference, reflecting the stable future behavioral trends of the agents. Then, CoDe devises a dual alignment mechanism of intent and timeliness to strengthen the fusion of asynchronous messages. In this way, agents can extract the long-term intent of others, even from delayed messages, and selectively utilize the most recent messages that are relevant to their intent. Experimental results demonstrate that CoDe outperforms baseline algorithms in three MARL benchmarks without delay and exhibits robustness under fixed and time-varying delays.
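The dual alignment idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the intent alignment is a scaled dot-product similarity and the timeliness alignment is an exponential discount `gamma ** delay` applied to the attention weight. The function name `fuse_messages`, the parameter `gamma`, and the vector shapes are all hypothetical.

```python
import math

def fuse_messages(query, messages, delays, gamma=0.9):
    """Fuse delayed messages with attention weights scaled by a temporal discount.

    query    : list[float], the receiver's own intent vector (hypothetical)
    messages : list[list[float]], intent vectors received from other agents
    delays   : list[float], age of each message in decision steps
               (math.inf marks a message that never arrived)
    gamma    : assumed per-step discount in (0, 1)
    """
    dim = len(query)
    scores = []
    for msg, delay in zip(messages, delays):
        # Intent alignment: scaled dot-product similarity of the two intents.
        sim = sum(q * m for q, m in zip(query, msg)) / math.sqrt(dim)
        # Timeliness alignment: adding delay * log(gamma) to the logit
        # multiplies the softmax weight by gamma ** delay.
        scores.append(sim + delay * math.log(gamma))

    finite = [s for s in scores if s != -math.inf]
    if not finite:
        # No message has arrived: return a zero vector and zero weights.
        return [0.0] * dim, [0.0] * len(messages)

    # Numerically stable softmax; messages with infinite delay get weight 0.
    top = max(finite)
    exps = [math.exp(s - top) if s != -math.inf else 0.0 for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    fused = [sum(w * msg[k] for w, msg in zip(weights, messages))
             for k in range(dim)]
    return fused, weights

# Two identical messages with different ages, plus one that never arrived.
fused, weights = fuse_messages(
    [1.0, 0.0],
    [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
    [0, 5, math.inf],
)
```

In this sketch, the fresh message receives a larger weight than the identical message that is five steps old, and the message with infinite delay receives zero weight. A learned attention module would replace the raw dot product with trained query and key projections.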

Paper Structure (27 sections, 9 equations, 6 figures, 3 tables)

Figures (6)

  • Figure 1: Performance of a Communication-Enabled MARL Algorithm in Delayed Environments. Channel delays are configured as four types: 0, 5, infinity, and values sampled from a Gaussian distribution $\mathcal{N}(5, 2)$. The unit of delay is the decision time interval.
  • Figure 2: Overall Framework of CoDe. (a) The training framework; (b) The communication module, consisting of intent extraction, message propagation, and message fusion; (c) The intent learning module via two designed losses; (d) The dual alignment module via a modified attention structure, in which $-i$ denotes the indices of agents other than $i$ and $?<t$ denotes some timestamp before $t$; (e) The sequence prediction model used to decode the intent into future actions.
  • Figure 3: Algorithm Performance in SMAC under Zero Communication Delay. Each curve represents the average result of 5 random seeds. The last three are our proposed new maps.
  • Figure 4: Performance in GRF and Hallway under Zero Communication Delay. The first row shows the results in GRF, and the second row shows the results in Hallway.
  • Figure 5: Visualization of the learned intent. "L" and "R" represent the left and right actions, respectively. $\hat{a}^{t+?}$ refers to the expected action at a future time, as decoded from the intent at time $t$.
  • ...and 1 more figures
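
The four delay settings listed in the Figure 1 caption can be simulated with a small message channel. The sketch below is an illustration under stated assumptions, not the authors' environment code. It assumes that in $\mathcal{N}(5, 2)$ the value 5 is the mean and 2 is the standard deviation, and that sampled delays are rounded to whole decision steps and clipped at 0. The names `sample_delay` and `DelayedChannel` are hypothetical.

```python
import math
import random
from collections import defaultdict

def sample_delay(setting, rng):
    """Return a delay in decision steps for one of the four settings."""
    if setting == "none":
        return 0
    if setting == "fixed":
        return 5
    if setting == "infinite":
        return math.inf
    if setting == "gaussian":
        # Assumption: mean 5, standard deviation 2, rounded and clipped at 0.
        return max(0, round(rng.gauss(5, 2)))
    raise ValueError(f"unknown setting: {setting}")

class DelayedChannel:
    """Holds messages until their delivery step.

    receive() must be called once per decision step; it returns, for each
    sender, the age and payload of the most recently sent message that has
    arrived so far.
    """

    def __init__(self, setting, seed=0):
        self.setting = setting
        self.rng = random.Random(seed)
        self.in_flight = defaultdict(list)  # delivery step -> messages
        self.latest = {}                    # sender -> (sent_step, payload)

    def send(self, sender, step, payload):
        delay = sample_delay(self.setting, self.rng)
        if delay != math.inf:  # an infinitely delayed message is dropped
            self.in_flight[step + delay].append((sender, step, payload))

    def receive(self, step):
        for sender, sent_step, payload in self.in_flight.pop(step, []):
            # Time-varying delays can reorder messages; keep the newest one.
            if sender not in self.latest or sent_step > self.latest[sender][0]:
                self.latest[sender] = (sent_step, payload)
        return {s: (step - sent, p) for s, (sent, p) in self.latest.items()}

# With a fixed delay of 5, the receiver always sees a 5-step-old message.
channel = DelayedChannel("fixed")
for t in range(8):
    channel.send("agent_0", t, f"intent@{t}")
    inbox = channel.receive(t)
```

Under the fixed setting the inbox is empty for the first five steps, after which each received message is exactly five steps old. Under the Gaussian setting, messages from different senders arrive with different ages, which is the asynchronous case the abstract describes.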