VIL2C: Value-of-Information Aware Low-Latency Communication for Multi-Agent Reinforcement Learning
Qian Zhang, Zhuo Sun, Yao Zhang, Zhiwen Yu, Bin Guo, Jun Zhang
TL;DR
VIL2C tackles the latency challenge in cooperative MARL by introducing a Value-of-Information (VoI) metric that quantifies how much a delayed message influences an agent's decision relative to its transmission delay. It then jointly optimizes transmission (via ResoNet) and reception (via a progressive, entropy-driven stopping rule) to prioritize high-VoI messages and reduce waiting times. The approach is integrated with MAPPO in a CTDE framework. Theoretical analysis provides a lower bound on performance, and empirical results on Predator-Prey, Cooperative Navigation, and SMACv2 show substantial gains over baselines under various channel conditions. The work demonstrates that VoI-aware resource allocation and adaptive reception can significantly mitigate latency penalties in MARL, with practical implications for time-critical multi-agent systems, and it outlines directions for broader latency models and VoI definitions.
Abstract
Inter-agent communication is an effective mechanism for enhancing performance in collaborative multi-agent reinforcement learning (MARL) systems. However, the inherent communication latency in practical systems both delays action decisions and causes agents to share outdated information, impeding MARL performance gains, particularly in time-critical applications such as autonomous driving. In this work, we propose a Value-of-Information aware Low-latency Communication (VIL2C) scheme that proactively adjusts the latency distribution to mitigate its effects in MARL systems. Specifically, we define a Value of Information (VoI) metric that quantifies the importance of each delayed message. Moreover, we propose a progressive message reception mechanism that adaptively adjusts the reception duration based on the messages received so far. We derive the optimized VoI-aware resource allocation and theoretically prove the performance advantage of the proposed VIL2C scheme. Extensive experiments demonstrate that VIL2C outperforms existing approaches under various communication conditions. These gains are attributed to the low-latency transmission of high-VoI messages via resource allocation and the elimination of unnecessary waiting periods via adaptive reception duration.
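
To make the idea of progressive, entropy-driven reception concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes that an agent stops waiting for further messages once the entropy of its action distribution falls below a threshold, or once a hard deadline passes. The names `progressive_receive`, `toy_policy`, `threshold`, and `deadline`, as well as the `(arrival_time, message)` format, are hypothetical and chosen only for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def progressive_receive(arrivals, policy, threshold, deadline):
    """Receive messages in arrival order and stop as soon as the policy is confident.

    arrivals:  list of (arrival_time, message) pairs (hypothetical format)
    policy:    callable mapping the received messages to action probabilities
    threshold: entropy at or below which the agent stops waiting (assumed hyperparameter)
    deadline:  hard cap on the reception duration (assumed hyperparameter)
    Returns (received_messages, stop_time).
    """
    received = []
    stop_time = 0.0
    for t, msg in sorted(arrivals, key=lambda a: a[0]):
        if t > deadline:
            # The next message would arrive too late: give up at the deadline.
            stop_time = deadline
            break
        received.append(msg)
        stop_time = t
        if entropy(policy(received)) <= threshold:
            # Confident enough: skip the remaining, slower messages.
            break
    return received, stop_time


def toy_policy(msgs):
    """Toy two-action policy whose confidence grows with each received message."""
    p = min(0.5 + 0.2 * len(msgs), 0.99)
    return [p, 1.0 - p]


arrivals = [(0.3, "c"), (0.1, "a"), (0.2, "b")]
received, stop_time = progressive_receive(arrivals, toy_policy, threshold=0.4, deadline=1.0)
print(received, stop_time)
```

In this toy run the agent stops after the second message, because the entropy of the toy policy has already dropped below the threshold, so it never waits for the slowest message. A real system would replace `toy_policy` with the learned policy network.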
