VIL2C: Value-of-Information Aware Low-Latency Communication for Multi-Agent Reinforcement Learning

Qian Zhang, Zhuo Sun, Yao Zhang, Zhiwen Yu, Bin Guo, Jun Zhang

TL;DR

VIL2C tackles the latency challenge in cooperative MARL by introducing a Value-of-Information (VoI) metric that quantifies how much a delayed message influences an agent's decision relative to its transmission delay. It then jointly optimizes transmission (via ResoNet) and reception (via a progressive, entropy-driven stopping rule) to prioritize high-VoI messages and reduce waiting times. The approach is integrated with MAPPO in a CTDE framework, and theoretical analysis provides a lower bound on performance, with empirical results on Predator-Prey, Cooperative Navigation, and SMACv2 showing substantial gains over baselines under various channel conditions. The work demonstrates that VoI-aware resource allocation and adaptive reception can significantly mitigate latency penalties in MARL, with practical implications for time-critical multi-agent systems, while outlining directions for broader latency models and VoI definitions.

Abstract

Inter-agent communication is an effective mechanism for enhancing performance in collaborative multi-agent reinforcement learning (MARL) systems. However, the inherent communication latency in practical systems induces both action decision delays and outdated information sharing, impeding MARL performance gains, particularly in time-critical applications like autonomous driving. In this work, we propose a Value-of-Information aware Low-latency Communication (VIL2C) scheme that proactively adjusts the latency distribution to mitigate its effects in MARL systems. Specifically, we define a Value of Information (VoI) metric to quantify the value of transmitting each delayed message based on its importance to the recipient. Moreover, we propose a progressive message reception mechanism that adaptively adjusts the reception duration based on the messages received so far. We derive the optimized VoI-aware resource allocation and theoretically prove the performance advantage of the proposed VIL2C scheme. Extensive experiments demonstrate that VIL2C outperforms existing approaches under various communication conditions. These gains are attributed to the low-latency transmission of high-VoI messages via resource allocation and the elimination of unnecessary waiting periods via adaptive reception duration.
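
The summary above describes the reception side as a progressive, entropy-driven stopping rule. The following is a minimal sketch of that idea, not the authors' implementation: the names `progressive_reception`, `toy_actor`, and `entropy_threshold` are hypothetical, and the assumption that reception stops once the entropy of the action distribution falls below a fixed threshold is ours.

```python
import math
from typing import Callable, List, Sequence, Tuple

Message = Sequence[float]
Actor = Callable[[Message, List[Message]], List[float]]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def progressive_reception(
    own_message: Message,
    arrivals: Sequence[Tuple[float, Message]],
    actor: Actor,
    entropy_threshold: float,
) -> Tuple[List[float], float, int]:
    """Consume messages in order of latency until the actor is confident.

    Assumption (ours): "confident" means the entropy of the action
    distribution is at or below `entropy_threshold`.
    Returns (action_probs, time_at_which_reception_stopped, messages_used).
    """
    buffer: List[Message] = []
    probs = actor(own_message, buffer)
    stop_time = 0.0
    for latency, message in sorted(arrivals, key=lambda a: a[0]):
        if entropy(probs) <= entropy_threshold:
            break  # confident enough: stop waiting for later messages
        buffer.append(message)
        stop_time = latency
        probs = actor(own_message, buffer)
    return probs, stop_time, len(buffer)


def toy_actor(own: Message, received: List[Message]) -> List[float]:
    """Hypothetical actor: sums logits from all messages, then applies softmax."""
    logits = list(own)
    for msg in received:
        logits = [l + m for l, m in zip(logits, msg)]
    peak = max(logits)
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


# One decisive message arrives first; two weak ones arrive later.
arrivals = [(0.03, [0.1, 0.0, 0.0]), (0.01, [4.0, 0.0, 0.0]), (0.02, [0.0, 0.0, 0.1])]
probs, stop_time, used = progressive_reception([0.0, 0.0, 0.0], arrivals, toy_actor, 0.5)
print(f"stopped at t={stop_time}s after {used} message(s)")
```

In this toy run the earliest message already makes the action distribution sharply peaked, so the agent acts without waiting for the two slower messages. This is the "elimination of unnecessary waiting periods" that the abstract credits to adaptive reception.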

Paper Structure

This paper contains 16 sections, 3 theorems, 12 equations, 7 figures, 1 table.

Key Result

Proposition 1

Consider the importance $\xi_{i,j}$ of agent $i$'s message on recipient $j$. The optimized bandwidth $B_{i,j}^*$ and transmit power $P_{i,j}^*$ allocated by agent $i$ to agent $j$ satisfy respectively, and $B_{i,j}^*$ and $P_{i,j}^*$ are proportional to $\xi_{i,j}$. Here, $\gamma_{i,j} = \frac{P_{i,j}^*}{10^{\frac{PL_{i,j}}{10}}B_{i,j}^* N_0 }$, $\tau_{i,j}$ is the communication latency from agent $i$ ...
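
To make the proportionality statement concrete, the sketch below splits a bandwidth and power budget in proportion to message importance and evaluates the SNR expression $\gamma_{i,j}$ given above. It is an illustration under stated assumptions, not the closed-form solution of Proposition 1, which is not reproduced here. The function names, the normalization over a single shared budget, the reading of $N_0$ as noise power spectral density, and the Shannon-rate latency model are all assumptions of ours.

```python
import math
from typing import Dict, Tuple


def allocate_proportional(
    importance: Dict[str, float], total_bandwidth_hz: float, total_power_w: float
) -> Dict[str, Tuple[float, float]]:
    """Split bandwidth and power across recipients in proportion to importance.

    Assumption (ours): one shared budget, normalized by the sum of importances.
    """
    total = sum(importance.values())
    return {
        j: (total_bandwidth_hz * xi / total, total_power_w * xi / total)
        for j, xi in importance.items()
    }


def snr(power_w: float, bandwidth_hz: float, path_loss_db: float, n0_w_per_hz: float) -> float:
    """gamma = P / (10^(PL/10) * B * N0), matching the expression in the text."""
    return power_w / (10 ** (path_loss_db / 10) * bandwidth_hz * n0_w_per_hz)


def latency_s(message_bits: float, bandwidth_hz: float, gamma: float) -> float:
    """Assumed latency model: message size divided by the Shannon rate."""
    return message_bits / (bandwidth_hz * math.log2(1 + gamma))


# Hypothetical numbers: recipient j values the message three times as much as n.
importance = {"j": 3.0, "n": 1.0}
alloc = allocate_proportional(importance, total_bandwidth_hz=1e6, total_power_w=1.0)
gammas = {k: snr(p, b, path_loss_db=80.0, n0_w_per_hz=4e-21) for k, (b, p) in alloc.items()}
latencies = {k: latency_s(8_000, alloc[k][0], gammas[k]) for k in alloc}
for k in alloc:
    print(k, f"B={alloc[k][0]:.0f} Hz", f"P={alloc[k][1]:.2f} W", f"tau={latencies[k]:.6f} s")
```

Because bandwidth and power scale by the same factor in this sketch, links with equal path loss end up with the same SNR, and latency becomes inversely proportional to importance: the higher-importance recipient receives its message sooner.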

Figures (7)

  • Figure 1: A toy example illustrating VIL2C. Agent $i$ transmits messages to agents $j$ and $n$ to collaboratively capture target $P$. Based on the importance of the message to recipients $j$ and $n$, agent $i$ allocates transmission resources (represented by arrow thickness) to proactively adjust latency. Agent $j$ terminates reception (represented by the red dot) once it has received sufficient messages. Black arrows represent communication links and gray arrows represent movement.
  • Figure 2: The framework of VIL2C. Each agent consists of six components: 1) an Encoder that generates the message from the local observation, 2) a ResoNet that performs online resource allocation, 3) a Message Buffer that stores received messages, 4) an Actor that uses the agent's own message and the received messages to obtain an action probability distribution, 5) a Progressive Reception module that determines whether to stop receiving messages (shown as the yellow box, with its process drawn as yellow lines), and 6) a Sample module that selects actions from the action probability distribution.
  • Figure 3: Learning curves for Predator Prey (PP)
  • Figure 4: Learning curves for Cooperative Navigation (CN)
  • Figure 5: Win Rates for SMACv2
  • ...and 2 more figures

Theorems & Definitions (3)

  • Proposition 1
  • Proposition 2
  • Proposition 3