AgentComm-Bench: Stress-Testing Cooperative Embodied AI Under Latency, Packet Loss, and Bandwidth Collapse

Aayam Bansal, Ishaan Gangwani

Abstract

Cooperative multi-agent methods for embodied AI are almost universally evaluated under idealized communication: zero latency, no packet loss, and unlimited bandwidth. Real-world deployments, such as robots on wireless links, autonomous vehicles on congested networks, or drone swarms in contested spectrum, offer no such guarantees. We introduce AgentComm-Bench, a benchmark suite and evaluation protocol that systematically stress-tests cooperative embodied AI along six communication impairment dimensions: latency, packet loss, bandwidth collapse, asynchronous updates, stale memory, and conflicting sensor evidence. AgentComm-Bench spans three task families (cooperative perception, multi-agent waypoint navigation, and cooperative zone search) and evaluates five communication strategies, including a lightweight method we propose that combines redundant message coding with staleness-aware fusion. Our experiments show that communication-dependent tasks degrade catastrophically: stale memory and bandwidth collapse cause performance drops of over 96% in navigation, while content corruption (stale or conflicting data) reduces cooperative perception F1 by over 85%. Vulnerability depends on the interaction between impairment type and task design; perception fusion is robust to packet loss but amplifies corrupted data. Redundant message coding more than doubles navigation performance under 80% packet loss. We release AgentComm-Bench as a practical evaluation protocol and recommend that work on cooperative embodied AI report performance under multiple impairment conditions.
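To make the six impairment dimensions concrete, here is a hypothetical Python sketch of a message channel that applies each one independently, following the default behavior described for the corruption pipeline in Figure 2. It is an illustration under stated assumptions, not the AgentComm-Bench implementation: the `Message`, `ImpairmentConfig`, and `ImpairedChannel` names, the one-knob-per-dimension parameterization, and the specific models (Bernoulli drops, fixed step delays, freshest-first truncation for bandwidth, uniform noise for conflicting evidence) are all assumptions.

```python
import random
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Message:
    sender: int
    sent_step: int                 # simulation step at which the message was created
    payload: Tuple[float, ...]     # e.g. a position or detection estimate


@dataclass
class ImpairmentConfig:
    """One knob per impairment dimension (hypothetical parameterization)."""
    latency_steps: int = 0                 # latency: delivery delay in steps
    drop_prob: float = 0.0                 # packet loss: per-message drop probability
    bandwidth_msgs: Optional[int] = None   # bandwidth collapse: max deliveries per step
    async_prob: float = 0.0                # asynchronous updates: chance a sender skips a step
    stale_steps: int = 0                   # stale memory: content is this many sends old
    conflict_prob: float = 0.0             # conflicting evidence: chance payload is perturbed
    conflict_scale: float = 5.0            # half-width of the uniform conflicting perturbation


class ImpairedChannel:
    """Applies each impairment independently to every message (sketch only)."""

    def __init__(self, cfg: ImpairmentConfig, seed: int = 0) -> None:
        self.cfg = cfg
        self.rng = random.Random(seed)
        self.in_flight: List[Tuple[int, Message]] = []          # (delivery_step, message)
        self.history: Dict[int, List[Tuple[float, ...]]] = {}   # sender -> past payloads

    def send(self, msg: Message) -> None:
        cfg, rng = self.cfg, self.rng
        past = self.history.setdefault(msg.sender, [])
        past.append(msg.payload)  # assumes one send per sender per step
        if rng.random() < cfg.async_prob:
            return  # asynchronous update: this step's broadcast never happens
        if rng.random() < cfg.drop_prob:
            return  # packet loss
        if cfg.stale_steps > 0:
            # stale memory: old content under a current timestamp
            msg = replace(msg, payload=past[max(0, len(past) - 1 - cfg.stale_steps)])
        if rng.random() < cfg.conflict_prob:
            # conflicting evidence: perturb every component of the payload
            noisy = tuple(v + rng.uniform(-cfg.conflict_scale, cfg.conflict_scale)
                          for v in msg.payload)
            msg = replace(msg, payload=noisy)
        self.in_flight.append((msg.sent_step + cfg.latency_steps, msg))

    def receive(self, step: int) -> List[Message]:
        ready = [m for t, m in self.in_flight if t <= step]
        self.in_flight = [(t, m) for t, m in self.in_flight if t > step]
        # bandwidth collapse: keep the freshest messages and discard the overflow
        ready.sort(key=lambda m: m.sent_step, reverse=True)
        if self.cfg.bandwidth_msgs is not None:
            ready = ready[: self.cfg.bandwidth_msgs]
        return ready
```

Every knob defaults to an unimpaired value, so `ImpairmentConfig()` reproduces the idealized channel; varying one field at a time while sweeping its severity is one way to trace a robustness curve for a single dimension.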

Paper Structure (57 sections, 2 equations, 7 figures, 8 tables, 1 algorithm)

Figures (7)

  • Figure 1: Overview of AgentComm-Bench. The benchmark evaluates cooperative methods across three task families under six communication impairment dimensions, producing robustness profiles via standardized metrics.
  • Figure 2: Communication corruption pipeline. Each message passes through the configurable impairment channel before reaching the receiver. The six dimensions are applied independently by default.
  • Figure 3: ResilientComm architecture. Redundant message coding sends two copies; staleness-aware fusion weights received messages by estimated age. When both copies are lost, the agent falls back to its most recent received state.
  • Figure 4: Robustness curves across all task–impairment combinations. Each subplot shows performance (mean ± std over 30 episodes) as communication impairment severity increases. Navigation shows the most dramatic degradation, with all six impairments causing monotonic performance drops from near-perfect coordination to near-random-walk levels. Shaded regions indicate ±1 standard deviation.
  • Figure 5: Normalized Performance Drop (NPD, %) at maximum impairment severity. Each cell shows the percentage of clean performance lost. Navigation dominates: stale memory and bandwidth collapse cause >96% NPD, effectively reverting coordinated agents to random walks. Cooperative perception (CP) shows extreme sensitivity only to content corruption (stale or conflicting data, >85% NPD).
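The ResilientComm caption (Figure 3) and the NPD metric (Figure 5) can be illustrated together with a minimal Python sketch. Everything here beyond the captions is an assumption: the names `send_redundant`, `StalenessAwareFuser`, and `normalized_performance_drop` are hypothetical; the exponential age weighting exp(-age/τ) and its time constant `tau` are one plausible choice of staleness weight, not the authors' formula; losses are modeled as independent per copy; and each teammate is assumed to report an estimate of the same quantity, such as a shared target position. Under independent losses with probability p, sending two copies raises the per-message delivery probability from 1 - p to 1 - p²; at p = 0.8 that is 0.36 instead of 0.20.

```python
import math
import random
from typing import Dict, Optional, Tuple

Stamped = Tuple[int, Tuple[float, ...]]  # (sent_step, payload)


def send_redundant(sent_step: int, payload: Tuple[float, ...], drop_prob: float,
                   rng: random.Random, copies: int = 2) -> Optional[Stamped]:
    """Redundant message coding (sketch): transmit `copies` copies, each lost
    independently with probability `drop_prob`; the message arrives if any copy does."""
    if any(rng.random() >= drop_prob for _ in range(copies)):
        return (sent_step, payload)
    return None  # every copy was lost


class StalenessAwareFuser:
    """Keeps the freshest message per teammate and fuses them with weights that
    decay exponentially with message age (hypothetical weighting)."""

    def __init__(self, tau: float = 3.0) -> None:
        self.tau = tau
        self.latest: Dict[int, Stamped] = {}  # sender -> most recent received message

    def update(self, sender: int, received: Optional[Stamped]) -> None:
        if received is None:
            return  # fallback: keep this sender's most recent received state
        prev = self.latest.get(sender)
        if prev is None or received[0] >= prev[0]:
            self.latest[sender] = received  # ignore messages older than what we hold

    def fuse(self, now: int) -> Optional[Tuple[float, ...]]:
        if not self.latest:
            return None
        weighted = [(math.exp(-max(0, now - t) / self.tau), p)
                    for t, p in self.latest.values()]
        total = sum(w for w, _ in weighted)
        dim = len(weighted[0][1])
        # age-weighted average of every teammate's estimate, component by component
        return tuple(sum(w * p[i] for w, p in weighted) / total for i in range(dim))


def normalized_performance_drop(clean: float, impaired: float) -> float:
    """NPD (%): share of clean performance lost under impairment."""
    return 100.0 * (clean - impaired) / clean
```

Storing only the freshest message per teammate is what produces the fallback behavior: a step in which every copy is lost leaves the previous state in place, and that state's weight decays relative to fresher reports as it ages.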