The Coordination Gap: Alternation Metrics for Temporal Dynamics in Multi-Agent Battle of the Exes

Nikolaos Al. Papadopoulos, Konstantinos Psannis

TL;DR

We demonstrate, in this setting, that high aggregate payoffs can coexist with poor temporal coordination and that conventional metrics may severely mischaracterize emergent dynamics.

Abstract

Multi-agent coordination dilemmas expose a fundamental tension between individual optimization and collective welfare, and characterizing how agents coordinate requires metrics sensitive to temporal structure and collective dynamics. As a diagnostic testbed, we study a multi-agent variant of the Battle of the Exes (BoE), formalized as a Markov game in which turn-taking emerges as a periodic coordination regime. Conventional outcome-based metrics (e.g., efficiency and min/max fairness) are temporally blind: they cannot distinguish structured alternation from monopolistic or random access patterns. Moreover, fairness ratios lose discriminative power as the number of agents n grows, obscuring inequities. To address this limitation, we introduce Perfect Alternation (PA) as a reference coordination regime and propose six novel Alternation (ALT) metrics designed as temporally sensitive observables of coordination quality. Using Q-learning agents as a minimal adaptive diagnostic baseline, and comparing them against random-policy null processes, we uncover a clear measurement failure. Although learned policies score deceptively well on traditional metrics (e.g., reward fairness often exceeding 0.9), they perform up to 81% below random baselines under ALT-variant evaluation; this deficit is already present with two agents and intensifies as n grows. These results demonstrate, in this setting, that high aggregate payoffs can coexist with poor temporal coordination and that conventional metrics may severely mischaracterize emergent dynamics. Our findings underscore the need for temporally aware observables when analyzing coordination in multi-agent games, and they highlight random-policy baselines as essential null processes for interpreting coordination outcomes relative to chance-level behavior.
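The claim that outcome-based metrics are temporally blind can be checked on a toy example. The Python sketch below is a minimal illustration under stated assumptions, not the paper's method. It assumes a simplified BoE-style round in which an agent earns the resource only when it is the sole claimant (collisions and idle rounds yield nothing). The helpers `efficiency`, `min_max_fairness`, `window_alternation` and `random_policy_run` are hypothetical; in particular, `window_alternation` is an illustrative temporal score, not one of the paper's six ALT metrics.

```python
import random
from collections import Counter

# Hypothetical encoding: winners[t] is the index of the agent that got the
# resource alone in round t, or None for a collision or idle round.

def efficiency(winners):
    """Fraction of rounds in which exactly one agent claimed (and got) the resource."""
    return sum(w is not None for w in winners) / len(winners)

def min_max_fairness(winners, n):
    """Smallest per-agent win count divided by the largest (1.0 = perfectly equal)."""
    counts = Counter(w for w in winners if w is not None)
    per_agent = [counts.get(i, 0) for i in range(n)]
    return min(per_agent) / max(per_agent) if max(per_agent) else 0.0

def window_alternation(winners, n):
    """Hypothetical temporal score: share of non-overlapping length-n windows
    in which every agent wins exactly once. Perfect alternation scores 1.0."""
    windows = [winners[i:i + n] for i in range(0, len(winners) - n + 1, n)]
    good = sum(set(w) == set(range(n)) for w in windows)
    return good / len(windows) if windows else 0.0

def random_policy_run(n, rounds, p_claim=0.5, seed=0):
    """Assumed null process: each agent independently claims with probability
    p_claim, and the resource is won only if exactly one agent claims it."""
    rng = random.Random(seed)
    winners = []
    for _ in range(rounds):
        claimants = [i for i in range(n) if rng.random() < p_claim]
        winners.append(claimants[0] if len(claimants) == 1 else None)
    return winners

def report(winners, n):
    """Return (efficiency, min/max fairness, windowed alternation score)."""
    return (efficiency(winners), min_max_fairness(winners, n),
            window_alternation(winners, n))

n, T = 2, 100
alternating = [t % n for t in range(T)]    # 0, 1, 0, 1, ...
blocks = [0] * (T // 2) + [1] * (T // 2)   # agent 0 monopolizes, then agent 1
null = random_policy_run(n, T)             # chance-level reference

for name, seq in [("alternating", alternating), ("blocks", blocks), ("random", null)]:
    eff, fair, alt = report(seq, n)
    print(f"{name:12s} efficiency={eff:.2f} fairness={fair:.2f} alternation={alt:.2f}")
```

With two agents over 100 rounds, the alternating sequence and the two-block sequence both score 1.00 on efficiency and min/max fairness. The windowed score, however, separates them (1.00 versus 0.00). The random-policy run plays the role of a chance-level null process, and its exact values depend on the seed and on the assumed claim probability.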

Paper Structure

This paper contains 50 sections, 16 equations, 5 figures, and 5 tables.

Figures (5)

  • Figure 1: Q-Learning vs Random Baseline CALT Performance. Blue bars show Q-learning agents consistently achieving lower CALT values than random baselines (orange bars) across all agent configurations (2, 3, 5, 8, 10 agents). All coordination scores are negative, indicating Q-learning performs worse than chance. Error bars represent standard deviation across Type-A/Type-B and ILF/IQF configurations.
  • Figure 2: Perfect Alternation Equivalent Performance Degradation. Q-learning agents (blue line; mean across 4 modes) decline from ~56.4% (2 agents) to ~17.9% (10 agents) of perfect coordination, while random baselines (orange line) decline from 69.7% to 33.3%. Shaded regions indicate 95% confidence intervals. Q-learning remains below random across agent counts, indicating persistent coordination deficits.
  • Figure 3: Traditional Metrics Fail to Detect Coordination Failure. Left panels show traditional outcome metrics achieving moderate-to-high values across configurations, suggesting successful coordination. Right panels show alternation-sensitive metrics exposing poor coordination, with all values below random baselines (dashed lines). This dichotomy demonstrates that traditional outcome-based fairness measures cannot distinguish coordinated alternation from monopolistic or random resource access.
  • Figure 4: Symmetric Coordination Failure in 3-Agent Systems. Despite nearly identical high traditional metrics (Reward Fairness > 0.90, Efficiency > 0.34), both Type-A ILF and Type-B IQF configurations exhibit similarly low ALT performance (CALT ≈ 0.14). This symmetric failure across state representations and reward structures demonstrates that the coordination deficit is fundamental to independent Q-learning, not an artifact of specific design choices.
  • Figure 5: CALT Progression During Q-Learning Training. CALT values (solid lines) decrease during learning as epsilon decays (dashed line, right axis), indicating that learned policies achieve worse coordination than early exploration phases. Traditional efficiency (dotted lines) follows a different trend over training and does not track CALT. This suggests that convergence to deterministic policies can interfere with alternation even while traditional metrics may indicate improvement.
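The Figure 2 caption reports PA-equivalent performance only at the endpoints (2 and 10 agents). The Python sketch below converts those quoted values into a relative gap against the random null process. It uses (Q - R) / R as an assumed, illustrative measure; the paper's own normalization for "percent below random" may differ.

```python
# PA-equivalent values (% of perfect alternation) quoted in the Figure 2
# caption; only the 2-agent and 10-agent endpoints are given there.
pa_equiv = {
    2: {"q_learning": 56.4, "random": 69.7},
    10: {"q_learning": 17.9, "random": 33.3},
}

def relative_deficit(q_value, random_value):
    """Assumed illustrative measure: relative gap of Q-learning to the random
    null baseline. Negative values mean Q-learning is below chance."""
    return (q_value - random_value) / random_value

for n_agents, v in sorted(pa_equiv.items()):
    gap = relative_deficit(v["q_learning"], v["random"])
    print(f"{n_agents:2d} agents: {gap:+.1%} relative to random")
```

On these endpoint values, Q-learning sits about 19% below random with 2 agents and about 46% below random with 10 agents, so the relative gap widens from 2 to 10 agents. The abstract's "up to 81%" figure comes from ALT-variant evaluation and cannot be reproduced from these caption values alone.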