CoMIX: A Multi-agent Reinforcement Learning Training Architecture for Efficient Decentralized Coordination and Independent Decision-Making
Giovanni Minelli, Mirco Musolesi
TL;DR
CoMIX tackles the challenge of balancing coordinated team behavior with independent agent decision-making in decentralized MARL. It combines an Action Policy that multiplies a self-driven Q-value by a coordination weight learned from filtered neighbor messages, with a Coordinator that gates communications via a BiGRU-based masking mechanism. Training relies on centralized temporal-difference supervision through a QMIX-style mixer, augmented by a contrastive objective to optimize message filtering, enabling efficient and robust coordination under partial observability. Across Switch, Cooperative Load Transportation, and Predator-Prey, CoMIX achieves superior or competitive performance, reduces communication overhead, and demonstrates resilience to disrupted or noisy channels while preserving decentralized execution. This approach offers a scalable, interpretable path to emergent coordination in multi-agent systems with varying collaboration needs and communication conditions.
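As a rough illustration of the decision step summarized above, the following minimal sketch shows a self-driven Q-value being rescaled by a coordination weight computed from gated neighbor messages. It is not the authors' implementation. The function names, the fixed binary mask (standing in for the BiGRU-based Coordinator), the mean-plus-sigmoid weighting rule, and the assumption of non-negative Q-values are all hypothetical choices made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gate_messages(messages, mask):
    # Hypothetical stand-in for the Coordinator: keep a neighbor's message
    # only when its mask entry is 1. In the paper the mask comes from a
    # BiGRU-based module; here it is supplied by hand.
    return [m for m, keep in zip(messages, mask) if keep]


def coordination_weights(filtered, n_actions):
    # Assumption: each message holds one score per action. The scores are
    # averaged and squashed into (0, 2), so a weight of 1.0 is neutral.
    # With no messages the agent falls back to its self-driven Q-values.
    if not filtered:
        return [1.0] * n_actions
    means = [sum(m[a] for m in filtered) / len(filtered) for a in range(n_actions)]
    return [2.0 * sigmoid(x) for x in means]


def coordinated_q(q_self, weights):
    # Action Policy step: multiply the self-driven Q-value by the weight.
    # Assumes non-negative Q-values so that scaling preserves their order of sign.
    return [q * w for q, w in zip(q_self, weights)]


def greedy_action(q_values):
    return max(range(len(q_values)), key=q_values.__getitem__)


q_self = [1.0, 0.8]                    # the agent alone prefers action 0
messages = [[-2.0, 2.0], [5.0, -5.0]]  # one message per neighbor
mask = [1, 0]                          # only the first neighbor is listened to

filtered = gate_messages(messages, mask)
q_coord = coordinated_q(q_self, coordination_weights(filtered, len(q_self)))
print(greedy_action(q_self), greedy_action(q_coord))
```

In this toy setting the agent picks action 0 on its own and action 1 once the first neighbor's message is taken into account, while an all-zero mask leaves the self-driven choice unchanged. That is the sense in which selfish and collaborative behavior are treated as incremental steps.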
Abstract
Robust coordination skills enable agents to operate cohesively in shared environments, working together towards a common goal and, ideally, acting individually without hindering each other's progress. To this end, this paper presents Coordinated QMIX (CoMIX), a novel training framework for decentralized agents that enables emergent coordination through flexible policies while still allowing independent decision-making at the individual level. CoMIX models selfish and collaborative behavior as incremental steps in each agent's decision process. This allows agents to dynamically adapt their behavior to different situations, balancing independence and collaboration. Experiments using a variety of simulation environments demonstrate that CoMIX outperforms baselines on collaborative tasks. The results validate our incremental approach as an effective technique for improving coordination in multi-agent systems.
