CoMIX: A Multi-agent Reinforcement Learning Training Architecture for Efficient Decentralized Coordination and Independent Decision-Making

Giovanni Minelli; Mirco Musolesi

TL;DR

CoMIX tackles the challenge of balancing coordinated team behavior with independent agent decision-making in decentralized MARL. It combines an Action Policy that multiplies a self-driven Q-value by a coordination weight learned from filtered neighbor messages, with a Coordinator that gates communications via a BiGRU-based masking mechanism. Training relies on centralized temporal-difference supervision through a QMIX-style mixer, augmented by a contrastive objective to optimize message filtering, enabling efficient and robust coordination under partial observability. Across Switch, Cooperative Load Transportation, and Predator-Prey, CoMIX achieves superior or competitive performance, reduces communication overhead, and demonstrates resilience to disrupted or noisy channels while preserving decentralized execution. This approach offers a scalable, interpretable path to emergent coordination in multi-agent systems with varying collaboration needs and communication conditions.
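The decision step described above (filter neighbor messages, turn the filtered messages into coordination weights, rescale the self-driven Q-values, pick the best action) can be sketched in a few lines. This is only an illustrative sketch, not the authors' implementation: the message layout `(sender_state_id, intended_action)`, the function names, and the hand-written `toy_weights` rule are all assumptions made for the example. In CoMIX the mask comes from the BiGRU-based Coordinator and the weights are learned.

```python
from typing import Callable, List, Sequence, Tuple

# Assumed message layout for this sketch: (sender_state_id, intended_action).
Message = Tuple[int, int]


def filter_messages(messages: Sequence[Message], mask: Sequence[int]) -> List[Message]:
    """Keep only messages whose gate is 1 (stand-in for the Coordinator's mask)."""
    if len(messages) != len(mask):
        raise ValueError("one mask entry is required per message")
    return [m for m, keep in zip(messages, mask) if keep]


def toy_weights(filtered: Sequence[Message], n_actions: int) -> List[float]:
    """Hypothetical hand-written rule: halve the weight of any action that a
    neighbor has announced. CoMIX learns these weights instead."""
    announced = {action for _, action in filtered}
    return [0.5 if a in announced else 1.0 for a in range(n_actions)]


def select_action(
    q_self: Sequence[float],
    messages: Sequence[Message],
    mask: Sequence[int],
    weight_fn: Callable[[Sequence[Message], int], List[float]] = toy_weights,
) -> int:
    """Rescale self-driven Q-values by coordination weights and take the argmax."""
    filtered = filter_messages(messages, mask)
    weights = weight_fn(filtered, len(q_self))
    q_coord = [q * w for q, w in zip(q_self, weights)]
    return max(range(len(q_coord)), key=q_coord.__getitem__)


q_self = [1.0, 2.0, 1.5]
messages = [(0, 1), (3, 2)]
print(select_action(q_self, messages, mask=[0, 0]))  # all messages gated out -> 1
print(select_action(q_self, messages, mask=[1, 0]))  # neighbor claims action 1 -> 2
```

With every message masked out, the weights are all 1 and the agent falls back to its purely self-driven choice, which mirrors the independent decision-making the architecture is meant to preserve.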

Abstract

Robust coordination skills enable agents to operate cohesively in shared environments, working together towards a common goal and, ideally, acting individually without hindering each other's progress. To this end, this paper presents Coordinated QMIX (CoMIX), a novel training framework for decentralized agents that enables emergent coordination through flexible policies while still allowing independent decision-making at the individual level. CoMIX models selfish and collaborative behavior as incremental steps in each agent's decision process. This allows agents to dynamically adapt their behavior to different situations, balancing independence and collaboration. Experiments using a variety of simulation environments demonstrate that CoMIX outperforms baselines on collaborative tasks. The results validate our incremental approach as an effective technique for improving coordination in multi-agent systems.

Paper Structure (33 sections, 8 equations, 4 figures, 3 tables, 1 algorithm)

Introduction
Related Work
Coordination.
Information Sharing.
Information Representation.
Background
Deep Reinforcement Learning.
Centralized Training with Decentralized Execution.
The CoMIX Training Architecture
Architecture
Action Policy
Coordinator
Training
Centralized Temporal Difference Supervision
Contrastive Optimization
...and 18 more sections

Figures (4)

Figure 1: CoMIX training architecture. (a) The system is illustrated by breaking down its components with the information flow from left to right. Each agent observes a partial state of the world, processes it, and determines the next action to be taken. After transmitting this information $\langle s_i, \hat{a}_i \rangle$ through the communication channel, they receive and filter the $\mathbf{m}$ messages from other agents using a coordination module. The filtered messages $\bar{\mathbf{m}}_i$ are then used to compute weights to rescale the original state-action pairs and select the best action. Decisions are made locally. (b) A conceptual demonstration of the interpretability
Figure 2: Environments used for experimental evaluation. From left to right (a) Switch; (b) Cooperative Load Transportation, 2 loads (red) and 4 agents (blue) with an equal distance of 15 steps to the final position (green), and 10% of randomly selected cells being obstacles (black); (c) Predator-Prey, 4 agents (blue), 16 prey (red) in a 12x12 map. The spatial observability of agents is shown in light blue.
Figure 3: Training results in the three environments used for evaluation. We adopt the following performance metrics: in Switch, the sum of rewards obtained by all agents normalized to 1; in Cooperative Load Transportation, the distance of the load from the docking area as the percentage of task completion; in Predator-Prey, the number of prey captured.
Figure 4: Average number of messages accepted by each agent's Coordinator normalized by the number of real agents acting in the environment. By introducing fictitious "noisy agents" transmitting random bits into the communication channel, we expect the value to remain unaltered.
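The quantity plotted in Figure 4 can be illustrated with a small computation. The sketch below is one plausible reading of the caption, not the authors' evaluation code: the binary `masks` layout, the function name, and the choice to average over receiving agents before dividing by the number of real agents are assumptions.

```python
from typing import Sequence


def normalized_accepted_messages(
    masks: Sequence[Sequence[int]], n_real_agents: int
) -> float:
    """masks[i][j] is 1 if receiving agent i accepted the message from sender j.

    Senders may include fictitious noisy agents; the result is always
    normalized by the number of real agents only (assumed reading of Figure 4).
    """
    if n_real_agents <= 0 or not masks:
        raise ValueError("need at least one real agent and one mask row")
    accepted_per_agent = [sum(row) for row in masks]
    mean_accepted = sum(accepted_per_agent) / len(accepted_per_agent)
    return mean_accepted / n_real_agents


# Two real agents, each accepting the other's message.
clean = [[0, 1], [1, 0]]
# Same agents plus one noisy sender (third column) that is always rejected.
noisy_rejected = [[0, 1, 0], [1, 0, 0]]
# Agent 0 mistakenly accepts the noisy sender's message.
noisy_accepted = [[0, 1, 1], [1, 0, 0]]

print(normalized_accepted_messages(clean, 2))           # 0.5
print(normalized_accepted_messages(noisy_rejected, 2))  # 0.5
print(normalized_accepted_messages(noisy_accepted, 2))  # 0.75
```

Under this reading, a Coordinator that rejects every noisy message leaves the value unchanged, which is the behavior the caption says is expected.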