Nucleolus Credit Assignment for Effective Coalitions in Multi-agent Reinforcement Learning

Yugu Li, Zehong Cao, Jianglin Qiao, Siyi Hu

TL;DR

This work addresses credit assignment in cooperative MARL by enabling the dynamic formation of multiple small coalitions instead of a single grand coalition. It introduces a nucleolus-based framework that distributes rewards among agents so as to minimize the maximum coalition dissatisfaction, formalized via a Markov nucleolus and a nucleolus-based Bellman operator and optimized with an RCPO-inspired constrained objective. The EC-POMDP formalism supports coalition formation among agents and environment entities, with theoretical guarantees for convergence and stability. Empirically, the approach accelerates learning and improves win rates on Predator-Prey and StarCraft benchmarks, while providing interpretable coalition structures and stability across tasks. The results suggest a promising direction for scalable task-division strategies in complex MARL environments, with future work focusing on larger-scale deployment and computational efficiency.
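The core idea above, choosing a reward split that minimizes the largest coalition dissatisfaction, is the classical nucleolus: among allocations that distribute the grand-coalition value, pick the one whose excess vector $e(C,x) = v(C) - \sum_{i\in C} x_i$, sorted from largest to smallest, is lexicographically minimal. The Python sketch below illustrates that definition on a toy 3-player game by brute-force grid search. It is not the paper's algorithm (which uses a Markov nucleolus and an RCPO-inspired constrained optimization); the helper names `excesses` and `approx_nucleolus`, the grid resolution, and the example game are hypothetical, and the result is only exact when the nucleolus lies on a grid point.

```python
from itertools import combinations

def excesses(v, x, players):
    """Excess e(C, x) = v(C) - sum_{i in C} x_i for every non-empty proper coalition C."""
    out = {}
    for size in range(1, len(players)):
        for coalition in combinations(players, size):
            out[coalition] = v[coalition] - sum(x[i] for i in coalition)
    return out

def approx_nucleolus(v, players, steps=60):
    """Hypothetical brute-force nucleolus search for a 3-player game.

    Scans a grid of allocations with x_i >= 0 and sum(x) = v(N). This equals the
    imputation set only when every singleton value v({i}) is 0, as assumed here.
    Keeps the allocation whose excesses, sorted in decreasing order, are
    lexicographically smallest.
    """
    assert len(players) == 3, "this grid-search sketch only handles 3 players"
    total = v[tuple(players)]
    best_x, best_key = None, None
    for a in range(steps + 1):
        for b in range(steps + 1 - a):
            x1 = total * a / steps
            x2 = total * b / steps
            x = {players[0]: x1, players[1]: x2, players[2]: total - x1 - x2}
            key = sorted(excesses(v, x, players).values(), reverse=True)
            if best_key is None or key < best_key:
                best_x, best_key = x, key
    return best_x, best_key

# Glove game: player 1 holds a left glove, players 2 and 3 hold right gloves;
# a coalition earns 1 only if it can form a pair.
glove = {(1,): 0, (2,): 0, (3,): 0,
         (1, 2): 1, (1, 3): 1, (2, 3): 0, (1, 2, 3): 1}
x, key = approx_nucleolus(glove, (1, 2, 3))
print(x)  # → {1: 1.0, 2: 0.0, 3: 0.0}
```

In the glove game the scarce left-glove holder receives the full value: any credit given to player 2 or 3 leaves the pair formed by player 1 and the other right-glove holder with a positive excess. Brute force grows exponentially with the number of players; practical nucleolus solvers instead solve a sequence of linear programs.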

Abstract

In cooperative multi-agent reinforcement learning (MARL), agents typically form a single grand coalition based on credit assignment to tackle a composite task, often resulting in suboptimal performance. This paper proposes a nucleolus-based credit assignment method grounded in cooperative game theory, enabling the autonomous partitioning of agents into multiple small coalitions that can effectively identify and complete subtasks within a larger composite task. Specifically, our nucleolus Q-learning assigns fair credit to each agent, and the nucleolus Q-operator provides interpretable theoretical guarantees for both learning convergence and the stability of the formed small coalitions. Through experiments on Predator-Prey and StarCraft scenarios across varying difficulty levels, our approach demonstrates the emergence of multiple effective coalitions during MARL training, leading to faster learning and superior performance in terms of win rate and cumulative reward, especially in hard and super-hard environments, compared to four baseline methods. Our nucleolus-based credit assignment shows promise for complex composite tasks that require effective subteams of agents.

Paper Structure

This paper contains 14 sections, 3 theorems, 22 equations, 4 figures, and 1 algorithm.

Key Result

corollary 1

To assign the global Q-value $Q_{CS,global}(s,a)$ for a given state $s$ and joint action $a$ under coalition structure $CS$, we modify the payoff distribution as

$$x(s,a) = \big(Q_{CS,i}(s,a_i)\big)_{i\in N},$$

where $Q_{CS,i}(s,a_i)$ is the individual Q-value of agent $i$ and $\sum_{i\in N} Q_{CS,i}(s,a_i) = Q_{CS,global}(s,a)$. Then, we adapt the excess in Eq. (excess-mn) to define the excess of coalition $C$ in Q-learning:

$$e\big(C, x(s,a)\big) = V(s,a_C) - \sum_{i\in C} Q_{CS,i}(s,a_i),$$

where $V(s,a_C)$ is the …
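The two conditions in this result can be checked numerically once individual Q-values and coalition values are available. The Python sketch below assumes both are given as plain numbers for a single $(s,a)$ pair and that the excess takes the standard game-theoretic form $V(s,a_C) - \sum_{i\in C} Q_{CS,i}(s,a_i)$. The names `coalition_excesses`, `is_efficient`, and `most_dissatisfied`, and all numbers, are hypothetical; in the paper these values come from learned networks, which the sketch does not model.

```python
from typing import Dict, Tuple

Coalition = Tuple[int, ...]

def coalition_excesses(q_individual: Dict[int, float],
                       coalition_values: Dict[Coalition, float]) -> Dict[Coalition, float]:
    """Excess of each coalition C: V(s, a_C) - sum_{i in C} Q_{CS,i}(s, a_i)."""
    return {c: v - sum(q_individual[i] for i in c) for c, v in coalition_values.items()}

def is_efficient(q_individual: Dict[int, float], q_global: float, tol: float = 1e-9) -> bool:
    """True if the individual Q-values add up to the global Q-value."""
    return abs(sum(q_individual.values()) - q_global) <= tol

def most_dissatisfied(excess: Dict[Coalition, float]) -> Tuple[Coalition, float]:
    """Coalition with the largest excess, i.e. the one a nucleolus-style
    assignment would try to satisfy first."""
    c = max(excess, key=excess.get)
    return c, excess[c]

# Hypothetical values for one (s, a): four agents split into coalitions {1, 2} and {3, 4}.
q = {1: 4.0, 2: 3.0, 3: 2.0, 4: 1.0}
values = {(1, 2): 6.5, (3, 4): 3.5}
excess = coalition_excesses(q, values)
print(is_efficient(q, q_global=10.0))  # → True
print(excess)                          # → {(1, 2): -0.5, (3, 4): 0.5}
print(most_dissatisfied(excess))       # → ((3, 4), 0.5)
```

Here coalition $\{3,4\}$ could obtain 0.5 more than its members are credited, so a nucleolus-style update would tend to shift credit toward agents 3 and 4 while keeping the total equal to the global Q-value.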

Figures (4)

  • Figure 1: The transition from a single grand coalition to multiple smaller, task-specific coalitions in MARL. In scenarios such as super-hard SMAC maps, where a large number of agents are involved, forming multiple small coalitions is crucial for completing the task efficiently. Agents that attack the same enemy unit naturally form these coalitions, enabling them to work together to achieve the mission.
  • Figure 2: Learning performance in Predator-Prey: the number of turns needed to catch the prey in test episodes.
  • Figure 3: Learning performance in SMAC: median test win rate and rewards for the easy (a), hard (b-e), and super-hard (f) maps.
  • Figure 4: Visualization of the multiple-coalition formation process in three tasks: 2s3z (easy), 3s vs 5z (hard), and corridor (super-hard).
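Figure 1's caption describes SMAC coalitions as groups of agents attacking the same enemy unit. A minimal Python sketch of that grouping rule, as one might apply it at a single timestep when plotting coalitions like those in Figure 4, follows. The input format (a map from agent id to the id of the enemy it attacks, or `None` if idle), the function name `coalitions_by_target`, and the choice to leave idle agents out are assumptions, not the authors' visualization code.

```python
from collections import defaultdict
from typing import Dict, List, Optional

def coalitions_by_target(targets: Dict[int, Optional[int]]) -> Dict[int, List[int]]:
    """Group agents that attack the same enemy unit into one coalition.

    targets maps agent id -> enemy id being attacked (None if the agent is idle).
    Returns enemy id -> sorted list of agent ids; idle agents are left out.
    """
    groups: Dict[int, List[int]] = defaultdict(list)
    for agent, enemy in sorted(targets.items()):
        if enemy is not None:
            groups[enemy].append(agent)
    return dict(groups)

# Hypothetical snapshot: five allied agents, two enemies under attack.
snapshot = {0: 5, 1: 5, 2: 7, 3: None, 4: 7}
print(coalitions_by_target(snapshot))  # → {5: [0, 1], 7: [2, 4]}
```

Calling it at every step of an episode yields a per-step coalition structure that can be plotted over time.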

Theorems & Definitions (4)

  • definition 1
  • corollary 1
  • theorem 1
  • theorem 2