
Structured Diversity Control: A Dual-Level Framework for Group-Aware Multi-Agent Coordination

Shuocun Yang, Huawen Hu, Xuan Liu, Yincheng Yao, Enze Shi, Shu Zhang

TL;DR

Structured Diversity Control (SDC) is a framework that redefines the system-wide diversity metric as a weighted combination of intra-group diversity, which is minimized for cohesion, and inter-group diversity, which is maximized for specialization.

Abstract

Controlling behavioral diversity is a pivotal challenge in multi-agent reinforcement learning (MARL), particularly in complex collaborative scenarios. Existing methods regulate behavioral diversity by directly differentiating across all agents, but they do not explicitly characterize or learn the group composition of the multi-agent system. This limitation leads to suboptimal performance or coordination failures on more complex tasks. To bridge this gap, we introduce Structured Diversity Control (SDC), a framework that redefines the system-wide diversity metric as a weighted combination of intra-group diversity, which is minimized for cohesion, and inter-group diversity, which is maximized for specialization. The trade-off is governed by a pre-set Diversity Structure Factor (DSF), allowing fine-grained, group-aware control over the collective strategy. SDC constrains the policy architecture directly rather than altering the reward function. This structural definition of diversity enables SDC to deliver substantial performance gains across various experiments, including increasing average rewards by up to 47.1% in multi-target pursuit and reducing episode lengths by 12.82% in complex neutralization scenarios. The proposed method offers a novel analytical perspective on the problem of cooperation in group-aware multi-agent systems.
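
The abstract describes the structured metric only in words. The Python sketch below shows one way such a metric could be computed, under assumptions this summary does not state: each agent's behavior is summarized by its action vectors on a shared batch of observations, the pairwise behavioral distance is the Euclidean distance between those vectors averaged over the batch, and the DSF $\alpha$ weights the mean inter-group distance while $1-\alpha$ weights the mean intra-group distance. The combination rule and the function names (`behavioral_distance`, `structured_diversity`) are hypothetical illustrations, not the authors' definition.

```python
import math
from itertools import combinations


def behavioral_distance(actions_a, actions_b):
    """Mean Euclidean distance between two agents' action vectors, taken over a
    shared batch of observations (an assumed stand-in for the paper's policy
    distance)."""
    total = sum(math.dist(a, b) for a, b in zip(actions_a, actions_b))
    return total / len(actions_a)


def structured_diversity(actions, groups, alpha):
    """Hypothetical SND_struct = alpha * D_inter + (1 - alpha) * D_intra.

    actions: dict agent_id -> list of action vectors, one per observation
    groups:  dict agent_id -> group label
    alpha:   Diversity Structure Factor (DSF) in [0, 1]
    Returns (SND_struct, D_intra, D_inter).
    """
    intra, inter = [], []
    for i, j in combinations(sorted(actions), 2):
        d = behavioral_distance(actions[i], actions[j])
        # Same-group pairs feed the cohesion term, cross-group pairs the specialization term.
        (intra if groups[i] == groups[j] else inter).append(d)
    d_intra = sum(intra) / len(intra) if intra else 0.0
    d_inter = sum(inter) / len(inter) if inter else 0.0
    return alpha * d_inter + (1.0 - alpha) * d_intra, d_intra, d_inter


# Toy example: two groups of two agents, 2-D actions, one shared observation.
example_actions = {
    "a0": [(0.0, 0.0)],
    "a1": [(0.0, 1.0)],
    "b0": [(3.0, 0.0)],
    "b1": [(3.0, 1.0)],
}
example_groups = {"a0": "A", "a1": "A", "b0": "B", "b1": "B"}
snd_struct, d_intra, d_inter = structured_diversity(example_actions, example_groups, alpha=0.9)
print(f"D_intra={d_intra:.3f}  D_inter={d_inter:.3f}  SND_struct={snd_struct:.3f}")
```

With $\alpha = 0.9$ the cross-group pairs dominate the score; setting $\alpha = 0$ reduces the sketch to the mean intra-group distance alone.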

Paper Structure

This paper contains 23 sections, 5 equations, and 10 figures.

Figures (10)

  • Figure 1: Overview of the Structured Diversity Control (SDC) framework within a MARL training loop: (a) Structured Diversity Control: Within the SDC framework, a structured diversity constraint is imposed on the agent policies. (b) Structured Diversity: At each training step, the framework first computes the structured system diversity $\text{SND}_{\text{struct}}$ from the heterogeneous policy components. (c) Policy Factorization: Each agent's final policy is represented as the sum of a parameter-shared, homogeneous component and a rescaled, per-agent heterogeneous component. The computed $\text{SND}_{\text{struct}}$ is then compared to a desired target value $\text{SND}_{\text{des}}$ to generate a scale factor. This factor is used to rescale the heterogeneous components, resulting in the final updated policies.
  • Figure 2: Verification of SDC's diversity control capability. Each colored line represents a separate training run with a different target diversity level, $\text{SND}_{\text{des}}$, ranging from 0.3 to 0.9, at a fixed DSF of $\alpha=0.9$.
  • Figure 3: Analysis of the Diversity Structure Factor (DSF, $\alpha$). (Left) The effect of $\alpha$ on the inter-group diversity of the heterogeneous components. Higher $\alpha$ values, which place more weight on inter-group diversity, lead to a correspondingly higher and more sustained level of specialization between groups. (Right) Verification of total diversity control under different DSF settings. For a fixed target diversity of $\text{SND}_{\text{des}}=0.3$, the measured system diversity $\text{SND}(\{\pi_i\})$ remains stable and accurate across all tested $\alpha$ values, demonstrating the robustness of our control mechanism.
  • Figure 4: Performance and diversity comparison in the Shielded Tag environment (7 vs. 2 scenario) with a target diversity of $\text{SND}_{\text{des}}=0.3$. (Left) Mean episode length over training. SDC (red) learns a superior strategy, completing the task significantly faster than DiCo (blue). (Right) Measured total system diversity, $\text{SND}(\{\pi_i\})$. Both methods successfully converge to and maintain the target diversity level of 0.3.
  • Figure 5: Comparison of average rewards between DiCo and SDC across scenarios with a target diversity of $\text{SND}_{\text{des}}=0.3$. (a) Average rewards of DiCo and SDC in the "5 chasing 2" scenario. (b) Average rewards of DiCo and SDC in the "6 chasing 2" scenario. (c) Average rewards of DiCo and SDC in the "7 chasing 2" scenario.
  • ...and 5 more figures
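
The caption of Figure 1 describes the control loop in words. The Python sketch below illustrates it for a single observation under stated assumptions: each agent's deterministic action is the shared component's output plus its own heterogeneous output multiplied by a scalar $k$; $k$ is assumed to be the ratio $\text{SND}_{\text{des}} / \text{SND}_{\text{struct}}$ of the heterogeneous components (this mirrors the rescaling used in DiCo, the baseline in Figures 4 and 5); and the structured metric is a hypothetical $\alpha$-weighted mix of mean inter- and intra-group Euclidean distances. The function names and toy outputs are illustrative, not the authors' code.

```python
import math
from itertools import combinations


def structured_snd(outputs, groups, alpha):
    """Hypothetical structured diversity of per-agent action vectors:
    alpha * (mean inter-group distance) + (1 - alpha) * (mean intra-group distance)."""
    intra, inter = [], []
    for i, j in combinations(range(len(outputs)), 2):
        d = math.dist(outputs[i], outputs[j])
        (intra if groups[i] == groups[j] else inter).append(d)

    def mean(xs):
        return sum(xs) / len(xs) if xs else 0.0

    return alpha * mean(inter) + (1.0 - alpha) * mean(intra)


def rescale_policies(shared, hetero, groups, alpha, snd_des, eps=1e-8):
    """Combine components as a_i = shared + k * hetero_i, where
    k = snd_des / SND_struct(hetero) (ratio form assumed here).

    All agents are evaluated on the same observation, so the shared
    component's output is identical for every agent and cancels in
    pairwise differences.
    """
    k = snd_des / max(structured_snd(hetero, groups, alpha), eps)
    final = [tuple(s + k * h for s, h in zip(shared, h_i)) for h_i in hetero]
    return final, k


# Toy example with hypothetical component outputs for four agents in two groups.
shared_out = (0.5, -0.5)
hetero_out = [(0.0, 0.0), (0.0, 1.0), (3.0, 0.0), (3.0, 1.0)]
group_of = ["A", "A", "B", "B"]
final_out, k = rescale_policies(shared_out, hetero_out, group_of, alpha=0.9, snd_des=0.3)
print(f"scale factor k={k:.4f}")
print(f"structured diversity after rescaling={structured_snd(final_out, group_of, 0.9):.4f}")
```

Because Euclidean distance scales linearly, multiplying every heterogeneous output by $k$ multiplies the structured metric by $k$, so the rescaled policies match $\text{SND}_{\text{des}}$ exactly on the evaluated observation. A single uniform $k$ leaves the ratio of intra- to inter-group distances unchanged; in this sketch, any shift in that balance would have to come from training the heterogeneous components.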