System Neural Diversity: Measuring Behavioral Heterogeneity in Multi-Agent Learning

Matteo Bettini, Ajay Shankar, Amanda Prorok

TL;DR

System Neural Diversity (SND) introduces a principled, closed-form metric for behavioral heterogeneity in multi-agent learning by computing pairwise policy distances with the Wasserstein metric across observations and aggregating them into a system-level diversity score via a distance matrix. SND is shown to satisfy key properties such as invariance to team size and explicit measurement of behavioral redundancy, addressing limitations of prior metrics like Hierarchic Social Entropy (HSE). Empirically, SND provides predictive insights into resilience and exploration in static and dynamic multi-robot tasks, revealing latent resilience and enabling diversity-controlled training to bootstrap faster convergence. The work demonstrates that guiding diversity with SND can enhance robustness and learning efficiency, offering a complementary tool to reward-based evaluation for designing and analyzing heterogeneous MARL systems.
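
The pipeline described above (pairwise Wasserstein distances over observations, collected in a distance matrix, then aggregated) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes policies output diagonal Gaussians, for which the 2-Wasserstein distance has a closed form, and it assumes the system-level score is the mean over unique agent pairs. The names `make_policy`, `behavioral_distance`, `distance_matrix`, and `system_diversity` are hypothetical.

```python
import math
from itertools import combinations


def w2_diag_gaussian(mu_a, sigma_a, mu_b, sigma_b):
    """Closed-form 2-Wasserstein distance between two diagonal Gaussians."""
    sq = sum((a - b) ** 2 for a, b in zip(mu_a, mu_b))
    sq += sum((a - b) ** 2 for a, b in zip(sigma_a, sigma_b))
    return math.sqrt(sq)


def behavioral_distance(policy_i, policy_j, observations):
    """Mean W2 distance between two policies evaluated on the same observations."""
    total = 0.0
    for obs in observations:
        # Each policy returns a (mean, std) pair for the given observation.
        total += w2_diag_gaussian(*policy_i(obs), *policy_j(obs))
    return total / len(observations)


def distance_matrix(policies, observations):
    """Symmetric matrix of pairwise behavioral distances, zero on the diagonal."""
    n = len(policies)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = behavioral_distance(policies[i], policies[j], observations)
    return d


def system_diversity(d):
    """Assumed aggregation: mean of the upper triangle of the distance matrix."""
    n = len(d)
    if n < 2:
        return 0.0
    pairs = list(combinations(range(n), 2))
    return sum(d[i][j] for i, j in pairs) / len(pairs)


def make_policy(offset):
    """Hypothetical toy policy: action mean is the observation shifted by offset."""
    def policy(obs):
        mu = [o + offset for o in obs]
        sigma = [0.1] * len(obs)
        return mu, sigma
    return policy


observations = [[0.0], [1.0]]
policies = [make_policy(0.0), make_policy(0.0), make_policy(3.0)]
matrix = distance_matrix(policies, observations)
snd = system_diversity(matrix)
print(matrix)
print(snd)
```

In this toy team two agents behave identically, so their pairwise distance is zero, while the third agent contributes all of the measured diversity. Averaging over pairs rather than summing is one way to keep the score from growing merely because more agents are added, though whether it matches the paper's exact definition is an assumption here.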

Abstract

Evolutionary science provides evidence that diversity confers resilience in natural systems. Yet, traditional multi-agent reinforcement learning techniques commonly enforce homogeneity to increase training sample efficiency. When a system of learning agents is not constrained to homogeneous policies, individuals may develop diverse behaviors, resulting in emergent complementarity that benefits the system. Despite this, there is a surprising lack of tools that quantify behavioral diversity. Such techniques would pave the way towards understanding the impact of diversity in collective artificial intelligence and enabling its control. In this paper, we introduce System Neural Diversity (SND): a measure of behavioral heterogeneity in multi-agent systems. We discuss and prove its theoretical properties, and compare it with alternate, state-of-the-art behavioral diversity metrics used in the robotics domain. Through simulations of a variety of cooperative multi-robot tasks, we show how our metric constitutes an important tool that enables measurement and control of behavioral heterogeneity. In dynamic tasks, where the problem is affected by repeated disturbances during training, we show that SND allows us to measure latent resilience skills acquired by the agents, while other proxies, such as task performance (reward), fail to. Finally, we show how the metric can be employed to control diversity, allowing us to enforce a desired heterogeneity set-point or range. We demonstrate how this paradigm can be used to bootstrap the exploration phase, finding optimal policies faster, thus enabling novel and more efficient MARL paradigms.

Paper Structure (30 sections, 13 equations, 10 figures, 4 tables)

Figures (10)

  • Figure 1: Wasserstein metric $W_2(p,q)$ and KL divergence $D_{\mathrm{KL}}(p\;\|\;q)$ of two univariate distributions $p,q$ as their standard deviation approaches 0. The value of $W_2$ in this scenario approaches the absolute difference of their means while KL divergence approaches infinity.
  • Figure 2: Multi-Agent Goal Navigation example. Agents are spawned at random positions in a 2D workspace and take velocity actions (colored arrows) to reach their assigned goal, also spawned at random positions.
  • Figure 3: Behavioral distance matrices for the four experiments run on Multi-Agent Goal Navigation. We can observe how, when agents are assigned the same goal, they become behaviorally homogeneous and thus reduce their behavioral distance. We report mean and standard deviation for $d(i,j)$ over 5 random seeds for each experiment. The values are collected after 300 training iterations each performed over 600 episodes of experience.
  • Figure 4: An illustration of the properties of the proposed System Neural Diversity (SND) metric, contrasted against Hierarchic Social Entropy (HSE).
  • Figure 5: SND in the Multi-Agent Goal Navigation scenario. We can observe that, while all setups reach the same reward, SND decreases as the agents share more goals, until the system becomes homogeneous when all agents are sharing the same goal. We report mean and standard deviation over 5 random seeds for each experiment. The values are measured over 300 training iterations each performed over 600 episodes of experience.
  • ...and 5 more figures
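
The behavior described in the Figure 1 caption can be reproduced numerically. The sketch below assumes the two univariate distributions are Gaussians, which the caption does not state, and uses the standard closed forms for $W_2$ and $D_{\mathrm{KL}}$ between univariate Gaussians. The function names, the means 0 and 1, and the choice of shrinking both standard deviations together (with the second kept at twice the first) are illustrative assumptions.

```python
import math


def w2_gaussian(m1, s1, m2, s2):
    """2-Wasserstein distance between N(m1, s1^2) and N(m2, s2^2)."""
    return math.sqrt((m1 - m2) ** 2 + (s1 - s2) ** 2)


def kl_gaussian(m1, s1, m2, s2):
    """KL divergence D_KL(N(m1, s1^2) || N(m2, s2^2))."""
    return math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2) - 0.5


def sweep(m1=0.0, m2=1.0, sigmas=(1.0, 0.1, 0.01, 0.001)):
    """Evaluate both quantities as the standard deviations shrink towards 0."""
    rows = []
    for s in sigmas:
        # Second distribution keeps twice the spread of the first (assumption).
        rows.append((s, w2_gaussian(m1, s, m2, 2 * s), kl_gaussian(m1, s, m2, 2 * s)))
    return rows


rows = sweep()
for s, w2, kl in rows:
    print(f"sigma={s:<6} W2={w2:.4f} KL={kl:.2f}")
```

As the standard deviations shrink, the $W_2$ values settle near $|m_1 - m_2|$, while the KL values grow without bound because the squared mean difference is divided by a vanishing variance. This is one reason a Wasserstein-based distance remains usable for near-deterministic policies.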

Theorems & Definitions (1)

  • Definition 1: Properties of the behavioral distance metric [menger2003statistical]