The impact of behavioral diversity in multi-agent reinforcement learning

Matteo Bettini, Ryan Kortvelesy, Amanda Prorok

TL;DR

The paper investigates how behavioral diversity influences learning and performance in multi-agent reinforcement learning (MARL). It introduces System Neural Diversity (SND) to measure behavioral heterogeneity and Diversity Control (DiCo) to enforce a target diversity during training, enabling principled studies of heterogeneous teams. Across 2v2 and 5v5 soccer tasks, Pac-Men exploration, and Dynamic Passage resilience, constrained diversity yields emergent complementary roles (such as passing strategies and goalkeeping), improves coordination, accelerates exploration, and enhances resilience to disruptions. The findings suggest that diversity is a fundamental component of collective artificial learning, with tangible benefits over homogeneous training and potential implications for real-world multi-agent systems and lifelong learning.

Abstract

Many of the world's most pressing issues, such as climate change and global peace, require complex collective problem-solving skills. Recent studies indicate that diversity in individuals' behaviors is key to developing such skills and increasing collective performance. Yet behavioral diversity in collective artificial learning is understudied, with today's machine learning paradigms commonly favoring homogeneous agent strategies over heterogeneous ones, mainly due to computational considerations. In this work, we employ diversity measurement and control paradigms to study the impact of behavioral heterogeneity in several facets of multi-agent reinforcement learning. Through experiments in team play and other cooperative tasks, we show the emergence of unbiased behavioral roles that improve team outcomes; how behavioral diversity synergizes with morphological diversity; how diverse agents are more effective at finding cooperative solutions in sparse reward settings; and how behaviorally heterogeneous teams learn and retain latent skills to overcome repeated disruptions. Overall, our results indicate that, by controlling diversity, we can obtain non-trivial benefits over homogeneous training paradigms, demonstrating that diversity is a fundamental component of collective artificial learning, an insight thus far overlooked.

Paper Structure

This paper contains 46 sections, 8 equations, 13 figures, 1 table.

Figures (13)

  • Figure 1: Overview of measuring and controlling diversity. a, Diversity ($\mathrm{SND}$) is computed as the average Wasserstein distance between agents' action distributions over all agent pairs and all observations in the evaluation set. b, Diversity is controlled by representing policies as the sum of a shared (homogeneous) component and individual (heterogeneous) components, which are dynamically scaled according to the current ($\widehat{\mathrm{SND}}$) and desired ($\mathrm{SND}_\mathrm{des}$) values of the diversity metric.
  • Figure 2: Soccer results in the two vs. two setting. a, Setup and details of the Soccer scenario. b, Curriculum used for opponents' strength annealing in the two experiment setups considered. c,d, Reward (throughout training, with $1$ obtained when agents score in $100\%$ of the matches) and diversity ($\mathrm{SND}$, after training) for non-kicking (c) and kicking (d) agents. We report results for agents constrained at different diversity levels (including homogeneous, $\mathrm{SND}_\mathrm{des}=0$) and unconstrained heterogeneous agents. The results show that agents constrained to a high diversity obtain the best performance, while unconstrained agents tend to become too diverse, not leveraging any homogeneity. e,f, Renderings from non-kicking (e) and kicking (f) heterogeneous agents, showing the emergence of diverse strategies that resemble a crossing pass (e) and a through pass (f). Homogeneous agents are not able to learn such strategies and often converge to collectively chasing the ball blindly (a known suboptimal policy). All experiments are run for 3 random seeds. Each datapoint in the reward curves is computed over 480 matches ($\times 3$ seeds). Each match lasts at most 500 steps or until a team scores. Reward curves report mean and standard deviation, while bar charts report the mean with error bars representing the 25th and 75th percentiles.
  • Figure 3: Soccer results in the five vs. five setting. a, Task setup. b, Policy model used in the experiments. Deep Sets is used to grant permutation invariance over opponents' data while a Graph Neural Network (GNN) enables permutation equivariance and communication among agents. These models homogeneously compute an agent-specific context which is the input of the Diversity Control (DiCo) paradigm. c,d, Training results reporting reward and diversity ($\mathrm{SND}$) for non-kicking (c) and kicking (d) agents. We report results for agents constrained at different diversity levels (including homogeneous, $\mathrm{SND}_\mathrm{des}=0$) and unconstrained heterogeneous agents. The results show that controlling diversity grants performance improvements over homogeneous and unconstrained heterogeneous paradigms, with the latter diverging to extreme diversity and failing to solve the task. e,g, Renderings from non-kicking (e) and kicking (g) heterogeneous agents, showing the emergence of the goalkeeper role. The emergent goalkeepers exhibit a preference for positioning themselves inside the goal independently of the context (a strategy that homogeneous agents are not able to learn) and perform multiple saves when opponents break the defense line. The emergence of this role is shown quantitatively in f and h, respectively, which report the pairwise agent behavioral distance $d(i,j)$ after training, showing that the goalkeepers are significantly different from all other agents. The reward and $\mathrm{SND}$ values are computed over 480 matches ($\times 3$ training seeds) using models trained for 150 million frames. Each match lasts at most 500 steps or until a team scores. Bar charts report the mean with error bars representing the 25th and 75th percentiles.
  • Figure 4: Results in the heterogeneous vs homogeneous Soccer match (5v5). The best trained models for heterogeneous and homogeneous agents in the five vs. five setting are evaluated in a competition against each other over 10,000 matches. a, Global statistics over the matches. b, Team-specific statistics over the matches. c, Pitch heatmap displaying the frequency of ball positions over 1,000 matches. The results show the higher performance of heterogeneous agents, able to win over half the matches played, with higher attack and ball-handling statistics. The heatmap corroborates these results by showing a major ball presence in the homogeneous agents' defensive half-pitch with a spatially-spread attack pattern for heterogeneous agents. On the other hand, homogeneous agents perform less frequent and more localized offensive actions. Statistics report mean and standard error over 10,000 matches. Ball possession is computed by considering the team membership of the closest agent to the ball.
  • Figure 5: Results in the physically-different 5v5 Soccer experiment. a, Task setup with the three types of embodiment used: Goalie (big and slow), Defender (average speed and size), and Attacker (small and fast). b, Curriculum used for opponents' strength annealing throughout training. c, Reward (throughout training, with $1$ obtained when agents score in $100\%$ of the matches) and diversity ($\mathrm{SND}$, after training). We report results for agents constrained at different diversity levels $\mathrm{SND}_\mathrm{des}$. The results show that a higher diversity target leads to better performance, allowing the agents to leverage their physical differences at the behavioral level. d, Learned behavioral differences (from Goalie to other roles) as a function of fixed physical differences, showing that heterogeneous agents learn diverse behavioral roles proportional to their physical differences. Behavioral differences are evaluated over 2.5 million observations for the $\mathrm{SND}_\mathrm{des}=0.2$ model. e,g, Heterogeneous (e) and homogeneous (g) agents, trained with the starting 1-2-2 formation, are evaluated starting in a random order with a line formation. Diverse agents are unaffected by this shift and are able to keep leveraging their physical differences at the behavioral level, while homogeneous agents, now not able to condition on the starting position to infer the role, exhibit the same strategy for all agents. The behavioral distance matrices (f,h) further show how heterogeneous agents are the only ones to learn behavioral differences that are proportional to the physical differences, with Goalie and Attacker being the furthest apart and agents with the same embodiment being behaviorally similar. Each datapoint in the reward curves and bar charts is computed over 480 matches ($\times 3$ training seeds). Each match lasts at most 500 steps or until a team scores. Reward curves report mean and standard deviation, while bar charts report the mean with error bars representing the 25th and 75th percentiles.
  • ...and 8 more figures
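
The measurement and control scheme summarized in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes that every policy outputs a diagonal Gaussian over actions (so the Wasserstein distance has a simple closed form), that all agents share the same standard deviations in the control step, and that the scaling factor is the ratio $\mathrm{SND}_\mathrm{des}/\widehat{\mathrm{SND}}$. The function names `w2_diag_gaussian`, `snd`, and `dico_policy_mean` are hypothetical.

```python
from itertools import combinations

import numpy as np


def w2_diag_gaussian(mu_a, std_a, mu_b, std_b):
    """2-Wasserstein distance between diagonal Gaussians (assumed policy form).

    All inputs have shape (..., action_dim); the result has shape (...).
    """
    mean_term = np.sum((mu_a - mu_b) ** 2, axis=-1)
    std_term = np.sum((std_a - std_b) ** 2, axis=-1)
    return np.sqrt(mean_term + std_term)


def snd(means, stds):
    """Average pairwise distance over all agent pairs and all observations.

    means, stds: arrays of shape (n_agents, n_obs, action_dim).
    """
    n_agents = means.shape[0]
    pair_means = [
        w2_diag_gaussian(means[i], stds[i], means[j], stds[j]).mean()
        for i, j in combinations(range(n_agents), 2)
    ]
    return float(np.mean(pair_means))


def dico_policy_mean(shared, individual, snd_hat, snd_des, eps=1e-8):
    """Shared component plus rescaled per-agent components.

    Assumption: the scale is snd_des / snd_hat, where snd_hat is the
    diversity measured before rescaling.
    shared: (n_obs, action_dim); individual: (n_agents, n_obs, action_dim).
    """
    scale = snd_des / max(snd_hat, eps)
    return shared + scale * individual


# Usage with random stand-ins for network outputs (not real policy data).
rng = np.random.default_rng(0)
shared = rng.normal(size=(16, 2))
individual = rng.normal(size=(3, 16, 2))
stds = np.ones((3, 16, 2))  # identical stds for all agents (assumption)

snd_hat = snd(shared + individual, stds)
controlled = dico_policy_mean(shared, individual, snd_hat, snd_des=0.5)
print(snd(controlled, stds))  # approximately 0.5 under these assumptions
```

Because the shared component cancels in every pairwise difference, rescaling the individual components by this ratio moves the measured diversity to the target value whenever the standard deviations are identical across agents. Setting `snd_des=0` recovers a homogeneous team.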

Theorems & Definitions (1)

  • Definition 1: Wasserstein metric for multivariate normal distributions
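
For reference, the 2-Wasserstein distance between two multivariate normal distributions has a well-known closed form. The statement below is the standard result and may differ in notation from the paper's Definition 1. The symbols $\mu_1, \Sigma_1, \mu_2, \Sigma_2$ (means and covariances) are chosen here for illustration.

$$
W_2\big(\mathcal{N}(\mu_1,\Sigma_1),\,\mathcal{N}(\mu_2,\Sigma_2)\big)^2
= \lVert \mu_1-\mu_2 \rVert_2^2
+ \operatorname{tr}\Big(\Sigma_1+\Sigma_2-2\big(\Sigma_2^{1/2}\,\Sigma_1\,\Sigma_2^{1/2}\big)^{1/2}\Big)
$$

When both covariances are diagonal, with standard-deviation vectors $\sigma_1$ and $\sigma_2$, the trace term reduces to $\lVert \sigma_1-\sigma_2 \rVert_2^2$, so the squared distance is $\lVert \mu_1-\mu_2 \rVert_2^2 + \lVert \sigma_1-\sigma_2 \rVert_2^2$.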