When Is Diversity Rewarded in Cooperative Multi-Agent Learning?

Michael Amir, Matteo Bettini, Amanda Prorok

TL;DR

The paper asks when behavioral diversity yields higher reward in cooperative multi-agent task allocation. It formulates the team reward as a double aggregation (an inner aggregator per task and an outer aggregator across tasks) and ties the advantage of heterogeneity to the curvature of the two aggregators via Schur-convexity and Schur-concavity. It also introduces HetGPS, a gradient-based environment design method that optimizes the parameters of differentiable Dec-POMDPs to maximize the empirical heterogeneity gain, and validates it on matrix games and embodied MARL tasks. The main theoretical contribution is a set of convexity-based tests for $\Delta R > 0$, where $\Delta R$ is the heterogeneity gain; empirically, HetGPS recovers the reward instantiations that the theory predicts to be optimal. Together, these results offer a principled framework for diagnosing and designing settings where heterogeneity helps in cooperative multi-agent learning, with implications for reward shaping and environment co-design.

Abstract

The success of teams in robotics, nature, and society often depends on the division of labor among diverse specialists; however, a principled explanation for when such diversity surpasses a homogeneous team is still missing. Focusing on multi-agent task allocation problems, we study this question from the perspective of reward design: what kinds of objectives are best suited for heterogeneous teams? We first consider an instantaneous, non-spatial setting where the global reward is built by two generalized aggregation operators: an inner operator that maps the $N$ agents' effort allocations on individual tasks to a task score, and an outer operator that merges the $M$ task scores into the global team reward. We prove that the curvature of these operators determines whether heterogeneity can increase reward, and that for broad reward families this collapses to a simple convexity test. Next, we ask what incentivizes heterogeneity to emerge when embodied, time-extended agents must learn an effort allocation policy. To study heterogeneity in such settings, we use multi-agent reinforcement learning (MARL) as our computational paradigm, and introduce Heterogeneity Gain Parameter Search (HetGPS), a gradient-based algorithm that optimizes the parameter space of underspecified MARL environments to find scenarios where heterogeneity is advantageous. Across different environments, we show that HetGPS rediscovers the reward regimes predicted by our theory to maximize the advantage of heterogeneity, both validating HetGPS and connecting our theoretical insights to reward design in MARL. Together, these results help us understand when behavioral diversity delivers a measurable benefit.
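The double-aggregation reward described in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions that this summary does not spell out: each agent's allocation is a row of an $N \times M$ matrix, a homogeneous team is one where all rows are identical, the heterogeneity gain is the best unrestricted reward minus the best homogeneous reward, and (in the discrete case) each agent commits all of its effort to exactly one task. The names `team_reward` and `discrete_heterogeneity_gain` are hypothetical and are not taken from the paper.

```python
import itertools
import numpy as np

def team_reward(alloc, inner, outer):
    """Double aggregation: `inner` maps each task's column of agent
    efforts to a task score, `outer` merges the M task scores.
    alloc has shape (N, M); row i is agent i's effort over the tasks."""
    task_scores = np.array([inner(alloc[:, j]) for j in range(alloc.shape[1])])
    return float(outer(task_scores))

def discrete_heterogeneity_gain(n_agents, n_tasks, inner, outer):
    """Brute-force gain when each agent puts all effort on one task
    (an assumed discrete setting): best reward over all assignments
    minus best reward when every agent picks the same task."""
    eye = np.eye(n_tasks)
    best_het = max(
        team_reward(eye[list(choice)], inner, outer)
        for choice in itertools.product(range(n_tasks), repeat=n_agents)
    )
    best_hom = max(
        team_reward(eye[[j] * n_agents], inner, outer) for j in range(n_tasks)
    )
    return best_het - best_hom

# N = M = 4: inner max with outer min rewards covering every task.
print(discrete_heterogeneity_gain(4, 4, np.max, np.min))    # 1.0
# Mean of means is the same for every assignment, so there is no gain.
print(discrete_heterogeneity_gain(4, 4, np.mean, np.mean))  # 0.0
```

With an inner max and an outer min, a team only scores when every task has at least one fully committed agent, which a homogeneous team cannot achieve in this sketch; with mean aggregators the reward does not depend on how agents are assigned.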

Paper Structure

This paper contains 45 sections, 6 theorems, 29 equations, 10 figures, 11 tables, 1 algorithm.

Key Result

Theorem 3.1

Let $N, M \ge 2$, and assume that (i) each task-level aggregator $T_j$ is strictly Schur-convex and (ii) the outer aggregator $U$ is coordinate-wise strictly increasing. Then either all admissible optimal homogeneous allocations are trivial, or $\Delta R > 0$.
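A small worked example may help make the $\Delta R > 0$ branch concrete. The aggregators below are illustrative choices, not ones taken from the paper, and the example assumes that each agent's efforts are non-negative and sum to 1 and that $\Delta R$ is the best unrestricted reward minus the best homogeneous reward. Take $N = M = 2$, the strictly Schur-convex inner aggregator $T_j(x_1, x_2) = x_1^2 + x_2^2$, and the coordinate-wise strictly increasing outer aggregator $U(s_1, s_2) = \sqrt{s_1} + \sqrt{s_2}$.

```latex
% Homogeneous team: both agents play (c, 1-c) with c in [0, 1].
R_{\mathrm{hom}}(c)
  = \sqrt{2c^2} + \sqrt{2(1-c)^2}
  = \sqrt{2}\,\bigl(c + (1-c)\bigr)
  = \sqrt{2}.

% Heterogeneous team: agent 1 plays (1, 0), agent 2 plays (0, 1).
R_{\mathrm{het}}
  = \sqrt{1^2 + 0^2} + \sqrt{0^2 + 1^2}
  = 2.

% Hence the gain is at least
\Delta R \;\ge\; 2 - \sqrt{2} \;\approx\; 0.586 \;>\; 0.
```

Every homogeneous allocation earns $\sqrt{2}$ here, while full specialization earns $2$, so this instance is consistent with the positive-gain conclusion of the theorem.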

Figures (10)

  • Figure 1: We study and categorize what reward structures lead to the need for behavioral heterogeneity in multi-agent multi-task environments.
  • Figure 2: Left: Discrete ($\Delta R_{\mathrm D}$) and continuous-allocation ($\Delta R_{\mathrm F}$) heterogeneity gains for all $U,T\!\in\!\{\min,\text{mean},\max\}$. The indicator $\mathbf 1_{\{N\ge M\}}$ equals 1 if $N\ge M$ and 0 otherwise. Right: We plot the parametrized heterogeneity gains $\Delta R(t,\tau;N)$ when $U$ and $T$ are soft-max aggregators.
  • Figure 3: Heterogeneity gain for the discrete and continuous matrix games with $N=M=4$ over training iterations. We report mean and standard deviation after 12M frames over 9 random seeds. The final results match the theoretical predictions in the table of Figure 2. Solid lines indicate reward structures predicted by theory to have $\Delta R > 0$ in either the discrete or continuous setting; dashed lines indicate predicted no gain in both settings.
  • Figure 4: Heterogeneity gain for Multi-goal-capture and 2v2 Tag throughout training. We report mean and standard deviation for 30 million training frames over 9 random seeds.
  • Figure 5: HetGPS results in Multi-goal-capture. The two leftmost columns report the evolution of aggregator parameters through training, while the rightmost column shows the obtained heterogeneity gain. This result empirically demonstrates that HetGPS rediscovers the reward structure predicted by our theory to maximize the gain, making the inner aggregator convex, and the outer aggregator concave. We report mean and standard deviation for 90M training frames over 13 random seeds.
  • ...and 5 more figures
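The Figure 2 caption refers to soft-max aggregators that are tuned by a scalar parameter. The exact parametrization is not given in this summary, so the sketch below assumes a common Boltzmann-weighted form, $\sum_i x_i e^{t x_i} / \sum_i e^{t x_i}$, which approaches the min as $t \to -\infty$, equals the mean at $t = 0$, and approaches the max as $t \to +\infty$. The function name `softmax_aggregate` is hypothetical.

```python
import numpy as np

def softmax_aggregate(x, t):
    """Boltzmann-weighted average of x with temperature parameter t.
    Assumed form (not confirmed by the source):
        sum_i x_i * exp(t * x_i) / sum_i exp(t * x_i)
    """
    x = np.asarray(x, dtype=float)
    z = t * x
    w = np.exp(z - z.max())  # subtract the max for numerical stability
    w /= w.sum()
    return float(w @ x)

efforts = [0.0, 0.5, 1.0]
print(softmax_aggregate(efforts, 0.0))    # 0.5, the mean
print(softmax_aggregate(efforts, 50.0))   # close to 1.0, the max
print(softmax_aggregate(efforts, -50.0))  # close to 0.0, the min
```

A single parameter that sweeps between min, mean, and max in this way is what allows the three hard aggregators in the left panel to be treated as points on one continuous family.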

Theorems & Definitions (14)

  • Definition 3.1: Majorization
  • Definition 3.2: Schur-Convex Function
  • Theorem 3.1: Positive Heterogeneity Gain via Schur-convex Inner Aggregators
  • Theorem 3.2: No Heterogeneity Gain via Schur-concave Inner Aggregators
  • Theorem 3.3: No Heterogeneity Gain for Schur-Convex $U$ with Constant-Sum Task Scores
  • Theorem 3.4: Softmax heterogeneity gain for $N=M$
  • Definition F.1: Sum-Form Aggregator
  • Lemma F.1: Schur Properties of Sum-Form Aggregators (Pečarić et al., 1992)
  • Corollary F.1: Convex-Concave Positive Heterogeneity Gain
  • Proof: Proof of \ref{prop:concavity_forces_multi_task}
  • ...and 4 more
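Majorization and Schur-convexity (Definitions 3.1 and 3.2) can be checked numerically on small vectors. The sketch below uses the standard textbook convention, which may differ in notation from the paper's: $x$ majorizes $y$ when both have the same sum and every partial sum of $x$ sorted in decreasing order is at least the matching partial sum of $y$; a function $f$ is Schur-convex when $x$ majorizing $y$ implies $f(x) \ge f(y)$. The helper name `majorizes` is hypothetical.

```python
import numpy as np

def majorizes(x, y, tol=1e-12):
    """Return True if x majorizes y (standard convention): equal totals,
    and each partial sum of x sorted in decreasing order is at least
    the corresponding partial sum of y."""
    xs = np.sort(np.asarray(x, dtype=float))[::-1]
    ys = np.sort(np.asarray(y, dtype=float))[::-1]
    if xs.shape != ys.shape or abs(xs.sum() - ys.sum()) > tol:
        return False
    return bool(np.all(np.cumsum(xs) >= np.cumsum(ys) - tol))

def sum_of_squares(v):
    """A symmetric, strictly convex sum: a Schur-convex aggregator."""
    return float(np.sum(np.square(v)))

concentrated = [1.0, 0.0, 0.0]   # one agent does all the work on a task
spread = [1 / 3, 1 / 3, 1 / 3]   # effort split evenly across agents

print(majorizes(concentrated, spread))   # True
print(majorizes(spread, concentrated))   # False
# Schur-convex aggregators prefer the concentrated vector ...
print(sum_of_squares(concentrated) >= sum_of_squares(spread))  # True
# ... while a Schur-concave one such as min prefers the spread vector.
print(min(concentrated) <= min(spread))  # True
```

Read this way, a Schur-convex inner aggregator scores a task higher when effort on it is concentrated in few agents, which is the intuition behind specialization paying off in Theorem 3.1.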