When Is Diversity Rewarded in Cooperative Multi-Agent Learning?

Michael Amir; Matteo Bettini; Amanda Prorok

TL;DR

The paper asks when behavioral diversity yields higher reward in cooperative multi-agent task allocation. It formulates the global reward as a double aggregation and links the advantage of heterogeneity to the curvature of the inner and outer aggregators via Schur-convexity and Schur-concavity. It also introduces HetGPS, a gradient-based environment design method that optimizes the parameters of differentiable Dec-POMDPs to maximize the empirical heterogeneity gain, and validates it on matrix games and embodied MARL tasks. The main theoretical contribution is a set of convexity-based tests for ΔR>0, which identify the conditions under which diversity is beneficial; empirically, HetGPS recovers the theoretically optimal reward instantiations. Together, these results offer a principled framework for diagnosing when heterogeneity helps in cooperative multi-agent learning and for designing environments where it does, with implications for reward shaping and environment co-design.

Abstract

The success of teams in robotics, nature, and society often depends on the division of labor among diverse specialists; however, a principled explanation for when such diversity surpasses a homogeneous team is still missing. Focusing on multi-agent task allocation problems, we study this question from the perspective of reward design: what kinds of objectives are best suited for heterogeneous teams? We first consider an instantaneous, non-spatial setting where the global reward is built by two generalized aggregation operators: an inner operator that maps the $N$ agents' effort allocations on individual tasks to a task score, and an outer operator that merges the $M$ task scores into the global team reward. We prove that the curvature of these operators determines whether heterogeneity can increase reward, and that for broad reward families this collapses to a simple convexity test. Next, we ask what incentivizes heterogeneity to emerge when embodied, time-extended agents must learn an effort allocation policy. To study heterogeneity in such settings, we use multi-agent reinforcement learning (MARL) as our computational paradigm, and introduce Heterogeneity Gain Parameter Search (HetGPS), a gradient-based algorithm that optimizes the parameter space of underspecified MARL environments to find scenarios where heterogeneity is advantageous. Across different environments, we show that HetGPS rediscovers the reward regimes predicted by our theory to maximize the advantage of heterogeneity, both validating HetGPS and connecting our theoretical insights to reward design in MARL. Together, these results help us understand when behavioral diversity delivers a measurable benefit.
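The double-aggregation reward described in the abstract can be illustrated with a small numerical sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the helper names (`simplex_grid`, `team_reward`, `heterogeneity_gain`), the coarse grid search over allocations, and the working definition of the gain as the best reward over all allocations minus the best reward when every agent uses the same allocation.

```python
from itertools import product

def simplex_grid(num_tasks, steps):
    """All allocation vectors with entries k/steps that sum to 1 (hypothetical helper)."""
    return [
        tuple(k / steps for k in combo)
        for combo in product(range(steps + 1), repeat=num_tasks)
        if sum(combo) == steps
    ]

def team_reward(allocations, inner, outer):
    """Double aggregation: `inner` merges the agents' efforts on each task,
    then `outer` merges the resulting task scores into one team reward."""
    num_tasks = len(allocations[0])
    task_scores = [inner([row[j] for row in allocations]) for j in range(num_tasks)]
    return outer(task_scores)

def heterogeneity_gain(num_agents, num_tasks, inner, outer, steps=4):
    """Assumed definition: best reward over all allocations minus best reward
    when all agents are forced to share one allocation (homogeneous team)."""
    grid = simplex_grid(num_tasks, steps)
    best_homogeneous = max(
        team_reward([row] * num_agents, inner, outer) for row in grid
    )
    best_heterogeneous = max(
        team_reward(list(rows), inner, outer)
        for rows in product(grid, repeat=num_agents)
    )
    return best_heterogeneous - best_homogeneous

# Two agents, two tasks, outer aggregator = sum; only the inner aggregator varies.
gain_max = heterogeneity_gain(2, 2, inner=max, outer=sum)  # convex inner
gain_sum = heterogeneity_gain(2, 2, inner=sum, outer=sum)  # linear inner
gain_min = heterogeneity_gain(2, 2, inner=min, outer=sum)  # concave inner
print(gain_max, gain_sum, gain_min)
```

In this toy setting, only the convex inner aggregator (`max`) gives a positive gain, since specialists can each saturate a different task; with `sum` or `min` as the inner aggregator the homogeneous team is already optimal. This matches the curvature-based intuition in the abstract, although the exhaustive grid search used here is a stand-in and not the convexity test the paper proves.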
