GRPO-GCC: Enhancing Cooperation in Spatial Public Goods Games via Group Relative Policy Optimization with Global Cooperation Constraint

Zhaoqilin Yang, Chanchan Li, Tianqi Liu, Hongxin Zhao, Youliang Tian

TL;DR

This work tackles sustaining cooperation in spatial public goods games (SPGG) by introducing GRPO-GCC, which integrates Group Relative Policy Optimization with a Global Cooperation Constraint. The method uses group-normalized advantages and a KL penalty within GRPO, plus a global incentive term that scales cooperative payoffs according to the population-wide cooperation rate $g$, promoting cooperation at intermediate levels while discouraging extremes. Key contributions include the first application of GRPO to SPGG, a GCC mechanism that dynamically reshapes incentives, and empirical evidence of accelerated, stable, and robust cooperation across diverse initializations on large lattices. The approach offers a principled, scalable framework for resilient multi-agent coordination in socio-technical systems with structured interactions.
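To make the shape of the global incentive concrete, here is a minimal sketch of a multiplier on cooperative payoffs that is largest at intermediate cooperation rates and vanishes at the extremes. The summary above does not give the paper's actual formula, so the quadratic bump $4g(1-g)$, the function name `gcc_scale`, and the way the coefficient `rho` enters are all illustrative assumptions, not the authors' definition.

```python
def gcc_scale(g: float, rho: float = 1.0) -> float:
    """Hypothetical multiplier applied to cooperative payoffs.

    g   : population-wide cooperation rate, in [0, 1]
    rho : strength of the global incentive (assumed role, not from the paper)
    """
    if not 0.0 <= g <= 1.0:
        raise ValueError("g must lie in [0, 1]")
    # Assumed shape: 4*g*(1-g) equals 1 at g = 0.5 and 0 at g = 0 or g = 1,
    # so the bonus is strongest at intermediate cooperation levels.
    return 1.0 + rho * 4.0 * g * (1.0 - g)


if __name__ == "__main__":
    for g in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"g={g:.2f} -> scale={gcc_scale(g):.2f}")
```

Any smooth function with the same qualitative shape would serve the illustration equally well; the point is only that the bonus disappears when the population is fully cooperative or fully defecting.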

Abstract

Inspired by the principle of self-regulating cooperation in collective institutions, we propose the Group Relative Policy Optimization with Global Cooperation Constraint (GRPO-GCC) framework. This work is the first to introduce GRPO into spatial public goods games, establishing a new deep reinforcement learning baseline for structured populations. GRPO-GCC integrates group relative policy optimization with a global cooperation constraint that strengthens incentives at intermediate cooperation levels while weakening them at extremes. This mechanism aligns local decision making with sustainable collective outcomes and prevents collapse into either universal defection or unconditional cooperation. The framework advances beyond existing approaches by combining group-normalized advantage estimation, a reference-anchored KL penalty, and a global incentive term that dynamically adjusts cooperative payoffs. As a result, it achieves accelerated cooperation onset, stabilized policy adaptation, and long-term sustainability. GRPO-GCC demonstrates how a simple yet global signal can reshape incentives toward resilient cooperation, and provides a new paradigm for multi-agent reinforcement learning in socio-technical systems.
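The two GRPO ingredients named in the abstract can be sketched in a few lines. The code below assumes the standard GRPO formulation (rewards normalized by the mean and standard deviation of a sampled group, and a per-sample KL estimate of the form $x - \log x - 1$ with $x = \pi_{\text{ref}}/\pi_\theta$); whether the paper uses exactly these forms is not stated in this summary, and the function names are hypothetical.

```python
import math


def group_normalized_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampled group.

    Uses the population standard deviation; eps guards against a zero
    denominator when every candidate in the group earns the same reward.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


def kl_penalty(p_new, p_ref):
    """Non-negative per-sample KL estimate for one sampled action.

    p_new : probability of the action under the current policy
    p_ref : probability of the same action under the reference policy
    """
    ratio = p_ref / p_new
    return ratio - math.log(ratio) - 1.0


if __name__ == "__main__":
    # Payoffs of a group of sampled candidate actions (made-up numbers).
    print(group_normalized_advantages([1.0, 2.0, 3.0]))
    print(kl_penalty(0.8, 0.5))
```

In a training loop, the advantage would weight the policy-gradient term and the KL estimate would be subtracted with a coefficient $\beta$, which keeps the updated policy close to the reference policy.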

Paper Structure

This paper contains 18 sections, 10 equations, 12 figures, 1 table, 1 algorithm.

Figures (12)

  • Figure 1: The policy network outputs a stochastic probability distribution over cooperation and defection within the GRPO-GCC framework.
  • Figure 2: Cooperation rates under different $\beta$ values. $\beta$ controls KL penalty strength, with $\beta=0.04$ yielding the highest cooperation rate at $r=5.0$.
  • Figure 3: Cooperation rates under different $\eta$ values. $\eta$ controls the number of sampled candidates, with $\eta=8$ achieving the best performance at $r=5.0$.
  • Figure 4: Cooperation rates under different $\zeta$ values. $\zeta$ controls the number of inner updates, with $\zeta=3$ achieving the best performance at $r=5.0$.
  • Figure 5: Cooperation rate with varying global cooperation coefficient $\rho$ in GRPO-GCC. Higher $\rho$ values promote cooperation even at smaller $r$.
  • ...and 7 more figures