Evolution of Societies via Reinforcement Learning

Yann Bouteiller; Karthik Soma; Giovanni Beltrame

TL;DR

This work addresses the scalability challenge of studying social evolution among agents that learn via multi-agent reinforcement learning (MARL). The authors derive fast, exact Policy Gradient (PG) and LOLA updates for symmetric normal-form games and batch these updates across large populations. They demonstrate evolutionary simulations with 200,000 agents in Stag Hunt (SH), Hawk-Dove (HD), and Rock-Paper-Scissors (RPS). LOLA can promote cooperation in SH, delay cooperative outcomes in HD, and reduce diversity in RPS, while the mean population policy tends toward a Nash equilibrium because partners are matched uniformly at random. The methodology provides a practical framework for analyzing how advanced MARL revision protocols shape societal dynamics at scale, and it suggests when non-stationarity-aware learning may favor cooperation or diversity. It lays groundwork for extensions to episodic MARL and to structured partner interactions in large populations, and open-source implementations are provided to facilitate reproducibility and further research.
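To make the batched update concrete, here is a minimal NumPy sketch of one exact policy-gradient step for a whole population under uniform random pairwise matching. It assumes softmax-parameterized mixed strategies and a payoff matrix payoff[i, j] for the row player (the game is symmetric, so the column player receives payoff[j, i]). Because the game is stateless and each partner's mixed strategy is known, the expected payoff pi_k^T A pi_m and its gradient are available in closed form, so no sampling is needed. The helper names (softmax, random_pairs, exact_pg_step), the step size, and the Stag Hunt payoffs are illustrative assumptions, not the authors' code or parameters.

```python
import numpy as np

# Hypothetical helpers illustrating batched exact PG; not the paper's implementation.


def softmax(logits):
    """Row-wise softmax; each row of logits parameterizes one agent's mixed strategy."""
    z = logits - logits.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def random_pairs(n_agents, rng):
    """Uniform random matching: returns partner[k], the index of agent k's opponent."""
    if n_agents % 2:
        raise ValueError("random pairing needs an even number of agents")
    perm = rng.permutation(n_agents)
    first, second = perm[0::2], perm[1::2]
    partner = np.empty(n_agents, dtype=np.int64)
    partner[first] = second
    partner[second] = first
    return partner


def exact_pg_step(logits, payoff, partner, lr):
    """One simultaneous exact policy-gradient step for every agent.

    payoff[i, j] is the row player's payoff for own action i against action j;
    in a symmetric game the column player's payoff is payoff[j, i].
    """
    pi = softmax(logits)                        # (N, n_actions) mixed strategies
    q = pi[partner] @ payoff.T                  # q[k, i]: expected payoff of action i vs partner
    v = (pi * q).sum(axis=-1, keepdims=True)    # expected payoff of agent k's own policy
    grad = pi * (q - v)                         # exact dV/dlogits for a softmax policy
    return logits + lr * grad


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Illustrative Stag Hunt payoffs (action 0 = stag, action 1 = hare), not the paper's.
    stag_hunt = np.array([[4.0, 0.0],
                          [3.0, 3.0]])
    logits = rng.normal(size=(200_000, 2))
    for _ in range(100):
        partner = random_pairs(len(logits), rng)  # fresh uniform matching each generation
        logits = exact_pg_step(logits, stag_hunt, partner, lr=0.5)
    print("mean population policy:", softmax(logits).mean(axis=0))
```

Because every agent updates against a freshly drawn partner using vectorized array operations, the cost per generation grows linearly with population size, and switching to Hawk-Dove or Rock-Paper-Scissors only requires a different payoff matrix.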

Abstract

The universe contains many independent co-learning agents, which form an ever-evolving part of our observed environment. Yet, in practice, Multi-Agent Reinforcement Learning (MARL) applications are typically constrained to small, homogeneous populations and remain computationally intensive. We propose a methodology that enables simulating populations of Reinforcement Learning agents at evolutionary scale. More specifically, we derive a fast, parallelizable implementation of Policy Gradient (PG) and Learning with Opponent-Learning Awareness (LOLA), tailored for evolutionary simulations in which agents undergo random pairwise interactions in stateless normal-form games. We demonstrate our approach by simulating the evolution of very large populations of heterogeneous co-learning agents under both naive and advanced learning strategies. In our experiments, 200,000 PG or LOLA agents evolve in the classic games of Hawk-Dove, Stag Hunt, and Rock-Paper-Scissors. Each game provides distinct insights into how populations evolve under naive and advanced MARL rules, including compelling ways in which Opponent-Learning Awareness affects social evolution.
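LOLA differs from naive PG by differentiating through the opponent's anticipated learning step. Below is a minimal NumPy sketch of the first-order (Taylor-expanded) LOLA gradient of Foerster et al., written in closed form for softmax policies in a symmetric normal-form game and batched over randomly paired agents. It assumes each partner is modeled as taking one naive PG step of size eta. The helper names (softmax, jac_vec, lola_gradient, lola_step), the step sizes, and the choice of the Taylor-expanded form over other LOLA variants are assumptions; this is not necessarily the derivation used in the paper.

```python
import numpy as np


def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def jac_vec(pi, v):
    """Multiply v by the softmax Jacobian J = diag(pi) - pi pi^T, row by row."""
    return pi * (v - (pi * v).sum(axis=-1, keepdims=True))


def lola_gradient(logits, payoff, partner, eta):
    """Closed-form first-order LOLA gradient for every agent against its partner.

    V1 = pi1^T A pi2 is the agent's payoff and V2 = pi2^T A pi1 the partner's
    (symmetric game). Assumption: the partner is modeled as a naive PG learner
    with step eta, so LOLA adds eta * (d2 V2 / d theta2 d theta1)^T (dV1 / d theta2).
    """
    pi1 = softmax(logits)
    pi2 = pi1[partner]
    naive = jac_vec(pi1, pi2 @ payoff.T)            # J1 A pi2: plain policy gradient
    dv1_dtheta2 = jac_vec(pi2, pi1 @ payoff)        # J2 A^T pi1: effect of partner's logits on V1
    # The mixed Hessian of V2 is J2 A J1, so its transpose applied to dv1_dtheta2
    # is J1 A^T J2 dv1_dtheta2 (both Jacobians are symmetric).
    shaping = jac_vec(pi1, jac_vec(pi2, dv1_dtheta2) @ payoff)
    return naive + eta * shaping


def lola_step(logits, payoff, partner, lr, eta):
    return logits + lr * lola_gradient(logits, payoff, partner, eta)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Rock-Paper-Scissors (0 = rock, 1 = paper, 2 = scissors).
    rps = np.array([[0.0, -1.0, 1.0],
                    [1.0, 0.0, -1.0],
                    [-1.0, 1.0, 0.0]])
    logits = rng.normal(size=(200_000, 3))
    for _ in range(100):
        perm = rng.permutation(len(logits))       # uniform random pairwise matching
        partner = np.empty(len(logits), dtype=np.int64)
        partner[perm[0::2]] = perm[1::2]
        partner[perm[1::2]] = perm[0::2]
        logits = lola_step(logits, rps, partner, lr=0.5, eta=0.5)
    print("mean population policy:", softmax(logits).mean(axis=0))
```

Setting eta = 0 recovers the naive PG gradient, which is a convenient sanity check; the shaping term costs only a few extra element-wise operations per agent, so it batches as easily as plain PG.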
