Fluid-Agent Reinforcement Learning

Shishir Sharma; Doina Precup; Theodore J. Perkins

TL;DR

This paper proposes a framework in which agents can create other agents, yielding agent teams that adjust their size dynamically to match environmental demands. It presents game-theoretic solution concepts for fluid-agent games and empirically evaluates several MARL algorithms within this framework.

Abstract

The primary focus of multi-agent reinforcement learning (MARL) has been to study interactions among a fixed number of agents embedded in an environment. However, in the real world, the number of agents is neither fixed nor known a priori. Moreover, an agent can decide to create other agents (for example, a cell may divide, or a company may spin off a division). In this paper, we propose a framework that allows agents to create other agents; we call this a fluid-agent environment. We present game-theoretic solution concepts for fluid-agent games and empirically evaluate the performance of several MARL algorithms within this framework. Our experiments include fluid variants of established benchmarks such as Predator-Prey and Level-Based Foraging, where agents can dynamically spawn, as well as a new environment we introduce that highlights how fluidity can unlock novel solution strategies beyond those observed in fixed-population settings. We demonstrate that this framework yields agent teams that adjust their size dynamically to match environmental demands.
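
To make the idea of a fluid-agent environment concrete, the following is a minimal illustrative sketch, not the environments or interface used in the paper. It assumes a toy setting in which each agent chooses between a hypothetical `work` action and a hypothetical `spawn` action, so the set of agents can grow during an episode. The class name `ToyFluidEnv` and the parameters `spawn_cost`, `work_reward`, and `max_agents` are all assumptions introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical action labels; the paper's action spaces are not reproduced here.
SPAWN, WORK = "spawn", "work"


@dataclass
class ToyFluidEnv:
    """Toy environment whose agent set can grow within an episode (illustrative only)."""

    spawn_cost: float = 1.0   # assumed penalty paid by the parent when it spawns
    work_reward: float = 0.5  # assumed reward for an agent that works
    max_agents: int = 8       # assumed cap on the population
    agents: List[int] = field(default_factory=lambda: [0])
    next_id: int = 1

    def step(self, actions: Dict[int, str]) -> Dict[int, float]:
        """Apply one joint action and return a reward for each agent that acted."""
        rewards: Dict[int, float] = {}
        newborn: List[int] = []
        for agent_id in self.agents:
            action = actions.get(agent_id, WORK)  # agents without an action default to work
            can_spawn = len(self.agents) + len(newborn) < self.max_agents
            if action == SPAWN and can_spawn:
                # The parent pays the cost; the child joins after this step.
                newborn.append(self.next_id)
                self.next_id += 1
                rewards[agent_id] = -self.spawn_cost
            else:
                # A spawn blocked by the cap is treated as work in this sketch.
                rewards[agent_id] = self.work_reward
        self.agents.extend(newborn)
        return rewards


env = ToyFluidEnv()
first = env.step({0: SPAWN})            # agent 0 creates agent 1
second = env.step({0: WORK, 1: SPAWN})  # the new agent can itself spawn
```

In this sketch the joint action is a dictionary keyed by agent id, so its size changes as the population changes. That is the main difference from a fixed-population interface, where the joint action has a fixed length.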

Paper Structure

This paper contains 57 sections, 3 theorems, 9 equations, 5 figures, and 5 tables.

Introduction
Related Work
Background
Fluid-Agent RL
Partially Observable Fluid Stochastic Game
Nash Equilibrium
Subgame-Perfect Nash Equilibrium
Fluid-agent Environments
Fluid Predator-Prey
Dynamics
Observation Space
Action Space
Reward Function
Fluid Level-Based Foraging
Dynamics
...and 42 more sections

Key Result

Theorem 1

Every partially observable fluid stochastic game (POFSG) possesses a stationary mixed-strategy Nash equilibrium.

Figures (5)

Figure 1: Fluid-agent Environments
Figure 2: Algorithmic comparison on Predator-Prey environment under different payoff normalizations.
Figure 3: (a) Fluid and fixed groups on tasks with prey sampled from a distribution. Error bars represent one standard deviation. (b) Density plot showing the relationship between episodic prey availability and converged fluid-agent populations.
Figure 4: Algorithmic comparison on the LBF environment.
Figure 5: Episode return (top) and alive agents (bottom) during training on PuddleBridge. Points are colored by gate condition: open (blue) or closed (orange).

Theorems & Definitions (5)

Theorem 1: Existence of Stationary Nash Equilibrium
proof
Theorem 2: Existence of Subgame-Perfect Nash Equilibrium
proof
Lemma 1: Fixed-Population Embedding
