Multi-agent reinforcement learning using echo-state network and its application to pedestrian dynamics

Hisato Komatsu

TL;DR

The paper tackles efficient simulation of coordinated pedestrian dynamics using multi-agent reinforcement learning (MARL) by combining echo-state networks (ESN) with least squares policy iteration (LSPI). By sharing input and reservoir parameters within groups and updating output weights once per episode, the approach achieves data-efficient learning in grid-world pedestrian tasks, including forked-route routing (Task I) and bidirectional lane formation (Task II), with rewards $r_{i,t}=+1$ for progress in the target direction and $r_{i,t}=-1$ for progress in the opposite direction. Key findings show that the ESN+LSPI framework learns to move while avoiding other agents at moderate densities and forms lanes in Task II up to $n_{\mathrm{agent}}=48$, but experiences jamming at higher densities; it generally has a lower computational cost than representative deep RL methods. The work demonstrates the viability of reservoir computing for scalable MARL in collective behavior and points to future improvements in processing capacity, exploration, and scaling to large-scale pedestrian or evacuation scenarios, with code available for replication.

Abstract

In recent years, simulations of pedestrians using multi-agent reinforcement learning (MARL) have been studied. This study considered roads in a grid-world environment and implemented pedestrians as MARL agents using an echo-state network and the least squares policy iteration method. In this environment, the ability of these agents to learn to move forward while avoiding other agents was investigated. Specifically, we considered two types of tasks: the choice between a narrow direct route and a broad detour, and bidirectional pedestrian flow in a corridor. The simulation results indicated that learning was successful when the density of the agents was not too high.
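To make the combination of an echo-state network with least squares policy iteration more concrete, the following is a minimal sketch, not the paper's implementation. It assumes a reservoir with fixed random input and recurrent weights, a linear readout giving one Q-value per action, and an LSTD-Q style solve for the readout from a batch of transitions. The names `SharedESN`, `features`, and `lstdq_update`, the hyperparameter values, the ridge term, and the pooling of transitions into one batch are all illustrative assumptions.

```python
import numpy as np


class SharedESN:
    """Reservoir with fixed random weights; only the readout w_out is trained.

    Hypothetical sketch: sizes, scaling, and seeds are assumptions.
    """

    def __init__(self, n_in, n_res, n_actions, spectral_radius=0.9, seed=0):
        rng = np.random.default_rng(seed)
        self.w_in = rng.uniform(-1.0, 1.0, size=(n_res, n_in))
        w = rng.normal(size=(n_res, n_res))
        # Rescale so the largest eigenvalue magnitude equals spectral_radius.
        w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
        self.w_res = w
        self.w_out = np.zeros((n_actions, n_res))

    def step(self, x, u):
        """Advance the reservoir state x by one step given observation u."""
        return np.tanh(self.w_in @ u + self.w_res @ x)

    def q_values(self, x):
        """One Q-value per action, linear in the reservoir state."""
        return self.w_out @ x


def features(x, a, n_actions):
    """Place the reservoir state in the block that belongs to action a."""
    phi = np.zeros(n_actions * x.size)
    phi[a * x.size:(a + 1) * x.size] = x
    return phi


def lstdq_update(esn, transitions, gamma=0.95, ridge=1e-3):
    """Solve for the readout from (x, a, r, x_next) tuples (LSTD-Q step).

    The ridge term is an assumption added to keep the system well conditioned.
    """
    n_actions, n_res = esn.w_out.shape
    dim = n_actions * n_res
    a_mat = ridge * np.eye(dim)
    b_vec = np.zeros(dim)
    for x, a, r, x_next in transitions:
        # Greedy next action under the current readout.
        a_next = int(np.argmax(esn.q_values(x_next)))
        phi = features(x, a, n_actions)
        phi_next = features(x_next, a_next, n_actions)
        a_mat += np.outer(phi, phi - gamma * phi_next)
        b_vec += r * phi
    esn.w_out = np.linalg.solve(a_mat, b_vec).reshape(n_actions, n_res)
```

Because the reservoir weights never change, each update reduces to one linear solve over the collected transitions, which is one plausible reading of why such a scheme can be cheaper than gradient-based deep RL.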

Paper Structure (17 sections, 31 equations, 20 figures, 3 tables, 1 algorithm)

Introduction
Related work
Multi-agent reinforcement learning (MARL)
Echo-state network (ESN)
Least squares policy iteration (LSPI) method
Proposed method and settings of simulations
Application of ESN to the LSPI method
Environment and observation of each agent
Task I: Choice between a narrow direct route and a broad detour
Task II: Bidirectional pedestrian flow in a corridor
Results
Performance in task I
Performance in task II
Comparison with independent learners
Comparison with the case in which two groups share the parameters in task II
...and 2 more sections

Figures (20)

Figure 1: Eyesight of each agent. We refer to the $3 \times 3$, $7 \times 7$, and $11 \times 11$ cells centered on the agent itself (painted red in this figure) as $\mathcal{A}_{1}$, $\mathcal{A}_{2}$, and $\mathcal{A}_{3}$, respectively.
Figure 2: Forked road considered in task I. Cells representing vacant areas and walls are painted black and green, respectively. White lines are drawn to emphasize the cell borders.
Figure 3: Initial placement of agents in task I for (a) $n_{\mathrm{agent}} = 12$, (b) $n_{\mathrm{agent}} = 24$, (c) $n_{\mathrm{agent}} = 32$, and (d) $n_{\mathrm{agent}} = 40$. The meanings of the black and green cells are the same as in Fig. 2, and agents are painted red.
Figure 4: Initial placement of agents in task II for (a) $n_{\mathrm{agent}} = 16$, (b) $n_{\mathrm{agent}} = 32$, (c) $n_{\mathrm{agent}} = 48$, and (d) $n_{\mathrm{agent}} = 64$. The meanings of the black and green cells are the same as in Fig. 2, and right(left)-proceeding agents are painted red (blue). In this figure, agents of different groups are painted in different colors so that they can be distinguished from each other. From the agents' viewpoint, however, they all have the same color, $(1,0)$.
Figure 5: Learning curves of task I for (a) $n_{\mathrm{agent}} = 12$, (b) $n_{\mathrm{agent}} = 24$, (c) $n_{\mathrm{agent}} = 32$, and (d) $n_{\mathrm{agent}} = 40$. The green curve is the mean of all agents' rewards, and the red and blue curves indicate the rewards of the best- and worst-performing agents, respectively. Each value is averaged over 8 independent trials, and the standard errors computed from them are shown in pale colors.
...and 15 more figures
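
The nested fields of view described in the Figure 1 caption can be sketched as three square windows cut from the grid around an agent. This is only an illustration under assumptions: the grid is taken to be a 2D integer array, cells beyond the grid edge are treated as walls, and the function name `eyesight` and the wall code `1` are hypothetical. How the paper encodes or aggregates the cells inside each window is not stated here and is not modeled.

```python
import numpy as np


def eyesight(grid, row, col, half_widths=(1, 3, 5), wall=1):
    """Return the 3x3, 7x7, and 11x11 windows centered on (row, col).

    Assumption: out-of-bounds cells are filled with the wall code.
    """
    pad = max(half_widths)
    padded = np.pad(grid, pad, mode="constant", constant_values=wall)
    r, c = row + pad, col + pad  # agent position in the padded grid
    return [padded[r - h:r + h + 1, c - h:c + h + 1] for h in half_widths]


# Usage: an agent in the corner of an empty 6x8 grid.
windows = eyesight(np.zeros((6, 8), dtype=int), 0, 0)
shapes = [w.shape for w in windows]
```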
