Multi-agent reinforcement learning using echo-state network and its application to pedestrian dynamics
Hisato Komatsu
TL;DR
The paper tackles efficient simulation of coordinated pedestrian dynamics using multi-agent reinforcement learning (MARL) by combining echo-state networks (ESN) with least squares policy iteration (LSPI). By sharing input and reservoir parameters within groups and updating output weights per episode, the approach achieves data-efficient learning in grid-world pedestrian tasks, including forked-route routing and bidirectional lane formation, with rewards $r_{i,t}=+1$ for progress in the target direction and $r_{i,t}=-1$ for opposing progress. Key findings show that the ESN+LSPI framework learns to move while avoiding others at moderate densities, forms lanes in Task II up to $n_{agent}=48$, but experiences jamming at higher densities, and generally offers lower computational cost than representative deep RL methods. The work demonstrates the viability of reservoir computing for scalable MARL in collective behavior and points to future improvements in processing capacity, exploration, and scaling to large-scale pedestrian or evacuation scenarios, with code available for replication.
Abstract
In recent years, simulations of pedestrians using multi-agent reinforcement learning (MARL) have been studied. This study considered roads in a grid-world environment and implemented pedestrians as MARL agents using an echo-state network and the least squares policy iteration method. In this environment, the ability of these agents to learn to move forward while avoiding other agents was investigated. Specifically, we considered two types of tasks: the choice between a narrow direct route and a broad detour, and bidirectional pedestrian flow in a corridor. The simulation results indicated that learning was successful when the density of the agents was not too high.
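To make the combination of an echo-state network with least squares policy iteration more concrete, the following is a minimal sketch, not the author's implementation. It assumes a fixed random reservoir with a `tanh` update, a linear readout giving one Q-value per action, and an LSTD-Q style least-squares solve for the readout weights. The names `EchoStateQ` and `lstdq_update`, the hyperparameter values, the ridge term, and the per-action feature layout are all illustrative assumptions.

```python
import numpy as np


class EchoStateQ:
    """Echo-state network whose linear readout gives one Q-value per action."""

    def __init__(self, n_in, n_res, n_actions, spectral_radius=0.9, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed random input and reservoir weights (never trained).
        self.w_in = rng.uniform(-1.0, 1.0, size=(n_res, n_in))
        w_res = rng.normal(size=(n_res, n_res))
        # Rescale so the largest eigenvalue magnitude equals spectral_radius.
        w_res *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w_res)))
        self.w_res = w_res
        # Trainable readout: row a holds the weights for action a.
        self.w_out = np.zeros((n_actions, n_res))
        self.x = np.zeros(n_res)

    def reset(self):
        """Clear the reservoir state, e.g. at the start of an episode."""
        self.x = np.zeros_like(self.x)

    def step(self, obs):
        """Advance the reservoir by one time step and return its new state."""
        self.x = np.tanh(self.w_in @ obs + self.w_res @ self.x)
        return self.x.copy()

    def q_values(self, x):
        """Q-values for all actions given a reservoir state x."""
        return self.w_out @ x


def lstdq_update(transitions, w_out, gamma=0.95, ridge=1e-3):
    """One LSTD-Q solve for the readout (assumed form, with a ridge term).

    transitions: iterable of (x, action, reward, x_next, done), where x and
    x_next are reservoir states. The greedy next action uses the current w_out.
    """
    n_actions, n_res = w_out.shape
    d = n_actions * n_res
    A = ridge * np.eye(d)
    b = np.zeros(d)
    for x, a, r, x_next, done in transitions:
        # Feature vector: reservoir state placed in the block of action a.
        phi = np.zeros(d)
        phi[a * n_res:(a + 1) * n_res] = x
        phi_next = np.zeros(d)
        if not done:
            a_next = int(np.argmax(w_out @ x_next))
            phi_next[a_next * n_res:(a_next + 1) * n_res] = x_next
        A += np.outer(phi, phi - gamma * phi_next)
        b += r * phi
    return np.linalg.solve(A, b).reshape(n_actions, n_res)
```

In a loop of this kind, one would collect transitions during an episode, call `lstdq_update` at its end, and assign the result to `w_out`. Agents in the same group could share `w_in` and `w_res` by reusing the same arrays, which is one way to read the parameter sharing described in the TL;DR.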
