Efficient Replay Memory Architectures in Multi-Agent Reinforcement Learning for Traffic Congestion Control
Mukul Chodhary, Kevin Octavian, SooJean Han
TL;DR
This work addresses congestion control in large-scale traffic networks using multi-agent reinforcement learning with memory-efficient exploration. It introduces Dual-Memory Integrated Learning (DMIL), a two-tier memory system with short-term and long-term memories, plus equivalence-class embeddings based on group-equivariance to bound memory growth while preserving learning performance. Theoretical analyses establish that the dual-memory size $msize_{Dual}[t]$ is bounded above by the SARSA replay size, and experiments on grid networks show that DMIL, especially with complex equivalence embeddings and entropy/diffusion rewards, improves congestion metrics and reduces memory growth relative to standard SARSA. The approach is scalable and modular, and it demonstrates the value of heterogeneous memory and symmetry-based abstractions for efficient MARL in traffic control scenarios.
Abstract
Episodic control, inspired by the role of episodic memory in the human brain, has been shown to mitigate the sample inefficiency of model-free reinforcement learning by reusing high-return past experiences. However, the memory growth of episodic control is undesirable in large-scale multi-agent problems such as vehicle traffic management. This paper proposes a novel replay memory architecture, called Dual-Memory Integrated Learning, to augment multi-agent reinforcement learning methods for congestion control via adaptive light signal scheduling. Our dual-memory architecture mimics two core capabilities of human decision-making. First, it relies on diverse types of memory (semantic and episodic, short-term and long-term) to remember high-return states that occur often in the network and filter out states that don't. Second, it employs equivalence classes to group together similar state-action pairs that can be controlled using the same action (i.e., light signal sequence). Theoretical analyses establish memory growth bounds, and simulation experiments on several intersection networks showcase improved congestion performance (e.g., vehicle throughput) from our method.
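To make the two ideas in the abstract concrete, the following is a minimal illustrative sketch, not the paper's implementation. It assumes a single four-way intersection whose state is a tuple of queue lengths per approach, an action that is the index of the approach given a green light, and rotational symmetry as the equivalence relation. The names `canonical` and `DualMemory`, the promotion rule, and the parameters `short_capacity`, `promote_after`, and `min_return` are all hypothetical choices made for illustration.

```python
from collections import OrderedDict


def canonical(state, action):
    """Return one representative of the rotation orbit of (state, action).

    Assumption: rotating the intersection by k approaches shifts both the
    queue tuple and the green-light index, so all rotations are equivalent.
    """
    n = len(state)
    return min((state[k:] + state[:k], (action - k) % n) for k in range(n))


class DualMemory:
    """Hypothetical two-tier memory keyed by equivalence class."""

    def __init__(self, short_capacity=4, promote_after=2, min_return=0.0):
        self.short = OrderedDict()  # class key -> (visits, best_return)
        self.long = {}              # class key -> best_return
        self.short_capacity = short_capacity
        self.promote_after = promote_after
        self.min_return = min_return

    def store(self, state, action, ret):
        key = canonical(tuple(state), action)
        if key in self.long:
            # Already a long-term entry: only refresh its best return.
            self.long[key] = max(self.long[key], ret)
            return
        visits, best = self.short.pop(key, (0, float("-inf")))
        visits, best = visits + 1, max(best, ret)
        if visits >= self.promote_after and best >= self.min_return:
            # Frequent and high-return: promote to long-term memory.
            self.long[key] = best
        else:
            self.short[key] = (visits, best)  # re-inserted as most recent
            if len(self.short) > self.short_capacity:
                self.short.popitem(last=False)  # evict the oldest class

    def size(self):
        return len(self.short) + len(self.long)


mem = DualMemory()
mem.store((3, 0, 1, 0), 0, 5.0)
mem.store((0, 1, 0, 3), 3, 7.0)  # same situation, rotated by one approach
print(mem.size(), mem.long)
```

In this sketch the two stored pairs fall into one equivalence class, so the memory holds a single entry rather than two, and that entry is promoted once it has been seen often enough with a sufficient return. A plain replay buffer would store every pair separately, which is the intuition behind comparing the dual-memory size against the SARSA replay size.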
