TIGER-MARL: Enhancing Multi-Agent Reinforcement Learning with Temporal Information through Graph-based Embeddings and Representations
Nikunj Gupta, Ludwika Twardecka, James Zachary Hare, Jesse Milzman, Rajgopal Kannan, Viktor Prasanna
TL;DR
TIGER-MARL addresses the challenge of capturing evolving inter-agent coordination in multi-agent reinforcement learning (MARL) by constructing dynamic temporal graphs and applying a temporal attention mechanism to produce time-aware agent embeddings. By explicitly linking current interactions with historical ones through static, self-history, and neighbor-history connections, TIGER enables adaptive coordination under partial observability. Empirical results on the Gather and Tag benchmarks show that TIGER improves task performance and sample efficiency over strong value-decomposition and graph-based baselines, and ablations clarify the impact of temporal depth and structural connectivity. The work demonstrates the practicality of temporal graph reasoning for scalable, robust MARL and provides open-source code for reproducibility.
Abstract
In this paper, we propose TIGER (Temporal Information through Graph-based Embeddings and Representations), which enhances multi-agent reinforcement learning (MARL) by explicitly modeling how inter-agent coordination structures evolve over time. Most MARL approaches rely on static or per-step relational graphs and therefore overlook the temporal evolution of interactions that arises naturally as agents adapt, move, or reorganize their cooperation strategies. Capturing these evolving dependencies is key to achieving robust and adaptive coordination. To this end, TIGER constructs dynamic temporal graphs of MARL agents that connect their current and historical interactions. It then employs a temporal attention-based encoder to aggregate information across these structural and temporal neighborhoods, yielding time-aware agent embeddings that guide cooperative policy learning. Through extensive experiments on two coordination-intensive benchmarks, we show that TIGER consistently outperforms diverse value-decomposition and graph-based MARL baselines in task performance and sample efficiency. Furthermore, we conduct comprehensive ablation studies to isolate the impact of key design parameters in TIGER, revealing how structural and temporal factors jointly shape effective policy learning in MARL. The code is available at https://github.com/Nikunj-Gupta/tiger-marl.
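
To make the idea concrete, the following is a minimal illustrative sketch, not the authors' implementation. It shows one way a temporal graph with static, self-history, and neighbor-history connections could be assembled for a single time step, followed by a simple attention-weighted aggregation over those connections. All names here (`build_temporal_edges`, `temporal_attention`, `depth`, `decay`) are hypothetical, and the rule for choosing historical neighbors and the time-gap penalty in the attention scores are assumptions made for illustration; the released code may differ substantially.

```python
import math
from typing import Dict, List, Tuple

Node = Tuple[int, int]  # (agent_id, timestep)


def build_temporal_edges(
    adjacency_by_step: List[Dict[int, List[int]]], t: int, depth: int
) -> Dict[Node, List[Tuple[Node, str]]]:
    """Collect incoming edges for every agent node at step t.

    Assumption: neighbor-history edges use the neighbors an agent had
    at the past step itself; the paper may define them differently.
    """
    edges: Dict[Node, List[Tuple[Node, str]]] = {}
    for agent, neighbors in adjacency_by_step[t].items():
        # Static edges: neighbors observed at the current step.
        incoming = [((j, t), "static") for j in neighbors]
        for k in range(1, depth + 1):
            past = t - k
            if past < 0:
                break
            # Self-history edge: the agent's own node at an earlier step.
            incoming.append(((agent, past), "self_history"))
            # Neighbor-history edges: the agent's neighbors at that earlier step.
            for j in adjacency_by_step[past].get(agent, []):
                incoming.append(((j, past), "neighbor_history"))
        edges[(agent, t)] = incoming
    return edges


def temporal_attention(
    target: Node,
    incoming: List[Tuple[Node, str]],
    features: Dict[Node, List[float]],
    decay: float = 0.5,
) -> Tuple[List[float], List[float]]:
    """Aggregate source features with softmax attention weights.

    Assumption: scores are scaled dot products minus a linear penalty on
    the time gap; a learned time encoding would replace this in practice.
    """
    query = features[target]
    if not incoming:
        return list(query), []
    scale = math.sqrt(len(query))
    scores = []
    for source, _edge_type in incoming:
        key = features[source]
        gap = target[1] - source[1]
        dot = sum(q * k for q, k in zip(query, key))
        scores.append(dot / scale - decay * gap)
    # Numerically stable softmax over the incoming edges.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    embedding = [
        sum(w * features[source][d] for w, (source, _) in zip(weights, incoming))
        for d in range(len(query))
    ]
    return embedding, weights


# Toy example: three agents, two time steps, a history depth of one step.
adjacency_by_step = [
    {0: [1], 1: [0], 2: []},      # step 0
    {0: [1, 2], 1: [0], 2: [0]},  # step 1
]
features = {(i, s): [float(i), float(s)] for i in range(3) for s in range(2)}

edges = build_temporal_edges(adjacency_by_step, t=1, depth=1)
embedding, weights = temporal_attention((0, 1), edges[(0, 1)], features)
print(edges[(0, 1)])
print(embedding, weights)
```

In this toy setup, agent 0 at step 1 receives two static edges from its current neighbors, one self-history edge from its own earlier node, and one neighbor-history edge from the neighbor it had at step 0. A larger `depth` would extend the history window, which corresponds loosely to the temporal depth examined in the ablations.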
