HiMemFormer: Hierarchical Memory-Aware Transformer for Multi-Agent Action Anticipation
Zirui Wang, Xinran Zhao, Simon Stepputtis, Woojun Kim, Tongshuang Wu, Katia Sycara, Yaqi Xie
TL;DR
This paper tackles multi-agent action anticipation, a setting in which prior models make poor use of inter-agent interactions and long-range context. It introduces HiMemFormer, a transformer-based architecture with a dual-hierarchical memory mechanism. The mechanism has three parts: a global memory module that aggregates the joint history of all agents, an Agent-to-Context Encoder, and a Context-to-Agent Decoder that refines the shared context coarse-to-fine into agent-specific forecasts. The memory flow involves the memories $M_L^{(a)}$, $M_L^{(c)}$, $\widehat{M}_L$, $M_S^{(c)}$, and $M_S^{(a)}$, together with the learnable tokens $Q_F$ and $Q'_F$. On the LEMMA dataset, HiMemFormer shows consistent gains over baselines such as LSTR and MAT across four scenarios, and a variant, HiMemFormer+, achieves further improvements. The work highlights the importance of modeling both long-term joint context and agent-specific short-term cues, advancing capabilities for safe and coordinated multi-agent systems.
Abstract
Understanding and predicting human actions has been a long-standing challenge and is a crucial measure of perception in robotics AI. While significant progress has been made in anticipating the future actions of individual agents, prior work has largely overlooked a key aspect of real-world human activity: interactions. To address this gap in human-like forecasting within multi-agent environments, we present the Hierarchical Memory-Aware Transformer (HiMemFormer), a transformer-based model for online multi-agent action anticipation. HiMemFormer uses a transformer framework to integrate and distribute a global memory that captures joint historical information across all agents. It pairs this global memory with a hierarchical local memory decoder that interprets agent-specific features from the global representations using a coarse-to-fine strategy. Unlike previous approaches, HiMemFormer applies the global context hierarchically, guided by agent-specific preferences, so that each agent's forecast avoids noisy or redundant information from the other agents. Extensive experiments on various multi-agent scenarios show that HiMemFormer significantly outperforms other state-of-the-art methods.
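To make the data flow described above more concrete, here is a minimal, hypothetical sketch in pure Python. It is not the authors' implementation. The function names (`attend`, `encode_global_memory`, `decode_agent`), the use of single-head attention with residual additions, and the choice of each agent's last two frames as its "short-term memory" are all illustrative assumptions. The code also does not try to reproduce the paper's specific memories ($M_L^{(a)}$, $M_L^{(c)}$, $\widehat{M}_L$, $M_S^{(c)}$, $M_S^{(a)}$) or tokens ($Q_F$, $Q'_F$). It shows only the general pattern: pool all agents' histories into one shared global memory, then let each agent read that memory first (coarse) and its own recent frames second (fine).

```python
import math
import random

Matrix = list[list[float]]  # each row is one token/frame feature vector


def attend(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))
        ])
    return out


def residual(x: Matrix, update: Matrix) -> Matrix:
    """Element-wise residual connection x + update."""
    return [[a + b for a, b in zip(r, u)] for r, u in zip(x, update)]


def encode_global_memory(agent_histories: list[Matrix], context_tokens: Matrix) -> Matrix:
    """Hypothetical agent-to-context step: a small set of context tokens
    attends over the concatenated long-term histories of all agents,
    compressing them into one shared (global) memory."""
    joint_history = [frame for history in agent_histories for frame in history]
    return attend(context_tokens, joint_history, joint_history)


def decode_agent(short_memory: Matrix, global_memory: Matrix, future_queries: Matrix) -> Matrix:
    """Hypothetical context-to-agent step, coarse to fine: future queries
    first read the shared global memory (coarse), then refine the result
    with the agent's own short-term memory (fine)."""
    coarse = residual(future_queries, attend(future_queries, global_memory, global_memory))
    return residual(coarse, attend(coarse, short_memory, short_memory))


def random_tokens(n: int, dim: int) -> Matrix:
    """Random stand-ins for learned embeddings (nothing is trained here)."""
    return [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


random.seed(0)
DIM = 4
histories = [random_tokens(6, DIM), random_tokens(6, DIM)]  # 2 agents x 6 frames
context_tokens = random_tokens(3, DIM)  # stand-in for learnable context tokens
future_queries = random_tokens(2, DIM)  # stand-in for learnable future queries
global_memory = encode_global_memory(histories, context_tokens)
# Assumption: each agent's "short-term memory" is simply its last 2 frames.
forecasts = [decode_agent(h[-2:], global_memory, future_queries) for h in histories]
print(len(global_memory), [len(f) for f in forecasts])  # prints: 3 [2, 2]
```

In this sketch every agent shares the same global memory but receives its own forecast features, because the fine-grained pass reads only that agent's recent frames. A full model would use learned multi-head transformer layers and would map each forecast vector to action-class logits with a classifier head. Both of those are omitted here.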
