Adaptive Context Length Optimization with Low-Frequency Truncation for Multi-Agent Reinforcement Learning
Wenchang Duan, Yaoliang Yu, Jiwan He, Yi Shi
TL;DR
This paper addresses the drawbacks of large fixed context lengths in MARL and proposes ACL-LFT, which uses a central agent to adaptively choose the context length and a Fourier-based low-frequency truncation to reduce redundancy. The central agent selects a truncation length from a discrete set and uses multi-head attention to shape the reward signal, while the decentralized agents operate on filtered temporal information. The method is supported by theoretical results showing a long-term advantage of adaptive context length and by extensive experiments across PettingZoo, MiniGrid, GRF, and SMACv2, where it achieves state-of-the-art performance. The work advances scalable, robust MARL by capturing long-range dependencies without incurring prohibitive computation.
Abstract
Recently, deep multi-agent reinforcement learning (MARL) has demonstrated promising performance on challenging tasks, such as those involving long-term dependencies and non-Markovian environments. Its success is partly attributed to conditioning policies on a large fixed context length. However, such large fixed context lengths may lead to limited exploration efficiency and redundant information. In this paper, we propose a novel MARL framework to obtain adaptive and effective contextual information. Specifically, we design a central agent that dynamically optimizes the context length via temporal gradient analysis, enhancing exploration to facilitate convergence to global optima in MARL. Furthermore, to enhance the adaptive optimization of the context length, we present an efficient input representation for the central agent, which effectively filters redundant information. By leveraging a Fourier-based low-frequency truncation method, we extract global temporal trends across decentralized agents, providing an effective and efficient representation of the MARL environment. Extensive experiments demonstrate that the proposed method achieves state-of-the-art (SOTA) performance on long-term dependency tasks, including PettingZoo, MiniGrid, Google Research Football (GRF), and StarCraft Multi-Agent Challenge v2 (SMACv2).
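To make the idea of low-frequency truncation concrete, the following is a minimal sketch of one common way to keep only the slowly varying part of an observation history: take a real FFT along the time axis and retain the k lowest-frequency coefficients. It is not the paper's implementation. The function name `low_frequency_truncate`, the parameter `k`, and the `(T, D)` history layout are assumptions made for illustration, and how the central agent actually consumes the truncated spectrum is not specified here.

```python
import numpy as np

def low_frequency_truncate(history, k):
    """Keep the k lowest-frequency components of a (T, D) history.

    Hypothetical helper for illustration only.
    Returns (truncated_spectrum, smoothed_history).
    """
    history = np.asarray(history, dtype=float)
    num_steps = history.shape[0]
    # Real FFT along the time axis: shape (T // 2 + 1, D).
    spectrum = np.fft.rfft(history, axis=0)
    # Clamp k to the number of available frequency bins.
    k = max(1, min(k, spectrum.shape[0]))
    truncated = spectrum[:k]
    # Zero out the discarded high-frequency bins and invert the transform
    # to see which temporal trend the retained coefficients encode.
    padded = np.zeros_like(spectrum)
    padded[:k] = truncated
    smoothed = np.fft.irfft(padded, n=num_steps, axis=0)
    return truncated, smoothed

# Usage: a slow trend plus a fast oscillation, observed over 64 steps.
t = np.arange(64)
slow = np.sin(2 * np.pi * t / 64)             # frequency bin 1
fast = 0.5 * np.sin(2 * np.pi * 20 * t / 64)  # frequency bin 20
coeffs, trend = low_frequency_truncate((slow + fast)[:, None], k=4)
```

With `k=4` the compact representation has 4 complex coefficients per feature instead of 64 raw time steps, and the fast oscillation is removed from the reconstructed trend. An adaptive scheme in the spirit of the abstract would choose `k` from a discrete candidate set rather than fixing it in advance.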
