Generalized Per-Agent Advantage Estimation for Multi-Agent Policy Optimization
Seongmin Kim, Giseung Park, Woojun Kim, Jiwon Jeon, Seungyeol Han, Youngchul Sung
TL;DR
A novel multi-agent reinforcement learning framework that improves sample efficiency and coordination via the Generalized Per-Agent Advantage Estimator (GPAE), which uses a per-agent value iteration operator to compute precise per-agent advantages.
Abstract
In this paper, we propose a novel framework for multi-agent reinforcement learning that enhances sample efficiency and coordination through accurate per-agent advantage estimation. The core of our approach is the Generalized Per-Agent Advantage Estimator (GPAE), which employs a per-agent value iteration operator to compute precise per-agent advantages. This operator enables stable off-policy learning by estimating values indirectly via action probabilities, eliminating the need for direct Q-function estimation. To further refine the estimate, we introduce a double-truncated importance sampling ratio scheme. This scheme improves credit assignment for off-policy trajectories by balancing sensitivity to the agent's own policy changes with robustness to non-stationarity from other agents. Experiments on benchmarks demonstrate that our approach outperforms existing methods, excelling in coordination and sample efficiency in complex scenarios.
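The abstract does not give the exact form of the double-truncated importance sampling ratio, so the following is only an illustrative sketch under assumed definitions, not the authors' formulation. It assumes that the agent's own ratio and the product of the other agents' ratios are clipped separately and then multiplied. The function name and the parameters `own_cap` and `others_cap` are hypothetical.

```python
from math import prod


def double_truncated_weight(own_ratio, other_ratios, own_cap=1.0, others_cap=1.0):
    """Hypothetical double-truncated importance weight for one agent at one step.

    own_ratio    : pi_i(a_i | o_i) / mu_i(a_i | o_i) for the agent being evaluated
    other_ratios : the same target/behavior ratio for every other agent
    own_cap      : assumed clip level for the agent's own ratio
    others_cap   : assumed clip level for the joint ratio of the other agents
    """
    if own_ratio < 0 or any(r < 0 for r in other_ratios):
        raise ValueError("importance ratios must be non-negative")
    # Truncate the agent's own ratio: a looser cap keeps the weight
    # sensitive to changes in the agent's own policy.
    own_term = min(own_cap, own_ratio)
    # Truncate the joint ratio of the other agents: a tighter cap limits
    # the variance caused by their changing policies.
    others_term = min(others_cap, prod(other_ratios))
    return own_term * others_term


# Example: one agent with two teammates, using ratios from stored behavior data.
weight = double_truncated_weight(
    own_ratio=0.5 / 0.25,
    other_ratios=[0.9 / 0.3, 0.2 / 0.4],
    own_cap=2.0,
    others_cap=1.0,
)
```

Under this assumed form, a larger `own_cap` than `others_cap` would be one way to express the trade-off the abstract describes: the weight follows the agent's own policy change more closely while damping the effect of the other agents' policy drift.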
