LAGMA: LAtent Goal-guided Multi-Agent Reinforcement Learning
Hyungho Na, Il-chul Moon
TL;DR
The paper tackles slow policy learning in cooperative MARL caused by the absence of explicit goals, under both sparse and dense reward settings. It introduces LAGMA, a latent goal-guided framework that combines (a) a modified VQ-VAE with a coverage loss to obtain a discretized latent space, (b) an extended VQ codebook to store and evaluate goal-reaching trajectories, and (c) a latent goal-guided intrinsic reward that biases centralized training toward reference paths under the centralized training with decentralized execution (CTDE) paradigm. Empirical results on StarCraft II (SMAC) and Google Research Football show that LAGMA outperforms strong baselines, and ablations confirm the importance of the coverage loss and trajectory-based value estimation. The approach improves sample efficiency and policy convergence in complex multi-agent tasks, offering a scalable route to goal-directed coordination.
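Component (a) rests on vector quantization. The following plain-Python sketch shows the standard VQ-VAE nearest-codeword lookup and a usage count that exposes codewords no latent ever maps to. The `coverage_loss` function is an assumed form chosen only for illustration (the average squared distance from every codeword to every latent in a batch, so that unused codewords are also pulled toward the data); the summary does not give the paper's actual coverage loss.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(z: Vector, codebook: List[Vector]) -> Tuple[int, Vector]:
    """Standard VQ step: map latent z to the index and value of its nearest codeword."""
    idx = min(range(len(codebook)), key=lambda k: sq_dist(z, codebook[k]))
    return idx, codebook[idx]


def usage_counts(batch: List[Vector], codebook: List[Vector]) -> List[int]:
    """How many latents in the batch map to each codeword (0 means unused)."""
    counts = [0] * len(codebook)
    for z in batch:
        counts[quantize(z, codebook)[0]] += 1
    return counts


def coverage_loss(batch: List[Vector], codebook: List[Vector]) -> float:
    """Hypothetical coverage term (assumed form, not the paper's): mean squared
    distance between every codeword and every latent, so that every codeword,
    not only the nearest one, is pulled toward the data."""
    total = sum(sq_dist(z, e) for z in batch for e in codebook)
    return total / (len(batch) * len(codebook))
```

With `codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]`, the latents `[0.1, 0.0]`, `[0.0, 0.2]`, and `[0.9, 1.1]` give usage counts `[2, 1, 0]`: the third codeword is never selected, which is the kind of poor latent-space coverage a coverage term is meant to reduce.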
Abstract
In cooperative multi-agent reinforcement learning (MARL), agents collaborate to achieve common goals, such as defeating enemies or scoring a goal. However, learning goal-reaching paths toward such a semantic goal takes a considerable amount of time in complex tasks, and the trained model often fails to find such paths. To address this, we present LAtent Goal-guided Multi-Agent reinforcement learning (LAGMA), which generates a goal-reaching trajectory in latent space and provides a latent goal-guided incentive for transitions toward this reference trajectory. LAGMA consists of three major components: (a) a quantized latent space constructed via a modified VQ-VAE for efficient sample utilization, (b) goal-reaching trajectory generation via an extended VQ codebook, and (c) latent goal-guided intrinsic reward generation to encourage transitions toward the sampled goal-reaching path. The proposed method is evaluated on StarCraft II, with both dense and sparse reward settings, and on Google Research Football. Empirical results show performance improvements over state-of-the-art baselines.
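Components (b) and (c) can be illustrated with a toy sketch under several assumptions: trajectories have already been mapped to sequences of codeword indices, the extended codebook keeps the top-k goal-reaching sequences for each starting index ranked by episode return, and the intrinsic reward is a fixed bonus whenever the next quantized state lies on the remaining part of the reference path. `ExtendedCodebook`, `intrinsic_reward`, `top_k`, and `scale` are hypothetical names; the abstract does not specify the paper's trajectory evaluation or reward formula.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class ExtendedCodebook:
    """Hypothetical extended codebook: for each starting codeword index, keep
    the top_k goal-reaching index sequences, ranked by episode return."""

    def __init__(self, top_k: int = 3) -> None:
        self.top_k = top_k
        self.store: Dict[int, List[Tuple[float, List[int]]]] = defaultdict(list)

    def add(self, seq: List[int], ret: float) -> None:
        """Record a goal-reaching sequence of codeword indices and its return."""
        bucket = self.store[seq[0]]  # key by the first quantized state
        bucket.append((ret, list(seq)))
        bucket.sort(key=lambda item: item[0], reverse=True)
        del bucket[self.top_k:]  # keep only the best top_k sequences

    def reference(self, start: int) -> List[int]:
        """Highest-return stored sequence starting at `start`, or [] if none."""
        bucket = self.store.get(start)  # .get does not insert a new key
        return list(bucket[0][1]) if bucket else []


def intrinsic_reward(next_idx: int, reference: List[int], t: int,
                     scale: float = 0.1) -> float:
    """Hypothetical bonus: pay `scale` when the next quantized state appears
    on the part of the reference path that lies ahead of step t."""
    return scale if next_idx in reference[t + 1:] else 0.0
```

Ranking by return here stands in for the trajectory-based value estimation mentioned in the TL;DR; a learned value estimate over quantized states could replace it without changing the structure.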
