LAGMA: LAtent Goal-guided Multi-Agent Reinforcement Learning

Hyungho Na; Il-chul Moon

TL;DR

The paper tackles slow policy learning in cooperative MARL, where agents have no explicit goals and must learn from dense or sparse rewards. It introduces LAGMA, a latent goal-guided framework with three parts: a modified VQ-VAE trained with a coverage loss to obtain a discretized latent space, an extended VQ codebook that stores and evaluates goal-reaching trajectories, and a latent goal-guided intrinsic reward that biases centralized training toward reference paths under CTDE. Empirical results on StarCraft II (SMAC) and Google Research Football show better performance than strong baselines, and ablations confirm the importance of the coverage loss and of trajectory-based value estimation. The approach improves sample efficiency and policy convergence in complex multi-agent tasks, offering a scalable route to goal-directed coordination.

Abstract

In cooperative multi-agent reinforcement learning (MARL), agents collaborate to achieve common goals, such as defeating enemies or scoring a goal. However, learning goal-reaching paths toward such a semantic goal takes a considerable amount of time in complex tasks, and the trained model often fails to find such paths. To address this, we present LAtent Goal-guided Multi-Agent reinforcement learning (LAGMA), which generates a goal-reaching trajectory in latent space and provides a latent goal-guided incentive for transitions toward this reference trajectory. LAGMA consists of three major components: (a) a quantized latent space constructed via a modified VQ-VAE for efficient sample utilization, (b) goal-reaching trajectory generation via an extended VQ codebook, and (c) latent goal-guided intrinsic reward generation to encourage transitions toward the sampled goal-reaching path. The proposed method is evaluated on StarCraft II, under both dense and sparse reward settings, and on Google Research Football. Empirical results show performance improvements over state-of-the-art baselines.
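Component (a) relies on vector quantization: each continuous state embedding is replaced by its nearest codebook vector. The snippet below is a minimal NumPy sketch of that lookup step only. It is not the authors' modified VQ-VAE: it omits the encoder, decoder, and coverage loss. The function name `quantize`, the random codebook, and the sizes (64 codes of dimension 8, borrowed from the Figure 2 caption) are illustrative assumptions.

```python
import numpy as np

def quantize(x, codebook):
    """Map each embedding in x (N, D) to its nearest codebook vector (n_c, D).

    Hypothetical helper for illustration; plain nearest-neighbor lookup.
    """
    # Squared Euclidean distance between every embedding and every code: (N, n_c)
    d2 = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d2.argmin(axis=1)      # index of the closest code per embedding
    return idx, codebook[idx]    # discrete indices and quantized vectors

rng = np.random.default_rng(0)
codebook = rng.normal(size=(64, 8))  # assumed: n_c = 64 codes, latent dim D = 8
x = rng.normal(size=(5, 8))          # five made-up state embeddings
idx, x_q = quantize(x, codebook)
```

The returned indices give a discrete label per state, which is what makes it possible to store and compare trajectories as sequences of codebook entries.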

Paper Structure (26 sections, 1 theorem, 10 equations, 14 figures, 6 tables, 5 algorithms)


Introduction
Related Works
Preliminaries
Methodology
State Embedding via Modified VQ-VAE
Goal-Reaching Trajectory Generation with Extended VQ Codebook
Intrinsic Reward Generation
Overall Learning Objective
Experiments
Performance evaluation on SMAC
Performance evaluation on GRF
Ablation study
Qualitative analysis
Conclusions
Mathematical Proof
...and 11 more sections

Key Result

Proposition 4.1

Provided that $\tau_{\chi_t}^{*}$ is a goal-reaching trajectory and $s' \in \tau_{\chi_t}^{*}$, an intrinsic reward $r^I(s') := \gamma \left( C_{q,t}(s') - \max_{\boldsymbol{a}'} Q_{\theta^-}(s', \boldsymbol{a}') \right)$ to the current TD-target $y = r(s, \boldsymbol{a}) + \gamma V_{\theta^-}(s')$ g
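The effect of adding this intrinsic reward to the TD-target can be checked with one line of algebra. The sketch below assumes that $V_{\theta^-}(s') = \max_{\boldsymbol{a}'} Q_{\theta^-}(s', \boldsymbol{a}')$, the usual greedy target-network value in Q-learning. The excerpt does not state this explicitly, so treat it as an assumption.

```latex
\begin{aligned}
y + r^I(s')
  &= r(s, \boldsymbol{a}) + \gamma V_{\theta^-}(s')
     + \gamma \left( C_{q,t}(s') - \max_{\boldsymbol{a}'} Q_{\theta^-}(s', \boldsymbol{a}') \right) \\
  &= r(s, \boldsymbol{a}) + \gamma \max_{\boldsymbol{a}'} Q_{\theta^-}(s', \boldsymbol{a}')
     + \gamma C_{q,t}(s') - \gamma \max_{\boldsymbol{a}'} Q_{\theta^-}(s', \boldsymbol{a}') \\
  &= r(s, \boldsymbol{a}) + \gamma C_{q,t}(s')
\end{aligned}
```

Under that assumption, the bootstrapped term cancels and the target is driven by $C_{q,t}(s')$ instead of the target network's own estimate for states on the goal-reaching trajectory.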

Figures (14)

Figure 1: Overview of the LAGMA framework. (a) VQ-VAE constructs a quantized vector space with a coverage loss, while (b) the VQ codebook stores goal-reaching sequences from a given $x_{q,t}$. Then, (c) the goal-reaching trajectory is compared with the current batch trajectory to generate (d) an intrinsic reward. MARL training is done by (e) the standard CTDE framework.
Figure 2: Visualization of embedding results via VQ-VAE on the SMAC 5m_vs_6m task, with codebook size $n_c=64$ and latent dimension $D=8$, at training step T=1.0M. Colored dots represent $\chi$, the state representation before quantization; gray dots are the quantized vector representations in the VQ codebook derived from those state representations. Colors from red to purple (rainbow) run from small to large timesteps within episodes.
Figure 3: Histogram of recalled quantized vector.
Figure 4: Intrinsic reward generation by comparing the current trajectory in quantized latent space ($\tau_{\chi_t}$) with a sampled goal-reaching trajectory ($\tau_{\chi_t}^*$).
Figure 5: Performance comparison of LAGMA against baseline algorithms on two easy and hard SMAC maps (1c3s5z, 5m_vs_6m) and two super hard SMAC maps (MMM2, 6h_vs_8z), under the dense reward setting.
...and 9 more figures

Theorems & Definitions (4)

Definition 3.1
Proposition 4.1
proof
proof
