FoX: Formation-aware exploration in multi-agent reinforcement learning

Yonghyeon Jo; Sunwoo Lee; Junghyuk Yeom; Seungyul Han

TL;DR

FoX addresses exploration challenges in cooperative MARL under partial observability by introducing a formation-based equivalence relation that compresses the exploration space and by employing formation-aware intrinsic rewards. The framework combines an entropy-driven exploration objective with a mutual-information term to encourage agents to infer and respect the current formation using only local observations, implemented via a variational autoencoder–style encoder–decoder setup and a gradient-flipping mechanism to suppress irrelevant information. FoX also decomposes per-agent Q-functions with a shared, local, and formation-aware component within a CTDE-compatible architecture, using formation-based index sets to control the formation representation. Empirical results on sparse SMAC and Google Research Football demonstrate that FoX yields significant performance improvements over baselines, validating both the space-reduction strategy and the formation-awareness approach as effective tools for scalable MARL exploration.

Abstract

Recently, deep multi-agent reinforcement learning (MARL) has gained significant popularity due to its success in various cooperative multi-agent tasks. However, exploration remains a challenging problem in MARL because of the agents' partial observability and because the exploration space can grow exponentially with the number of agents. First, to address the scalability of the exploration space, we define a formation-based equivalence relation on the exploration space and reduce the search space by exploring only meaningful states in different formations. Then, we propose a novel formation-aware exploration (FoX) framework that encourages partially observable agents to visit states in diverse formations by guiding them to be well aware of their current formation based solely on their own observations. Numerical results show that the proposed FoX framework significantly outperforms state-of-the-art MARL algorithms on Google Research Football (GRF) and sparse StarCraft II Multi-Agent Challenge (SMAC) tasks.

Paper Structure (22 sections, 2 theorems, 10 equations, 8 figures, 2 tables, 1 algorithm)

Introduction
Related Work
Deep Multi-Agent Reinforcement Learning
Exploration in State Space
Intrinsic Motivations in MARL
Preliminaries
Decentralized POMDP
Centralized Training Decentralized Execution
FoX: Formation-aware Exploration
Motivation of Formation-aware Exploration
Formation Arrangement
Formation-aware Exploration Objective
Selection of Index Set $F^i$
Formation-based Shared Network
Experiments
...and 7 more sections

Key Result

Lemma 1

The binary relation $\sim_\mathcal{F}$ is an equivalence relation on the exploration space $\mathcal{S}^e$, where two exploration states $s_1, s_2 \in \mathcal{S}^e$ are equivalent under $\sim_\mathcal{F}$ if $\mathcal{F}(s_1) = \mathcal{F}(s_2)$.
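
As a rough illustration of how such a relation shrinks the exploration space, the sketch below groups states by a formation map and counts the resulting equivalence classes. The `formation` function is a hypothetical stand-in (sorted, rounded pairwise distances between agent positions) and is not the paper's definition of $\mathcal{F}$; the `precision` parameter and the toy states are likewise assumptions. Any function of the state induces an equivalence relation in the same way, because equality of its outputs is reflexive, symmetric, and transitive.

```python
from collections import defaultdict
from itertools import combinations
import math


def formation(state, precision=1):
    """Hypothetical formation map (an assumption, not the paper's F).

    A state is a tuple of 2-D agent positions. The formation is the sorted
    tuple of rounded pairwise distances, so it ignores translation and
    agent ordering.
    """
    dists = [math.dist(p, q) for p, q in combinations(state, 2)]
    return tuple(sorted(round(d, precision) for d in dists))


def equivalent(s1, s2):
    """s1 ~ s2 exactly when both states map to the same formation."""
    return formation(s1) == formation(s2)


def partition(states):
    """Group states into equivalence classes keyed by their formation."""
    classes = defaultdict(list)
    for s in states:
        classes[formation(s)].append(s)
    return dict(classes)


# Toy exploration states for three agents (illustrative values only).
states = [
    ((0, 0), (1, 0), (0, 1)),
    ((5, 5), (6, 5), (5, 6)),  # translation of the first state
    ((0, 0), (3, 0), (0, 3)),  # same shape, wider spacing
    ((2, 2), (2, 3), (3, 2)),  # first state translated and reordered
]

classes = partition(states)
print(len(states), "states ->", len(classes), "formation classes")
# prints "4 states -> 2 formation classes"
```

With this stand-in map, an exploration bonus computed per formation class would treat the first, second, and fourth states as already-visited duplicates of one another, which is the kind of search-space reduction the lemma makes well defined.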

Figures (8)

Figure 1: (a) Illustration of formation-based state equivalence. Defining state equivalence under formations can reduce the search space efficiently. (b) Illustration of formation-awareness. Our method encourages agents to be fully aware of the formation.
Figure 2: (a) Pure exploration reward based on visitation count. (b) Average reward on the set of rarely visited formations $S_1$ and the set of frequently visited formations $S_2$. (c) The reward difference for each component. (d) Heatmap of the diverse formation-based exploration space. With the initial spawn formation at (0,0), a farther point in the heatmap indicates a larger difference in formations.
Figure 3: $\mathcal{F}$-Net Architecture
Figure 4: Formations with various agent index sets $F$. While $F^{i,max}$ and $F^{i,min}$ focus on particular agent relationships, $F^{i,all}$ and $F^{i,max,min}$ consider extensive agent relationships.
Figure 5: Schema of FoX framework
...and 3 more figures

Theorems & Definitions (2)

Lemma 1: Formation-based equivalence relation
Lemma 2: Evidence lower bound
