Adaptive Episode Length Adjustment for Multi-agent Reinforcement Learning

Byunghyun Yoo, Younghwan Shin, Hyunwoo Kim, Euisok Chung, Jeongmin Yang

TL;DR

This work addresses how fixed episode lengths hinder learning in multi-agent reinforcement learning (MARL) by proposing Adaptive Episode Length Adjustment (AELA), which begins with short episodes and incrementally increases the horizon based on learning progress. The approach is grounded in a theoretical link between shorter horizons, visits to secure states during exploration, and fewer visits to dead-end states, and it uses the entropy of value estimates as a progression signal. Empirically, AELA improves convergence speed and final performance on SMAC and Modified Predator-Prey when applied to both VDN- and QMIX-based MARL methods. The findings offer a practical, model-agnostic strategy to boost MARL performance in complex multi-agent environments where dead ends and long-horizon planning pose challenges.

Abstract

In standard reinforcement learning, an episode is defined as a sequence of interactions between agents and the environment, which terminates upon reaching a terminal state or a pre-defined episode length. Setting a shorter episode length enables the generation of more episodes from the same number of data samples, thereby facilitating exploration of diverse states. While shorter episodes may limit the collection of long-term interactions, they may offer significant advantages when properly managed. For example, trajectory truncation in single-agent reinforcement learning has shown how the benefits of shorter episodes can be leveraged despite the trade-off of reduced long-term interaction experiences. However, this approach remains underexplored in multi-agent reinforcement learning (MARL). This paper proposes a novel MARL approach, Adaptive Episode Length Adjustment (AELA), where the episode length is initially limited and gradually increased based on an entropy-based assessment of learning progress. By starting with shorter episodes, agents can focus on learning effective strategies for initial states and minimize time spent in dead-end states. The use of entropy as an assessment metric prevents premature convergence to suboptimal policies and ensures balanced training over varying episode lengths. We validate our approach using the StarCraft Multi-agent Challenge (SMAC) and a modified predator-prey environment, demonstrating significant improvements in both convergence speed and overall performance compared to existing methods. To the best of our knowledge, this is the first study to adaptively adjust episode length in MARL based on learning progress.
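The idea of starting with a limited episode length and extending it according to an entropy-based signal can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not state the entropy definition, the threshold rule, or the growth schedule. The sketch assumes, purely for illustration, that entropy is computed from a softmax over Q-values and that the limit grows by a fixed step once the mean entropy falls below a threshold. The names `softmax_entropy`, `EpisodeLengthScheduler`, and all parameter values are hypothetical.

```python
import math
from dataclasses import dataclass


def softmax_entropy(q_values, temperature=1.0):
    """Shannon entropy (in nats) of a softmax distribution over Q-values.

    Assumption: this is one possible entropy measure, not necessarily
    the one used in the paper.
    """
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


@dataclass
class EpisodeLengthScheduler:
    """Hypothetical scheduler that extends the episode limit step by step."""

    initial_limit: int = 10          # assumed starting episode length
    max_limit: int = 100             # assumed full episode length of the task
    step: int = 10                   # assumed increment per extension
    entropy_threshold: float = 0.5   # assumed trigger for an extension

    def __post_init__(self):
        self.limit = self.initial_limit

    def update(self, mean_entropy):
        """Extend the limit when the mean entropy is below the threshold."""
        if mean_entropy < self.entropy_threshold and self.limit < self.max_limit:
            self.limit = min(self.limit + self.step, self.max_limit)
        return self.limit


def episodes_for_budget(sample_budget, episode_limit):
    """Number of full-length episodes that fit in a fixed sample budget."""
    return sample_budget // episode_limit
```

Under these assumptions, a training loop would call `update` periodically with the mean entropy over a batch and truncate rollouts at the returned limit. `episodes_for_budget` illustrates the abstract's point that a shorter limit yields more episodes from the same number of samples: a budget of 1000 samples covers 100 episodes at a limit of 10, but only 10 episodes at a limit of 100.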

Paper Structure

This paper contains 17 sections, 6 theorems, 11 equations, 7 figures.

Key Result

Lemma 1

Let $E_L$ denote the episode length, and let $l$ (where $l = 1, 2, \dots, E_L$) denote the interaction step (i.e., the time step within each episode). Let $P_s(l)$ be the probability that the state is secure at interaction step $l$. Then, it holds that:

Figures (7)

  • Figure 1: Median test return in the MPP tasks
  • Figure 2: Median test win rates with different SMAC scenarios
  • Figure 3: Limited episode length during training
  • Figure 4: Number of samples with interaction steps
  • Figure 5: Snapshot of the final policy for AELA-QMIX in 6h_vs_8z with interaction steps $l$
  • ...and 2 more figures

Theorems & Definitions (10)

  • Definition 1: Dead-end State
  • Definition 2: Secure state
  • Lemma 1
  • Theorem 1
  • Definition 3: The probability of visiting dead-end states
  • Definition 4: Regret
  • Corollary 1
  • Theorem 2
  • Theorem 1
  • Theorem 2