Extending NGU to Multi-Agent RL: A Preliminary Study

Juan Hernandez, Diego Fernández, Manuel Cifuentes, Denis Parra, Rodrigo Toro Icarte

TL;DR

Sparse rewards are a significant challenge in multi-agent settings. The paper extends NGU to MARL by retaining its core intrinsic exploration components and evaluates the result on the simple_tag task in PettingZoo against a Multi-DQN baseline. It analyzes three design choices: shared versus individual replay buffers, cross-agent novelty sharing, and heterogeneous beta. Shared replay buffers provide the strongest gains; novelty sharing performs comparably at k=1 but degrades learning for larger k; heterogeneous beta values do not improve over a small common value. Overall, the study suggests that NGU-based intrinsic exploration can be applied effectively in MARL when experiences are shared and exploration signals are carefully tuned.

Abstract

The Never Give Up (NGU) algorithm has proven effective in reinforcement learning tasks with sparse rewards by combining episodic novelty and intrinsic motivation. In this work, we extend NGU to multi-agent environments and evaluate its performance in the simple_tag environment from the PettingZoo suite. Compared to a multi-agent DQN baseline, NGU achieves moderately higher returns and more stable learning dynamics. We investigate three design choices: (1) shared replay buffer versus individual replay buffers, (2) sharing episodic novelty among agents using different k thresholds, and (3) using heterogeneous values of the beta parameter. Our results show that NGU with a shared replay buffer yields the best performance and stability, highlighting that the gains come from combining NGU intrinsic exploration with experience sharing. Novelty sharing performs comparably when k = 1 but degrades learning for larger values. Finally, heterogeneous beta values do not improve over a small common value. These findings suggest that NGU can be effectively applied in multi-agent settings when experiences are shared and intrinsic exploration signals are carefully tuned.
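
To make two of the design choices above concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simple FIFO replay buffer and the standard NGU-style reward mix, in which the training reward is the extrinsic reward plus a beta-weighted intrinsic bonus. The names `ReplayBuffer`, `make_buffers`, and `mixed_reward` are hypothetical and chosen only for illustration.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity FIFO store of transitions (illustrative only)."""

    def __init__(self, capacity):
        self.storage = deque(maxlen=capacity)

    def add(self, transition):
        self.storage.append(transition)

    def sample(self, batch_size):
        # Sample without replacement, capped at the current buffer size.
        items = list(self.storage)
        return random.sample(items, min(batch_size, len(items)))

    def __len__(self):
        return len(self.storage)


def make_buffers(n_agents, capacity, shared):
    """Return one buffer handle per agent.

    With shared=True every agent gets the same buffer object, so each
    agent can replay transitions collected by the others.
    """
    if shared:
        buffer = ReplayBuffer(capacity)
        return [buffer] * n_agents
    return [ReplayBuffer(capacity) for _ in range(n_agents)]


def mixed_reward(extrinsic, intrinsic, beta):
    """Assumed reward mix: extrinsic reward plus beta times intrinsic bonus."""
    return extrinsic + beta * intrinsic


# Heterogeneous beta: each agent weights the intrinsic bonus differently.
# The triple (0.2, 0.4, 0.6) is one of the settings named in Figure 4.
betas = (0.2, 0.4, 0.6)
buffers = make_buffers(n_agents=3, capacity=1000, shared=True)
for agent_id, beta in enumerate(betas):
    reward = mixed_reward(extrinsic=0.0, intrinsic=1.0, beta=beta)
    buffers[agent_id].add((agent_id, reward))
```

In the shared configuration all three transitions land in a single buffer, whereas `shared=False` would leave each agent with only its own transition. How the paper computes the intrinsic bonus and the novelty-sharing threshold k is not covered by this sketch.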

Paper Structure

This paper contains 13 sections, 2 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: The simple_tag environment from the PettingZoo suite. Multiple pursuers (red) cooperate to capture an evader (blue) in a bounded 2D arena.
  • Figure 2: Learning curves of pursuers in the simple_tag environment. Results are averaged over 15 runs with smoothed returns (window=100), and the shaded regions indicate the 95% confidence interval. The left panel corresponds to training without a shared replay buffer, while the right panel shows results with buffer sharing.
  • Figure 3: Results for $k=1$ and $k=2$ are averaged over 15 runs, while $k=3$ is averaged over 5 runs. All curves are smoothed with a window of 100, and the shaded regions indicate the 95% confidence interval.
  • Figure 4: Learning curves of heterogeneous $\beta$ variants compared with Multi-NGU in the simple_tag environment. Results are averaged over 15 runs with smoothed returns (window=100), except for $(0.2, 0.4, 0.6)$ which is averaged over 10 runs. The shaded regions indicate the 95% confidence interval.
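
The captions above describe curves that are smoothed with a window of 100, averaged over runs, and shaded with a 95% confidence interval. One plausible way to compute such curves is sketched below. The captions do not state the exact procedure, so the moving-average smoothing and the normal-approximation interval (1.96 standard errors) are assumptions, and the function names `smooth` and `mean_and_ci95` are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def smooth(returns, window=100):
    """Moving average over episodes; yields len(returns) - window + 1 points."""
    return [
        mean(returns[i:i + window])
        for i in range(len(returns) - window + 1)
    ]


def mean_and_ci95(runs, window=100):
    """Aggregate several runs into a mean curve with a 95% band.

    runs: list of equal-length lists of episode returns (at least 2 runs).
    Returns three lists: mean, lower bound, upper bound.
    """
    smoothed = [smooth(run, window) for run in runs]
    n_runs = len(smoothed)
    center, lower, upper = [], [], []
    for values in zip(*smoothed):  # values across runs at one time step
        m = mean(values)
        # Assumption: normal approximation, 1.96 standard errors of the mean.
        half_width = 1.96 * stdev(values) / sqrt(n_runs)
        center.append(m)
        lower.append(m - half_width)
        upper.append(m + half_width)
    return center, lower, upper
```

With only 5 to 15 runs per curve, a Student-t critical value would give a somewhat wider band than the 1.96 used here.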