Novelty-Guided Data Reuse for Efficient and Diversified Multi-Agent Reinforcement Learning

Yangkun Chen, Kai Yang, Jian Tao, Jiafei Lyu

TL;DR

This paper tackles sample inefficiency and limited strategy diversity in multi-agent reinforcement learning by introducing MANGER, a novelty-guided data reuse framework. MANGER uses a Random Network Distillation (RND) based novelty score to adapt per-agent update frequencies, enabling targeted data reuse and promoting diverse agent behaviors through a separated critic architecture. The approach integrates with QMIX, incorporating per-agent update scheduling and selective gradient updates to shared and independent critic components. It demonstrates superior performance on challenging cooperative benchmarks: SMAC (including SMAC-V2) and Google Research Football. The findings indicate that focusing updates on novel observations can yield faster convergence and more specialized, teamwork-oriented policies with minimal overhead.

Abstract

Recently, deep Multi-Agent Reinforcement Learning (MARL) has demonstrated its potential to tackle complex cooperative tasks, pushing the boundaries of AI in collaborative environments. However, the efficiency of these systems is often compromised by inadequate sample utilization and a lack of diversity in learning strategies. To enhance MARL performance, we introduce a novel sample reuse approach that dynamically adjusts policy updates based on observation novelty. Specifically, we employ a Random Network Distillation (RND) network to gauge the novelty of each agent's current state, assigning additional sample update opportunities based on the uniqueness of the data. We name our method Multi-Agent Novelty-GuidEd sample Reuse (MANGER). This method increases sample efficiency and promotes exploration and diverse agent behaviors. Our evaluations confirm substantial improvements in MARL effectiveness in complex cooperative scenarios such as Google Research Football and super-hard StarCraft II micromanagement tasks.
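
The novelty-guided scheduling described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the class `RNDNovelty`, the function `extra_updates`, the single-layer networks, and the rule that maps novelty to update counts are all assumptions made here for illustration. The only idea taken from the text is that an RND predictor's error on an agent's observation serves as a novelty score, and that more novel observations earn more update opportunities.

```python
import numpy as np


class RNDNovelty:
    """Minimal Random Network Distillation scorer (illustrative, NumPy only)."""

    def __init__(self, obs_dim, feat_dim=16, lr=1e-2, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed, randomly initialised target network; it is never trained.
        self.W_t = rng.normal(size=(obs_dim, feat_dim))
        # Trainable predictor (one linear layer, an assumption for brevity).
        self.W_p = np.zeros((obs_dim, feat_dim))
        self.lr = lr

    def _target(self, obs):
        return np.tanh(obs @ self.W_t)

    def novelty(self, obs):
        """Per-row mean squared prediction error; higher means less familiar."""
        err = obs @ self.W_p - self._target(obs)
        return (err ** 2).mean(axis=-1)

    def update(self, obs):
        """One gradient step on the predictor's squared error
        (gradient taken up to a constant factor absorbed into lr)."""
        err = obs @ self.W_p - self._target(obs)
        self.W_p -= self.lr * (obs.T @ err) / len(obs)


def extra_updates(novelty, max_extra=3):
    """Hypothetical scheduling rule: scale each agent's novelty by the
    team maximum and round down to an integer number of extra updates."""
    novelty = np.asarray(novelty, dtype=float)
    top = novelty.max()
    if top <= 0:
        return np.zeros(len(novelty), dtype=int)
    return np.floor(max_extra * novelty / top).astype(int)


# Usage: one observation per agent (random data, purely illustrative).
rng = np.random.default_rng(1)
agent_obs = rng.normal(size=(4, 8))        # 4 agents, 8-dim observations
scorer = RNDNovelty(obs_dim=8)
counts = extra_updates(scorer.novelty(agent_obs))
scorer.update(agent_obs)                   # seen observations become less novel
```

In this sketch the agent with the highest novelty always receives `max_extra` additional updates, and repeated calls to `update` on the same observations drive their novelty down, so the extra budget shifts toward agents that encounter unfamiliar observations.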

Paper Structure

This paper contains 15 sections, 10 equations, 7 figures, 2 tables, 1 algorithm.

Figures (7)

  • Figure 1: Visualization of the environment. In the SMAC image, the green box marks the tank role, which actively absorbs damage and sacrifices itself so that teammates have room to deal damage. The red box marks the damage-dealer role, which attacks enemies. The blue box marks the roaming role, similar to a guerrilla fighter, which draws enemy aggro through its own movement and leads some enemies away from the battlefield, so that the enemies do not focus solely on the tank and kill it instantly.
  • Figure 2: Overview of the MANGER framework. We employ the RND network to assess the novelty of each agent's observations, thereby enabling differentiated updates among agents. Furthermore, we decompose the network so that each additional update does not interfere with the agents.
  • Figure 3: Experimental results on SMAC. All curves are averaged over 5 independent runs.
  • Figure 4: Experiments on GRF environments. All curves are averaged over 5 independent runs.
  • Figure 5: Graphical illustration of agent diversity. (a) shows how the agent within the red box should operate, with Q-values for actions. We compute the Q-values of different agents and display three representative ones. For the same observation, Agent 1 moves southward to protect teammates, Agent 3 attacks low-health enemies, and Agent 6 moves to the upper left to pull enemies away. We then compute the cosine similarity between the Q-values of other agents and the current agent's Q-values. (b) and (c) compare our method with QMIX, showing cosine similarity with other agents under the same observation. Our agents exhibit more diversity.
  • ...and 2 more figures
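
The diversity measure mentioned in the Figure 5 caption, cosine similarity between different agents' Q-values for the same observation, can be sketched as follows. The function name `q_cosine_similarity` and the example Q-values are hypothetical and are not taken from the paper; the sketch only shows how such a pairwise comparison might be computed.

```python
import numpy as np


def q_cosine_similarity(q_values):
    """Pairwise cosine similarity between agents' Q-value vectors.

    q_values has shape (n_agents, n_actions): row i holds agent i's
    Q-values for one shared observation. Entry (i, j) of the result is
    the cosine similarity between agents i and j.
    """
    q = np.asarray(q_values, dtype=float)
    norms = np.linalg.norm(q, axis=1, keepdims=True)
    unit = q / np.clip(norms, 1e-12, None)  # guard against all-zero rows
    return unit @ unit.T


# Made-up Q-values for three agents over two actions.
sim = q_cosine_similarity([[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]])
```

Under this measure, lower off-diagonal values mean that agents rank actions differently for the same observation, which is the sense of "more diversity" used in the caption.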