
HARP: Human-Assisted Regrouping with Permutation Invariant Critic for Multi-Agent Reinforcement Learning

Huawen Hu, Enze Shi, Chenxi Yue, Shuocun Yang, Zihao Wu, Yiwei Li, Tianyang Zhong, Tuo Zhang, Tianming Liu, Shu Zhang

TL;DR

HARP (Human-Assisted Regrouping with Permutation Invariant Group Critic) is a multi-agent reinforcement learning framework for group-oriented tasks that enables non-experts to offer effective guidance with minimal intervention.

Abstract

Human-in-the-loop reinforcement learning integrates human expertise to accelerate agent learning and provide critical guidance and feedback in complex fields. However, many existing approaches focus on single-agent tasks and require continuous human involvement during training, which significantly increases the human workload and limits scalability. In this paper, we propose HARP (Human-Assisted Regrouping with Permutation Invariant Critic), a multi-agent reinforcement learning framework designed for group-oriented tasks. HARP integrates automatic agent regrouping with strategic human assistance during deployment, enabling non-experts to offer effective guidance with minimal intervention. During training, agents dynamically adjust their groupings to optimize collaborative task completion. When deployed, they actively seek human assistance and use the Permutation Invariant Group Critic to evaluate and refine human-proposed groupings, allowing non-expert users to contribute valuable suggestions. In multiple collaboration scenarios, our approach leverages limited guidance from non-experts to improve performance. The project can be found at https://github.com/huawen-hu/HARP.
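
To make the idea of a permutation-invariant critic over groupings concrete, the following is a minimal sketch, not the architecture used in HARP. It assumes a Deep Sets style design: each agent's feature vector is embedded, embeddings are mean-pooled within a group, and group embeddings are mean-pooled again before a linear readout. All names (`make_params`, `score_grouping`), the random weights, and the toy feature vectors are hypothetical and chosen only for illustration.

```python
import math
import random


def make_params(in_dim, hid_dim, seed=0):
    """Create random, untrained weights (illustrative only)."""
    rng = random.Random(seed)
    w_agent = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(hid_dim)]
    w_group = [[rng.uniform(-1, 1) for _ in range(hid_dim)] for _ in range(hid_dim)]
    w_out = [rng.uniform(-1, 1) for _ in range(hid_dim)]
    return w_agent, w_group, w_out


def _linear_tanh(weights, vec):
    """Apply a bias-free linear layer followed by tanh."""
    return [math.tanh(math.fsum(w * x for w, x in zip(row, vec))) for row in weights]


def _mean_pool(vectors):
    """Average a non-empty list of equal-length vectors.

    math.fsum is order-independent, so the result does not depend
    on the order in which the vectors are listed.
    """
    n = len(vectors)
    return [math.fsum(col) / n for col in zip(*vectors)]


def score_grouping(groups, params):
    """Score a grouping given as a list of non-empty groups,
    where each group is a list of agent feature vectors."""
    w_agent, w_group, w_out = params
    group_embs = []
    for group in groups:
        agent_embs = [_linear_tanh(w_agent, agent) for agent in group]
        # Pooling over agents removes dependence on agent order within a group.
        group_embs.append(_linear_tanh(w_group, _mean_pool(agent_embs)))
    # Pooling over groups removes dependence on the order of the groups.
    pooled = _mean_pool(group_embs)
    return math.fsum(w * x for w, x in zip(w_out, pooled))


# Toy usage: four agents with made-up 2-D features.
params = make_params(in_dim=2, hid_dim=4)
agents = {"a": [0.1, 0.9], "b": [0.4, 0.2], "c": [0.8, 0.5], "d": [0.3, 0.7]}
g1 = [[agents["a"], agents["b"]], [agents["c"], agents["d"]]]
g2 = [[agents["d"], agents["c"]], [agents["b"], agents["a"]]]  # same partition, reordered
g3 = [[agents["a"], agents["c"]], [agents["b"], agents["d"]]]  # a different partition

print(score_grouping(g1, params), score_grouping(g2, params), score_grouping(g3, params))
```

In this sketch, `g1` and `g2` describe the same partition listed in different orders, so they receive the same score, while `g3` is a genuinely different grouping and is scored separately. A critic with this property can compare a human-proposed grouping against alternatives without being sensitive to how the proposal happens to be written down.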

Paper Structure (13 sections, 7 equations, 6 figures, 3 tables, 1 algorithm)


Figures (6)

  • Figure 1: HARP automatically forms groups during training to achieve collaborative task completion. In the deployment phase, it actively seeks assistance from humans, evaluates their suggestions, and provides feedback on the groups received.
  • Figure 2: The overall framework of HARP. The Agent Network uses a gated recurrent unit (GRU) to capture long-term dependencies in past sequences and encodes hidden-layer states to obtain state representations. The Automatic Grouping section uses Select and Kick along with hypernetworks to achieve dynamic grouping. The rightmost part shows the Mixer network and the human participation component, including the Permutation Invariant Group Critic.
  • Figure 3: Permutation Invariant Group Critic
  • Figure 4: Grouping visualization during the training and deployment phases. (a) and (b) show the visualization and interpretability analysis of automatic grouping results during training, while (c) and (d) present the visualization of human-assisted results during deployment. The 'm' in these maps refers to Marine. 'MMM' represents a battle configuration of 1 Medivac, 2 Marauders, and 7 Marines on each side, while 'MMM2' pits 1 Medivac, 2 Marauders, and 7 Marines against 1 Medivac, 3 Marauders, and 8 Marines. In the 'corridor' map, players control 6 Zealots facing 24 Zerglings.
  • Figure 5: The GUI shown to humans during the deployment phase, including the relative positions of each agent and their health percentages.
  • ...and 1 more figures