Hierarchical Reinforcement Learning for Optimal Agent Grouping in Cooperative Systems

Liyuan Hu

TL;DR

The paper tackles the combinatorial challenge of sequential agent grouping in cooperative multi-agent systems by introducing a hierarchical RL framework under CTDE, where a high-level option policy selects agent groupings and a low-level intra-option policy governs daily actions. It leverages permutation-invariant Deep Set architectures to compress the joint Q-function and policy, enabling scalable learning through a decomposition of the joint Q-function into pairwise components within the option-critic framework. Key contributions include parameter-efficient critics and policies for large option spaces, a dimension-reduction network that preserves permutation invariance, and demonstrated gains in a simulated, Intern Health Study-like environment. The approach offers scalable, dynamic grouping for cooperative MAS with practical impact on resource matching, team formation, and coordinated interventions in real-world domains.
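To give a feel for the combinatorial challenge and for why a pairwise decomposition helps, the following is a minimal sketch and not the paper's algorithm. It assumes that a grouping is a perfect matching of an even number of teams and that the value of a grouping is the sum of per-pair scores. The names `perfect_matchings`, `best_grouping`, and the `pair_score` table are hypothetical; in the paper the pairwise components are learned rather than given.

```python
def perfect_matchings(items):
    """Yield every way to split an even-sized list into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        for tail in perfect_matchings(remaining):
            yield [(first, partner)] + tail


def best_grouping(teams, pair_score):
    """Return the matching with the largest sum of pairwise scores.

    Exhaustive search, for illustration only: it is feasible only for a
    handful of teams, which is the scaling problem the summary refers to.
    """
    return max(
        perfect_matchings(teams),
        key=lambda m: sum(pair_score[frozenset(p)] for p in m),
    )


teams = ["A", "B", "C", "D"]
# Hypothetical pairwise values; a learned critic would supply these.
pair_score = {
    frozenset(("A", "B")): 1.0,
    frozenset(("C", "D")): 0.5,
    frozenset(("A", "C")): 2.0,
    frozenset(("B", "D")): 1.5,
    frozenset(("A", "D")): 0.2,
    frozenset(("B", "C")): 0.1,
}

grouping = best_grouping(teams, pair_score)
print(grouping)

# The number of candidate groupings grows as (n-1)(n-3)...1 for n teams.
n_options_6 = sum(1 for _ in perfect_matchings(list(range(6))))
print(n_options_6)
```

With n teams there are only n(n-1)/2 pair scores to represent, whereas the number of distinct groupings grows much faster, so scoring pairs and combining them keeps the critic small even when the option space is large.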

Abstract

This paper presents a hierarchical reinforcement learning (RL) approach to address the agent grouping or pairing problem in cooperative multi-agent systems. The goal is to simultaneously learn the optimal grouping and agent policy. By employing a hierarchical RL framework, we distinguish between high-level decisions of grouping and low-level agents' actions. Our approach utilizes the CTDE (Centralized Training with Decentralized Execution) paradigm, ensuring efficient learning and scalable execution. We incorporate permutation-invariant neural networks to handle the homogeneity and cooperation among agents, enabling effective coordination. The option-critic algorithm is adapted to manage the hierarchical decision-making process, allowing for dynamic and optimal policy adjustments.
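As a rough illustration of what permutation invariance means for homogeneous agents, here is a minimal Deep Sets style sketch in plain Python. It is an assumption-laden toy, not the paper's network: the layer sizes, the random weights, and the helper names (`make_linear`, `linear`, `deep_set`) are all hypothetical. The only point is that embedding each agent separately and then summing the embeddings makes the output independent of agent order.

```python
import math
import random


def make_linear(n_in, n_out, rng):
    """Random weights and zero bias; stands in for trained parameters."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def linear(layer, x):
    """Apply an affine map y = W x + b to a list of floats."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def deep_set(phi, rho, agent_states):
    """Compute rho(sum_i tanh(phi(s_i))); sum pooling removes agent order."""
    embedded = [[math.tanh(a) for a in linear(phi, s)] for s in agent_states]
    pooled = [sum(column) for column in zip(*embedded)]
    return linear(rho, pooled)


rng = random.Random(0)
phi = make_linear(3, 8, rng)   # per-agent encoder (hypothetical sizes)
rho = make_linear(8, 1, rng)   # readout producing one scalar value

states = [[0.1, 0.5, -0.2], [1.0, 0.0, 0.3], [-0.4, 0.9, 0.7]]
q_original = deep_set(phi, rho, states)[0]
q_shuffled = deep_set(phi, rho, states[::-1])[0]

# The two values agree up to floating-point rounding.
print(math.isclose(q_original, q_shuffled, rel_tol=1e-9, abs_tol=1e-12))
```

Because the encoder `phi` is shared across agents, the parameter count does not grow with the number of agents, which is one reason this family of architectures suits centralized critics over many interchangeable agents.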

Paper Structure

This paper contains 10 sections, 7 equations, 2 figures, and 1 table.

Figures (2)

  • Figure 1: Permutation-Invariant Policy Network for subject $j$ from team $i$ at time $t$. Assume team $i$ is matched with team $i^\prime$.
  • Figure 2: Permutation-Invariant Critic Network Design.