Hierarchical Reinforcement Learning for Optimal Agent Grouping in Cooperative Systems
Liyuan Hu
TL;DR
The paper tackles the combinatorial challenge of sequential agent grouping in cooperative multi-agent systems. It introduces a hierarchical RL framework under CTDE in which a high-level option policy selects agent groupings and a low-level intra-option policy governs the agents' daily actions. Permutation-invariant Deep Set architectures compress the joint Q-function and policy, and the joint Q-function is decomposed into pairwise components within the option-critic framework so that learning scales with the number of agents. Key contributions include parameter-efficient critics and policies for large option spaces, a dimension-reduction network that preserves permutation invariance, and demonstrated gains in a simulated environment modeled on the Intern Health Study. The approach offers scalable, dynamic grouping for cooperative multi-agent systems, with potential applications in resource matching, team formation, and coordinated interventions in real-world domains.
Abstract
This paper presents a hierarchical reinforcement learning (RL) approach to the agent grouping (or pairing) problem in cooperative multi-agent systems. The goal is to learn the optimal grouping and the agents' policies simultaneously. The hierarchical RL framework separates high-level grouping decisions from low-level agent actions. Our approach follows the CTDE (Centralized Training with Decentralized Execution) paradigm, which supports efficient learning and scalable execution. We incorporate permutation-invariant neural networks to exploit the homogeneity of, and cooperation among, agents, enabling effective coordination. The option-critic algorithm is adapted to manage the hierarchical decision-making process, allowing the grouping and the agent policies to be adjusted dynamically.
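As a rough illustration of the two ideas named above (permutation invariance and a joint Q-function built from pairwise components), the following minimal NumPy sketch may help. It is not the paper's architecture: the network sizes, the random weights, and the names `phi`, `pair_q`, and `joint_q` are all hypothetical and chosen only for illustration. Each pair is scored by a Deep Sets style function (a shared per-agent embedding, sum pooling, then a readout), and the value of a grouping is taken to be the sum of its pair scores.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 6 agents, 4 state features each, 8 hidden units.
N_AGENTS, STATE_DIM, HIDDEN = 6, 4, 8

# Shared (untrained, random) weights: phi embeds one agent,
# w_rho reads a scalar value out of a pooled embedding.
W_phi = rng.normal(size=(STATE_DIM, HIDDEN))
w_rho = rng.normal(size=HIDDEN)

def phi(state):
    """Per-agent embedding; the same weights are used for every agent."""
    return np.tanh(state @ W_phi)

def pair_q(s_i, s_j):
    """Deep Sets style value of one pair: sum-pool the embeddings, then read out.

    Sum pooling makes the result independent of the order of the two agents.
    """
    pooled = phi(s_i) + phi(s_j)
    return float(np.tanh(pooled) @ w_rho)

def joint_q(states, pairing):
    """Value of a grouping, assumed here to be the sum of its pairwise values."""
    return sum(pair_q(states[i], states[j]) for i, j in pairing)

states = rng.normal(size=(N_AGENTS, STATE_DIM))
pairing = [(0, 1), (2, 3), (4, 5)]
q_original = joint_q(states, pairing)

# Relabel the agents: shuffle the rows and remap the pairing to the new labels.
perm = rng.permutation(N_AGENTS)
new_index = np.empty(N_AGENTS, dtype=int)
new_index[perm] = np.arange(N_AGENTS)  # old agent perm[k] now sits in row k
states_perm = states[perm]
# Also swap the order inside each pair to exercise the pairwise symmetry.
pairing_perm = [(new_index[j], new_index[i]) for i, j in pairing]
q_permuted = joint_q(states_perm, pairing_perm)

print(np.isclose(q_original, q_permuted))
```

Under these assumptions the parameter count depends only on the feature and hidden dimensions, not on the number of agents or the number of possible groupings, which is the kind of saving a pairwise, permutation-invariant critic is meant to provide.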
