
Scalable Hierarchical Reinforcement Learning for Hyper Scale Multi-Robot Task Planning

Xuan Zhou, Xiang Shi, Lele Zhang, Chen Chen, Hongbo Li, Lin Ma, Fang Deng, Jie Chen

TL;DR

This work tackles hyper-scale multi-robot task planning (MRTP) in warehouse robotic mobile fulfillment systems (RMFS) by framing MRTP as a Markov decision process (MDP) with options on an asynchronous temporal graph with cycle constraints (C2AMRTG). It introduces a centralized hierarchical framework built on a hierarchical temporal attention network (HTAN) to reduce action-space dimensionality. It also introduces a hierarchical RL algorithm (HCR-REINFORCE) and a complementary multi-stage curriculum (HCR2C) to address scaling and generalization. The HTAN-robot and HTAN-node networks accept variable-length inputs, which keeps the planner scalable, and produce accurate option probabilities, while the counterfactual rollout baseline improves credit assignment across policy layers. Experiments on fixed-scale and random-scale simulated instances and on real-world data show superior planning quality and significantly faster planning, with successful scaling to 200 robots and 1000 racks on unlearned maps. The approach shows strong practical value for large-scale RMFS and offers a path toward more autonomous, scalable warehouse automation.
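The TL;DR credits the HTAN-robot and HTAN-node networks with handling variable-length inputs. A common way to obtain that property is to score each candidate against a context vector with attention and take a masked softmax, so the number of robots or nodes can change between instances without changing any parameters. The sketch below illustrates that generic idea for an assumed two-level choice (first a robot, then a graph node for that robot). It is not the HTAN architecture: the embeddings are placeholders rather than learned features, and `masked_attention_probs`, `select_option`, and `node_mask_fn` are hypothetical names.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def masked_attention_probs(
    query: Sequence[float], keys: Sequence[Sequence[float]], mask: Sequence[bool]
) -> List[float]:
    """Scaled dot-product scores over any number of candidates, softmaxed over feasible ones."""
    scale = 1.0 / math.sqrt(len(query))
    logits = [
        sum(q * k for q, k in zip(query, key)) * scale if ok else -math.inf
        for key, ok in zip(keys, mask)
    ]
    finite = [x for x in logits if x != -math.inf]
    if not finite:
        raise ValueError("no feasible candidate")
    top = max(finite)  # subtract the max for numerical stability
    exps = [math.exp(x - top) if x != -math.inf else 0.0 for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(probs: Sequence[float], rng: random.Random) -> int:
    """Draw one index; zero-probability (masked) entries are never chosen."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def select_option(
    context: Sequence[float],
    robot_embs: Sequence[Sequence[float]],
    robot_mask: Sequence[bool],
    node_embs: Sequence[Sequence[float]],
    node_mask_fn: Callable[[int], Sequence[bool]],  # hypothetical feasibility rule per robot
    rng: random.Random,
) -> Tuple[int, int, float]:
    """Assumed two-level option: pick a robot, then a node conditioned on that robot."""
    p_robot = masked_attention_probs(context, robot_embs, robot_mask)
    r = sample_index(p_robot, rng)
    # Condition the lower-level query on the chosen robot (simple additive fusion).
    query = [c + e for c, e in zip(context, robot_embs[r])]
    p_node = masked_attention_probs(query, node_embs, node_mask_fn(r))
    n = sample_index(p_node, rng)
    # The joint log-probability factorizes over the two layers.
    return r, n, math.log(p_robot[r]) + math.log(p_node[n])


if __name__ == "__main__":
    rng = random.Random(0)
    robots = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    nodes = [[0.5, 0.5, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0], [0.2, 0.1, 0.0]]
    print(select_option([0.1, 0.2, 0.3], robots, [True] * 3, nodes, lambda r: [True] * 4, rng))
```

Because the scoring function is shared across candidates, adding a robot or a node only lengthens the lists; the same code handles 2 or 200 robots.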

Abstract

To improve the efficiency of warehousing systems and meet large volumes of customer orders, we aim to solve the challenges posed by the curse of dimensionality and by dynamic properties in hyper-scale multi-robot task planning (MRTP) for robotic mobile fulfillment systems (RMFS). Existing research indicates that hierarchical reinforcement learning (HRL) is an effective way to mitigate these challenges. Building on this, we construct an efficient multi-stage HRL-based multi-robot task planner for hyper-scale MRTP in RMFS, in which the planning process is represented with a special temporal graph topology. To ensure optimality, the planner is designed with a centralized architecture; however, this architecture also brings the challenges of scaling up and generalization, which require policies to maintain performance across various unlearned scales and maps. To tackle these difficulties, we first construct a hierarchical temporal attention network (HTAN) that provides the basic ability to handle variable-length inputs, and then design multi-stage curricula for hierarchical policy learning to further improve scaling-up and generalization ability while avoiding catastrophic forgetting. Additionally, we observe that policies with a hierarchical structure suffer from unfair credit assignment similar to that in multi-agent reinforcement learning. Inspired by this, we propose a hierarchical reinforcement learning algorithm with a counterfactual rollout baseline to improve learning performance. Experimental results demonstrate that our planner outperforms other state-of-the-art methods on various MRTP instances in both simulated and real-world RMFS. Moreover, our planner successfully scales up to hyper-scale MRTP instances in RMFS with up to 200 robots and 1000 retrieval racks on unlearned maps while maintaining superior performance over other methods.
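The abstract attributes unfair credit assignment to the hierarchical policy and addresses it with a counterfactual rollout baseline. The exact baseline is not specified in this section, so the sketch below assumes a two-level decision (robot, then node) and, in the spirit of counterfactual baselines from multi-agent RL, scores each layer against the cost obtained when only that layer's choice is swapped for a greedy one while the other layer's sampled choice is held fixed. The toy cost function and the names `counterfactual_advantages` and `reinforce_surrogate` are hypothetical; a real rollout would evaluate the whole remaining plan rather than a single assignment.

```python
import math
from typing import Callable, Dict, Tuple

# Hypothetical cost of assigning robot r to graph node n (lower is better).
CostFn = Callable[[int, int], float]


def counterfactual_advantages(
    cost_fn: CostFn,
    sampled: Tuple[int, int],
    greedy_robot: int,
    greedy_node: Callable[[int], int],
) -> Tuple[float, Dict[str, float]]:
    """Per-layer advantages for a two-level (robot, node) decision.

    Assumption: each layer's baseline is the cost of a counterfactual rollout in
    which only that layer's choice is replaced by its greedy choice.
    """
    r, n = sampled
    cost = cost_fn(r, n)
    b_robot = cost_fn(greedy_robot, n)   # swap only the upper-layer (robot) choice
    b_node = cost_fn(r, greedy_node(r))  # swap only the lower-layer (node) choice
    return cost, {"robot": cost - b_robot, "node": cost - b_node}


def reinforce_surrogate(adv: Dict[str, float], log_prob: Dict[str, float]) -> float:
    """REINFORCE surrogate for cost minimization: sum over layers of A_k * log pi_k.

    Gradient descent on this value lowers the probability of choices with A_k > 0,
    i.e. choices that cost more than their counterfactual baseline.
    """
    return sum(adv[k] * log_prob[k] for k in adv)


if __name__ == "__main__":
    robot_pos = [0.0, 5.0]       # toy 1-D robot positions
    node_pos = [1.0, 4.0, 9.0]   # toy 1-D node positions

    def cost(r: int, n: int) -> float:
        return abs(robot_pos[r] - node_pos[n])

    def greedy_node(r: int) -> int:
        return min(range(len(node_pos)), key=lambda n: cost(r, n))

    c, adv = counterfactual_advantages(cost, (1, 2), greedy_robot=0, greedy_node=greedy_node)
    print(c, adv)  # -> 4.0 {'robot': -5.0, 'node': 3.0}
```

In this toy, a single shared baseline such as the cost of the fully greedy plan (1.0) would give both layers the same advantage of 3.0, even though the sampled robot was the better choice for node 2. The per-layer counterfactual credits the robot layer (negative advantage) and blames the node layer (positive advantage).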

Paper Structure

This paper contains 40 sections, 25 equations, 13 figures, 5 tables, and 2 algorithms.

Figures (13)

  • Figure 1: The operational process for MRTP in RMFS.
  • Figure 2: The hierarchical temporal multi-robot task planning framework and temporal logic for a specific multi-robot task planning instance in RMFS with 2 mobile robots and 4 retrieval racks.
  • Figure 3: The hierarchical temporal attention network (HTAN) architecture, including the robot net (left) and the graph node net (right).
  • Figure 4: Diagram of the map layouts for maps M1-M9.
  • Figure 5: Comparison results of training curves for various methods on different fixed-scale simulated instances.
  • ...and 8 more figures