Multi-agent Reinforcement Learning for Dynamic Dispatching in Material Handling Systems

Xian Yeow Lee, Haiyan Wang, Daisuke Katsumata, Takaharu Matsui, Chetan Gupta

TL;DR

This work frames dynamic dispatching in material handling systems as a multi-agent reinforcement learning (MARL) problem under centralized training with decentralized execution (CTDE), so that multiple asynchronous decision points can be coordinated via learned policies. An event-based Python simulator (PettingZoo) mirrors a three-loop conveyor with 4 incoming, 20 storage, and 6 outgoing points and a 500-pallet pool, providing a realistic testbed for MARL. The authors introduce a heuristic-guided exploration strategy that interleaves domain heuristics during training of a Monte-Carlo MAPPO framework with a centralized critic, and they also evaluate the trained policies without heuristics to assess true policy capability. Results show that MARL policies trained with heuristics outperform hand-crafted baselines (by up to 7.4% in median throughput in some setups). Architectural choices (separate critics) and iterative training (using prior MARL iterations as heuristics) can yield additional gains, demonstrating practical potential for deploying MARL-driven dynamic dispatching in real-world material handling systems.

Abstract

This paper proposes a multi-agent reinforcement learning (MARL) approach to learn dynamic dispatching strategies, which are crucial for optimizing throughput in material handling systems across diverse industries. To benchmark our method, we developed a material handling environment that reflects the complexities of an actual system, such as various activities at different locations, physical constraints, and inherent uncertainties. To enhance exploration during learning, we propose a method to integrate domain knowledge in the form of existing dynamic dispatching heuristics. Our experimental results show that our method can outperform heuristics by up to 7.4 percent in terms of median throughput. Additionally, we analyze the effect of different architectures on MARL performance when training multiple agents with different functions. We also demonstrate that the MARL agents' performance can be further improved by using the first iteration of MARL agents as heuristics to train a second iteration of MARL agents. This work demonstrates the potential of applying MARL to learn effective dynamic dispatching strategies that may be deployed in real-world systems to improve business outcomes.
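
As a rough illustration of the heuristic-guided exploration idea described above, the sketch below mixes a dispatching heuristic into a learned policy's action selection with a fixed probability. The abstract does not specify how heuristics are interleaved, so the mixing rule, the `heuristic_prob` parameter, and every function name here are assumptions made for illustration, not the authors' method.

```python
import random
from typing import Callable, Sequence, Tuple

Observation = Sequence[float]  # assumed: one queue length per destination
Action = int                   # assumed: index of the chosen destination


def select_action(
    obs: Observation,
    policy: Callable[[Observation], Action],
    heuristic: Callable[[Observation], Action],
    heuristic_prob: float,
    rng: random.Random,
) -> Tuple[Action, bool]:
    """Pick an action, deferring to the heuristic with probability heuristic_prob.

    Returns the action and a flag saying whether the heuristic chose it.
    """
    if rng.random() < heuristic_prob:
        return heuristic(obs), True
    return policy(obs), False


def shortest_queue_heuristic(obs: Observation) -> Action:
    """Hypothetical heuristic: send the pallet to the least-loaded destination."""
    return min(range(len(obs)), key=lambda i: obs[i])


def placeholder_policy(obs: Observation) -> Action:
    """Stand-in for a learned policy; always picks the last destination."""
    return len(obs) - 1


rng = random.Random(0)
queue_lengths = [3.0, 1.0, 4.0]

# heuristic_prob=1.0 always follows the heuristic; 0.0 always follows the policy.
guided_action, guided_flag = select_action(
    queue_lengths, placeholder_policy, shortest_queue_heuristic, 1.0, rng
)
policy_action, policy_flag = select_action(
    queue_lengths, placeholder_policy, shortest_queue_heuristic, 0.0, rng
)
```

In this sketch, setting `heuristic_prob` to 0 corresponds to acting with the learned policy alone, which is one way to read the evaluation without heuristics mentioned in the summary.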

Paper Structure (19 sections, 7 equations, 6 figures, 3 tables, 4 algorithms)

Figures (6)

  • Figure 1: Example layout of a simplified material handling system.
  • Figure 2: Distributions of throughput during evaluation of different strategies. 'L', 'M', and 'H' denote manually designed heuristics, 'MARL' denotes a vanilla training procedure, 'MARL + X' denotes MARL policies trained with heuristics, and legends with '(NA)' denote evaluation without heuristics.
  • Figure 3: Distributions of throughput during the evaluation of a heuristic strategy, compared with MARL with a joint critic vs. separate critics when training policies with different action spaces.
  • Figure 4: Distributions of throughput during evaluation of a heuristic strategy, MARL policies trained with heuristics, and MARL policies trained with previous iterations of MARL policies.
  • Figure 5: Additional details on training setup
  • ...and 1 more figure