Scalable Multi-agent Reinforcement Learning for Factory-wide Dynamic Scheduling

Jaeyeon Jang; Diego Klabjan; Han Liu; Nital S. Patel; Xiuqi Li; Balakrishnan Ananthanarayanan; Husam Dauod; Tzung-Han Juang

TL;DR

For scalability, the factory-wide scheduling problem is decomposed into a set of sub-problems, each handled by an individual agent, and a leader-follower multi-agent RL (MARL) concept is applied to achieve the desired coordination among the agents.

Abstract

Real-time dynamic scheduling is a crucial but notoriously challenging task in modern manufacturing processes due to its high decision complexity. Recently, reinforcement learning (RL) has been gaining attention as an impactful technique to handle this challenge. However, classical RL methods typically rely on human-made dispatching rules, which are not suitable for large-scale factory-wide scheduling. To bridge this gap, this paper decomposes the scheduling problem into a set of sub-problems, each handled by an individual agent for scalability, and applies a leader-follower multi-agent RL (MARL) concept to obtain the desired coordination among the agents. We further strengthen the procedure by proposing a rule-based conversion algorithm that prevents a catastrophic loss of production capacity caused by an agent's error. Our experimental results demonstrate that the proposed model outperforms state-of-the-art deep RL-based scheduling models in various aspects. Additionally, the proposed model is the most robust of the compared models to demand changes. Overall, the proposed MARL-based scheduling model presents a promising solution to the real-time scheduling problem, with potential applications in various manufacturing industries.

Paper Structure (13 sections, 18 equations, 5 figures, 5 tables, 3 algorithms)

Introduction
Related works
Problem statement
Scheduling model for factory-wide DFJSSP
Follower model
Leader model
Environment and Training
Rule-based conversion algorithm
Experimental Evaluations
Implementation details
Comparison with benchmarks
Ablation study
Conclusion

Figures (5)

Figure 1: Illustration of (a) state and (b) action of the follower for operation $o$.
Figure 2: Overview of our MARL scheduling algorithm based on (a) leader and (b) followers. Here, $\pi^o$ is the incumbent policy of the follower.
Figure 3: Training of the three models on the low-demand case of the long-term production scenario. Min-max normalization is applied to the total reward to standardize the scale of the y-axis.
Figure 4: Completion rate histogram of three comparison models for the long-term production scenario. The x-axis tick values are omitted for confidentiality.
Figure 5: Completion rate histogram of three comparison models for the short-term production scenario. The x-axis tick values are omitted for confidentiality.
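Figure 3 rescales the total reward with min-max normalization, r' = (r - min) / (max - min), so the training curves share a [0, 1] y-axis. The sketch below shows that transformation on a single list of reward values. The function name `min_max_normalize` and the zero-range handling are illustrative assumptions; the paper does not state whether the minimum and maximum are taken per model or jointly across all three models.

```python
from typing import Sequence


def min_max_normalize(rewards: Sequence[float]) -> list[float]:
    """Rescale reward values to [0, 1] via (r - min) / (max - min).

    Hypothetical helper for illustration; not taken from the paper.
    """
    lo, hi = min(rewards), max(rewards)
    span = hi - lo
    if span == 0:
        # All values are equal: map everything to 0 to avoid division by zero
        # (an assumed convention, not specified in the source).
        return [0.0 for _ in rewards]
    return [(r - lo) / span for r in rewards]


# Example: three episode totals become 0.0, 0.5 and 1.0.
print(min_max_normalize([-120.0, -80.0, -40.0]))
```

To keep curves from different models visually comparable, one would typically compute `lo` and `hi` over the combined values of all models before rescaling; whether Figure 3 does this is not stated in the source.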