Scalable Multi-agent Reinforcement Learning for Factory-wide Dynamic Scheduling

Jaeyeon Jang, Diego Klabjan, Han Liu, Nital S. Patel, Xiuqi Li, Balakrishnan Ananthanarayanan, Husam Dauod, Tzung-Han Juang

TL;DR

The paper applies a leader-follower multi-agent RL (MARL) concept to obtain the desired coordination among agents after decomposing the scheduling problem into a set of sub-problems, each handled by an individual agent for scalability.

Abstract

Real-time dynamic scheduling is a crucial but notoriously challenging task in modern manufacturing processes due to its high decision complexity. Recently, reinforcement learning (RL) has been gaining attention as an impactful technique to handle this challenge. However, classical RL methods typically rely on human-made dispatching rules, which are not suitable for large-scale factory-wide scheduling. To bridge this gap, this paper applies a leader-follower multi-agent RL (MARL) concept to obtain desired coordination after decomposing the scheduling problem into a set of sub-problems that are handled by each individual agent for scalability. We further strengthen the procedure by proposing a rule-based conversion algorithm to prevent catastrophic loss of production capacity due to an agent's error. Our experimental results demonstrate that the proposed model outperforms the state-of-the-art deep RL-based scheduling models in various aspects. Additionally, the proposed model provides the most robust scheduling performance to demand changes. Overall, the proposed MARL-based scheduling model presents a promising solution to the real-time scheduling problem, with potential applications in various manufacturing industries.

Paper Structure

This paper contains 13 sections, 18 equations, 5 figures, 5 tables, and 3 algorithms.

Figures (5)

  • Figure 1: Illustration of (a) state and (b) action of the follower for operation $o$.
  • Figure 2: Overview of our MARL scheduling algorithm based on (a) leader and (b) followers. Here, $\pi^o$ is the incumbent policy of the follower.
  • Figure 3: Training of the three models on the low-demand case of the long-term production scenario. Min-max normalization is applied to the total reward to standardize the scale of the y-axis.
  • Figure 4: Completion rate histogram of three comparison models for the long-term production scenario. The x-axis tick values are omitted for confidentiality.
  • Figure 5: Completion rate histogram of three comparison models for the short-term production scenario. The x-axis tick values are omitted for confidentiality.
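
The min-max normalization mentioned in the Figure 3 caption rescales a series linearly so that its smallest value maps to 0 and its largest to 1. The following is a minimal Python sketch of that standard transformation, not the authors' code: the function name `min_max_normalize`, the handling of a constant series, and the reward values are all assumptions made for illustration.

```python
from typing import Sequence


def min_max_normalize(values: Sequence[float]) -> list[float]:
    """Rescale values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant series is mapped to all zeros
        # to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical total rewards per training iteration (made-up numbers,
# not taken from the paper).
rewards = [-120.0, -60.0, 0.0, 30.0]
normalized = min_max_normalize(rewards)
```

Because the transformation is linear, it preserves the shape and ordering of each curve while hiding the absolute reward scale.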