Solving the flexible job-shop scheduling problem through an enhanced deep reinforcement learning approach
Imanol Echeverria, Maialen Murua, Roberto Santana
TL;DR
This work tackles the flexible job-shop scheduling problem (FJSSP) when real-time responses to disruptions are required. It formulates the FJSSP as a Markov decision process (MDP) and solves it with a policy driven by a heterogeneous graph neural network (HGNN) and trained with Proximal Policy Optimization (PPO). It introduces two key enhancements. The first is action masking based on dispatching rules (DRs), which prunes the action space. The second is a pipeline for generating a diverse set of scheduling policies (DSSP), which uses Bayesian optimization (BO) and k-nearest neighbors (KNN) to generate and select diverse policies for parallel inference. Empirical results on two public benchmarks show that the proposed Enhanced Diverse Scheduling Policies (EDSP) approach outperforms traditional dispatching rules and three state-of-the-art deep reinforcement learning (DRL) methods. The gains are particularly large on large instances, where the approach is even competitive with OR-Tools. The work advances real-time FJSSP solving by combining a compact, informative MDP representation with efficient policy generation and robust diversification, enabling scalable, practical decision-making in dynamic manufacturing settings.
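The summary mentions dispatching-rule based action masking without detailing how the mask is built. The Python sketch below assumes one plausible reading, not necessarily the authors' exact scheme: each candidate action is an (operation, machine) pair, a few classic rules (SPT, MOR, LOR) each nominate one action, and only nominated actions remain available to the policy's softmax. The `Action` class, the rule set, the toy state, and all helper names are hypothetical illustrations.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical FJSSP action: assign a job's next operation to a machine."""
    job: int
    op: int
    machine: int
    proc_time: float


def spt(actions, remaining_ops):
    # Shortest Processing Time: the pair with the smallest processing time.
    return min(actions, key=lambda a: a.proc_time)


def mor(actions, remaining_ops):
    # Most Operations Remaining: favour the job with the most unscheduled
    # operations, breaking ties by processing time.
    return min(actions, key=lambda a: (-remaining_ops[a.job], a.proc_time))


def lor(actions, remaining_ops):
    # Least Operations Remaining: the opposite preference to MOR.
    return min(actions, key=lambda a: (remaining_ops[a.job], a.proc_time))


def dr_mask(actions, remaining_ops, rules):
    """Keep only the actions nominated by at least one dispatching rule."""
    nominated = {rule(actions, remaining_ops) for rule in rules}
    return [a in nominated for a in actions]


def masked_softmax(logits, mask):
    """Softmax over the unmasked logits; masked actions get probability 0."""
    top = max(x for x, ok in zip(logits, mask) if ok)
    exps = [math.exp(x - top) if ok else 0.0 for x, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# Toy state: eligible (job, operation, machine) pairs and remaining operations.
actions = [
    Action(0, 0, 0, 3.0), Action(0, 0, 1, 5.0),
    Action(1, 0, 0, 4.0), Action(1, 0, 2, 2.0),
    Action(2, 1, 1, 6.0),
]
remaining_ops = {0: 3, 1: 1, 2: 2}
mask = dr_mask(actions, remaining_ops, [spt, mor, lor])
# Stand-in for the policy network's scores over the five actions.
logits = [0.1, 2.0, 0.5, 0.3, 1.0]
probs = masked_softmax(logits, mask)
```

In this toy state, SPT and LOR nominate the same action, so the three rules leave only two of the five candidates unmasked. The policy still learns which of the surviving actions to prefer, but it no longer has to assign probability to actions no rule would consider.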
Abstract
In scheduling problems common in industry and in various real-world scenarios, responding in real time to disruptive events is essential. Recent methods propose the use of deep reinforcement learning (DRL) to learn policies capable of generating solutions under this constraint. The objective of this paper is to introduce a new DRL method for solving the flexible job-shop scheduling problem, particularly for large instances. The approach uses heterogeneous graph neural networks to process a more informative graph representation of the problem. This novel modeling of the problem enhances the policy's ability to capture state information and improves its decision-making capacity. Additionally, we introduce two novel approaches to improve the performance of the DRL method: the first generates a diverse set of scheduling policies, while the second combines DRL with dispatching rules (DRs) that constrain the action space. Experimental results on two public benchmarks show that our approach outperforms DRs and achieves superior results compared to three state-of-the-art DRL methods, particularly for large instances.
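According to the TL;DR, the diverse set of scheduling policies is used for parallel inference. One way to picture that step, under loud assumptions, is to run every policy on the same instance and keep the schedule with the shortest makespan. In the Python sketch below, each "policy" is a hypothetical weight vector for a greedy priority rule standing in for a trained HGNN policy. Selecting the lowest makespan is an assumption, and the Bayesian-optimization and KNN stages are not modelled. `greedy_schedule`, `best_of_policies`, and the toy instance are illustrative names and data only.

```python
from concurrent.futures import ThreadPoolExecutor


def greedy_schedule(instance, weights):
    """Decode one schedule with a weighted priority rule; return its makespan.

    `instance` is a list of jobs; each job is a list of operations, and each
    operation maps an eligible machine id to its processing time.
    `weights` = (w_proc, w_start, w_rem) is a hypothetical stand-in for the
    parameters of one trained scheduling policy.
    """
    w_proc, w_start, w_rem = weights
    next_op = [0] * len(instance)        # index of each job's next operation
    job_ready = [0.0] * len(instance)    # finish time of each job's last operation
    machine_ready = {}                   # time at which each machine becomes free
    makespan = 0.0
    while True:
        candidates = []
        for j, ops in enumerate(instance):
            if next_op[j] == len(ops):
                continue                 # job already finished
            remaining = len(ops) - next_op[j]
            for m, p in ops[next_op[j]].items():
                start = max(job_ready[j], machine_ready.get(m, 0.0))
                score = w_proc * p + w_start * start - w_rem * remaining
                candidates.append((score, j, m, start, p))
        if not candidates:
            return makespan              # every operation has been scheduled
        _, j, m, start, p = min(candidates)  # lowest score wins
        end = start + p
        job_ready[j] = end
        machine_ready[m] = end
        next_op[j] += 1
        makespan = max(makespan, end)


def best_of_policies(instance, policies, max_workers=4):
    """Run every policy on the same instance and keep the shortest makespan."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        makespans = list(pool.map(lambda w: greedy_schedule(instance, w), policies))
    best = min(range(len(policies)), key=makespans.__getitem__)
    return policies[best], makespans[best], makespans


# Toy instance with 3 jobs and 2 machines.
instance = [
    [{0: 3.0, 1: 5.0}, {1: 2.0}],
    [{0: 2.0}, {0: 4.0, 1: 3.0}],
    [{1: 4.0}, {0: 3.0}],
]
policies = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.5)]
best_w, best_ms, makespans = best_of_policies(instance, policies)
```

Because the policies rank actions differently, they produce different makespans on the same instance, and keeping the best one benefits directly from their diversity. Python threads do not run CPU-bound code in parallel under CPython's global interpreter lock, so a practical implementation would more likely batch the policies on a GPU or use separate processes. Threads simply keep this sketch short and self-contained.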
