Deep Reinforcement Learning Guided Improvement Heuristic for Job Shop Scheduling

Cong Zhang, Zhiguang Cao, Wen Song, Yaoxin Wu, Jie Zhang

TL;DR

This work introduces a DRL-guided improvement heuristic for job shop scheduling that encodes complete solutions as disjunctive graphs and uses a two-module GNN (TPM and CAM) to capture topological and contextual information during search. The policy is trained with an n-step REINFORCE algorithm, and a batched message-passing evaluator computes the quality of many neighboring solutions at once. The policy network's time complexity is linear in the number of jobs and the number of machines. Empirical results on seven benchmarks show that the method consistently outperforms state-of-the-art DRL baselines and hand-crafted rules, and that it surpasses CP-SAT on very large instances within practical time budgets. The approach narrows the gap to optimality and generalizes well to improvement horizons longer than those used in training.

Abstract

Recent studies on using deep reinforcement learning (DRL) to solve job-shop scheduling problems (JSSP) focus on construction heuristics. However, their performance is still far from optimal, mainly because the underlying graph representation scheme is unsuitable for modelling partial solutions at each construction step. This paper proposes a novel DRL-guided improvement heuristic for solving JSSP, in which a graph representation is used to encode complete solutions. We design a Graph-Neural-Network-based representation scheme consisting of two modules that capture the dynamic topology and the different types of nodes in the graphs encountered during the improvement process. To speed up solution evaluation during improvement, we present a novel message-passing mechanism that can evaluate multiple solutions simultaneously. We prove that the computational complexity of our method scales linearly with problem size. Experiments on classic benchmarks show that the improvement policy learned by our method outperforms state-of-the-art DRL-based methods by a large margin.
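To make the idea of evaluating several solutions at once concrete, the following is a minimal NumPy sketch, not the authors' mechanism. It assumes each solution is an acyclic graph given as a dense adjacency tensor and propagates earliest start times by repeated max-aggregation over predecessors. The names `build_adj` and `batched_makespan` and the tiny 2-job, 2-machine instance are hypothetical.

```python
import numpy as np

def build_adj(arcs, n):
    """Dense adjacency for one solution: adj[u, v] is True for each arc u -> v."""
    adj = np.zeros((n, n), dtype=bool)
    for u, v in arcs:
        adj[u, v] = True
    return adj

def batched_makespan(adj, dur):
    """adj: (B, N, N) bool tensor, one acyclic solution graph per batch entry.
    dur: (N,) processing times. Returns the (B,) makespans."""
    batch, n, _ = adj.shape
    start = np.zeros((batch, n))
    for _ in range(n):  # a longest path in an acyclic graph has at most n - 1 arcs
        finish = start + dur
        # Message from u to v: finish time of u if the arc exists, else 0.
        msgs = np.where(adj, finish[:, :, None], 0.0)
        new_start = msgs.max(axis=1)  # aggregate over predecessors u
        if np.array_equal(new_start, start):
            break  # every solution in the batch has converged
        start = new_start
    return (start + dur).max(axis=1)

# Hypothetical instance: nodes 0, 1 are job 0 (machines 0, 1),
# nodes 2, 3 are job 1 (machines 1, 0).
dur = np.array([3.0, 2.0, 4.0, 1.0])
job_arcs = [(0, 1), (2, 3)]
solution_a = job_arcs + [(0, 3), (2, 1)]  # machine 1 runs node 2 before node 1
solution_b = job_arcs + [(0, 3), (1, 2)]  # machine 1 runs node 1 before node 2
batch = np.stack([build_adj(solution_a, 4), build_adj(solution_b, 4)])
makespans = batched_makespan(batch, dur)  # array([ 6., 10.])
```

The dense tensor costs quadratic memory per solution and is used here only for readability; an edge-list formulation would be the natural choice for large instances.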

Paper Structure

This paper contains 36 sections, 5 theorems, 9 equations, 8 figures, 7 tables, and 1 algorithm.

Key Result

Theorem 4.1

The proposed policy network has linear time complexity with respect to both $|\mathcal{J}|$ and $|\mathcal{M}|$.

Figures (8)

  • Figure 1: Disjunctive graph representation. Left: a $3\times 3$ JSSP instance. Black arrows are conjunctions. Dotted lines of different colors are disjunctions, grouping operations on each machine into machine cliques. Right: a complete solution, where a critical path and a critical block are highlighted. Arc weights are omitted for clarity.
  • Figure 2: Our local search framework and an example of state transition. In Figure (b), the state $s_t$ transitions to $s_{t+1}$ by swapping operations $O_{22}$ and $O_{31}$. A new critical path with its two critical blocks is generated and highlighted in $s_{t+1}$.
  • Figure 3: The architecture of our policy network.
  • Figure 4: Example of $G_J$ and $G_M$.
  • Figure 5: The computational complexity of RL-GNN, ScheduleNet, and our method (500 improvement steps). In the left figure, we fix $|\mathcal{J}| = 40$ and vary the number of machines $|\mathcal{M}|$. In the right figure, we fix $|\mathcal{M}| = 10$ and vary the number of jobs $|\mathcal{J}|$.
  • ...and 3 more figures
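
The disjunctive-graph ideas in the captions of Figures 1 and 2 can be illustrated with a short Python sketch. It builds the graph of a complete solution, computes the makespan as the longest path, recovers one critical path, and then swaps two adjacent operations of a critical block. The instance data, the machine orders, and the helper names `evaluate` and `swap_adjacent` are hypothetical; this is not the instance drawn in the figures, and the sketch is not the authors' implementation.

```python
from collections import defaultdict, deque

# Hypothetical 3x3 instance: jobs[j] is a list of (machine, duration) pairs.
jobs = [
    [(0, 3), (1, 2), (2, 2)],
    [(0, 2), (2, 1), (1, 4)],
    [(1, 4), (2, 3), (0, 1)],
]
# A complete solution fixes the order of operations (job, index) on each machine.
machine_order = {
    0: [(0, 0), (1, 0), (2, 2)],
    1: [(2, 0), (0, 1), (1, 2)],
    2: [(1, 1), (0, 2), (2, 1)],
}

def evaluate(jobs, machine_order):
    """Return (makespan, one critical path) of a complete solution."""
    dur = {(j, k): d for j, ops in enumerate(jobs) for k, (_, d) in enumerate(ops)}
    succ = defaultdict(list)
    indeg = {op: 0 for op in dur}
    arcs = [((j, k), (j, k + 1)) for j, ops in enumerate(jobs)
            for k in range(len(ops) - 1)]              # conjunctive arcs
    for order in machine_order.values():               # oriented disjunctive arcs
        arcs += list(zip(order, order[1:]))
    for u, v in arcs:
        succ[u].append(v)
        indeg[v] += 1
    start = {op: 0 for op in dur}
    pred = {op: None for op in dur}
    queue = deque(op for op, d in indeg.items() if d == 0)
    visited = 0
    while queue:                                       # topological longest path
        u = queue.popleft()
        visited += 1
        for v in succ[u]:
            if start[u] + dur[u] > start[v]:
                start[v], pred[v] = start[u] + dur[u], u
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if visited != len(dur):
        raise ValueError("machine orders create a cycle: infeasible solution")
    last = max(dur, key=lambda op: start[op] + dur[op])
    makespan = start[last] + dur[last]
    path = []
    while last is not None:                            # trace the critical path back
        path.append(last)
        last = pred[last]
    return makespan, path[::-1]

def swap_adjacent(machine_order, machine, i):
    """Copy the solution with positions i and i + 1 swapped on one machine."""
    new_order = {m: list(order) for m, order in machine_order.items()}
    order = new_order[machine]
    order[i], order[i + 1] = order[i + 1], order[i]
    return new_order

makespan, critical_path = evaluate(jobs, machine_order)   # makespan == 12
# Operations (0, 2) and (2, 1) are adjacent on machine 2 and both critical.
neighbor = swap_adjacent(machine_order, 2, 1)
new_makespan, _ = evaluate(jobs, neighbor)                 # new_makespan == 11
```

In this hypothetical instance the swap shortens the schedule from 12 to 11 time units. Swaps of this kind can also create cycles in general, which is why `evaluate` checks that every operation was reached.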

Theorems & Definitions (5)

  • Theorem 4.1
  • Theorem 4.2
  • Corollary 4.3
  • Lemma C.1
  • Lemma C.2