Offline reinforcement learning for job-shop scheduling problems

Imanol Echeverria, Maialen Murua, Roberto Santana

TL;DR

This work addresses the real-time solution of job-shop scheduling problems by proposing an offline reinforcement learning method (H-ORL) on heterogeneous graphs with a variable action space. The approach encodes actions as edge attributes within a heterogeneous graph MDP and balances reward maximization with imitation through a KL divergence term, built on an offline TD3 backbone. It introduces a dedicated data generation strategy to create diverse offline experiences and a specialized loss to handle suboptimal transitions, enabling robust learning for JSSP and FJSSP benchmarks. Empirical results on Taillard JSSP and five FJSSP datasets show that H-ORL consistently achieves lower optimality gaps than state-of-the-art DRL and BC methods, with strong generalization to larger instances and real-time applicability for complex scheduling tasks.
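The trade-off between reward maximization and imitation can be illustrated with a minimal sketch. It is not the paper's actual loss, which this summary does not spell out. It assumes a softmax policy over the currently feasible actions, per-action critic estimates, a one-hot expert distribution (so the KL term reduces to the negative log-probability of the expert action), and a weight `lam`. The names `policy_loss`, `q_values`, `logits`, `expert_index`, and `lam` are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a variable-length list of action scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def policy_loss(q_values, logits, expert_index, lam=0.5):
    """Illustrative objective (an assumption, not the authors' formula):
    -lam * E_pi[Q] + KL(expert || pi), with a one-hot expert distribution.
    """
    probs = softmax(logits)
    # Expected critic value under the current policy (reward-maximization term).
    expected_q = sum(p * q for p, q in zip(probs, q_values))
    # KL(one-hot expert || policy) collapses to -log pi(expert action).
    kl_to_expert = -math.log(probs[expert_index])
    return -lam * expected_q + kl_to_expert

# The number of candidate actions may differ from state to state.
loss_small = policy_loss([1.0, 2.0, 3.0], [0.0, 0.0, 0.0], expert_index=2)
loss_large = policy_loss([0.5, 0.1, 0.9, 0.3, 0.7],
                         [0.2, -1.0, 1.5, 0.0, 0.4], expert_index=2)
```

In this sketch, a larger `lam` pushes the policy toward actions the critic rates highly, while a smaller `lam` keeps it closer to the expert's choice.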

Abstract

Recent advances in deep learning have shown significant potential for solving combinatorial optimization problems in real time. Unlike traditional methods, deep learning can generate high-quality solutions efficiently, which is crucial for applications like routing and scheduling. However, existing approaches such as deep reinforcement learning (RL) and behavioral cloning have notable limitations: deep RL suffers from slow learning, while behavioral cloning relies solely on expert actions, which can lead to generalization issues and neglect of the optimization objective. This paper introduces a novel offline RL method designed for combinatorial optimization problems with complex constraints, where the state is represented as a heterogeneous graph and the action space is variable. Our approach encodes actions in edge attributes and balances expected rewards with the imitation of expert solutions. We demonstrate the effectiveness of this method on job-shop scheduling and flexible job-shop scheduling benchmarks, achieving superior performance compared to state-of-the-art techniques.

Paper Structure

This paper contains 23 sections, 9 equations, 6 figures, 9 tables, 1 algorithm.

Figures (6)

  • Figure 1: A solution to the JSSP instance with a makespan of 15.
  • Figure 2: An optimal solution with a makespan of 11.
  • Figure 3: The model architecture of our approach.
  • Figure 4: Comparison of BC, which focuses on optimal paths, with offline RL, which also explores suboptimal states and low-reward actions.
  • Figure 5: Comparison using different numbers of training instances and lambda parameters.
  • ...and 1 more figure