LLMs can Schedule

Henrik Abgaryan; Ararat Harutyunyan; Tristan Cazenave

TL;DR

This work investigates end-to-end scheduling for the Job Shop Scheduling Problem (JSSP) using Large Language Models (LLMs). It introduces the first supervised natural-language (NL) dataset for JSSP (~120k NL problem–solution pairs) and shows that a Phi-3 model fine-tuned with LoRA and combined with a sampling strategy achieves makespans ($C_{\max}$) competitive with neural baselines such as L2D and SLJ on small to moderate instances. The authors build an NL-to-schedule pipeline: OR-Tools generates the feasible solutions used as labels, a parsing and validation system checks model outputs, and grid-search-based hyperparameter tuning favors sampling. While promising, the approach faces computational overhead and generalization challenges on larger JSSP instances, motivating future work on scaling, interpretability, and hybrid architectures with RL or GNNs.

Abstract

The job shop scheduling problem (JSSP) remains a significant hurdle in optimizing production processes. The problem involves efficiently allocating jobs to a limited number of machines while minimizing objectives such as total processing time or job delays. While recent advances in artificial intelligence, such as reinforcement learning and graph neural networks, have yielded promising solutions, this paper explores the potential of Large Language Models (LLMs) for JSSP. We introduce the first supervised dataset, comprising 120k examples, designed specifically to train LLMs for JSSP. Surprisingly, our findings demonstrate that LLM-based scheduling can achieve performance comparable to other neural approaches. Furthermore, we propose a sampling method that enhances the effectiveness of LLMs in tackling JSSP.
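To make the sampling idea concrete, the following is a minimal sketch, not the authors' implementation: it checks a candidate schedule against the two standard JSSP constraints (operation order within each job, no overlap on a machine), computes its makespan, and keeps the best feasible candidate out of several samples. The data layout and the names `makespan_if_feasible` and `best_of_samples` are assumptions made for illustration; in the paper's setting the candidates would come from parsed LLM outputs.

```python
from typing import Dict, List, Optional, Tuple

# Assumed layout: instance[j] is the ordered list of (machine, duration)
# operations of job j; schedule[(j, k)] is the start time of operation k of job j.
Instance = List[List[Tuple[int, int]]]
Schedule = Dict[Tuple[int, int], int]


def makespan_if_feasible(instance: Instance, schedule: Schedule) -> Optional[int]:
    """Return the makespan of a schedule, or None if it violates a constraint."""
    busy: Dict[int, List[Tuple[int, int]]] = {}  # machine -> (start, end) intervals
    makespan = 0
    for j, ops in enumerate(instance):
        prev_end = 0
        for k, (machine, duration) in enumerate(ops):
            start = schedule.get((j, k))
            # A missing operation, or one starting before its predecessor
            # in the same job has finished, makes the schedule infeasible.
            if start is None or start < prev_end:
                return None
            prev_end = start + duration
            busy.setdefault(machine, []).append((start, prev_end))
            makespan = max(makespan, prev_end)
    # Each machine processes at most one operation at a time.
    for intervals in busy.values():
        intervals.sort()
        for (_, end_a), (start_b, _) in zip(intervals, intervals[1:]):
            if start_b < end_a:
                return None
    return makespan


def best_of_samples(
    instance: Instance, candidates: List[Schedule]
) -> Optional[Tuple[int, Schedule]]:
    """Keep the feasible candidate with the smallest makespan, if any exists."""
    best: Optional[Tuple[int, Schedule]] = None
    for schedule in candidates:
        value = makespan_if_feasible(instance, schedule)
        if value is not None and (best is None or value < best[0]):
            best = (value, schedule)
    return best
```

With `s` sampled candidates per instance, `best_of_samples` returns `None` when every sample is infeasible, so a caller can distinguish "no valid schedule produced" from a poor but valid one.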

Paper Structure (16 sections, 2 figures, 1 table)

This paper contains 16 sections, 2 figures, 1 table.

Introduction
Related Work
Preliminary
Dataset Generation
Converting JSSP problem instance to Natural Language: Feature Generation
Approach 1: Job-Centric
Approach 2: Machine-Centric
Zero-shot inference and Label generation
Training
Evaluation
Overview of JSSP Solution Parsing and Validation
Validating JSSP Solution
Hyperparameter Tuning
Comparative Analysis with Other Neural Approaches
Conclusion
...and 1 more section

Figures (2)

Figure 1: Left: training and validation losses of the Phi-3 model. Right: norm of the gradient during fine-tuning of the Phi-3 model.
Figure 2: Box plot comparison of different models: (1) the original L2D with greedy inference using a 20x20 policy network (L2D Gap original), (2) the L2D policy network with sampling (s=10) (L2D Gap s=10), (3) SLJ models trained with $\beta$ values of 32, 128, and 256 and evaluated with sampling (s=10), and (4) the fine-tuned Phi-3 model with sampling at inference (s=10). The left y-axis shows the gap in percent; the right y-axis shows the average time in seconds.
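
For reference, the gap reported in comparisons of this kind is commonly defined in the JSSP literature as the relative excess of a method's makespan over a reference value. Assuming that convention applies here, and writing $C_{\mathrm{ref}}$ (a symbol introduced only for this note) for the optimal or best-known makespan of an instance:

$$\text{gap} = 100 \times \frac{C_{\max} - C_{\mathrm{ref}}}{C_{\mathrm{ref}}}$$

Under this definition, a schedule with $C_{\max} = 110$ on an instance whose reference makespan is 100 has a gap of 10%.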
