Beyond Training: Optimizing Reinforcement Learning Based Job Shop Scheduling Through Adaptive Action Sampling

Constantin Waubert de Puiseau; Christian Dörpelkus; Jannik Peters; Hasan Tercan; Tobias Meisen

TL;DR

The paper asks how to deploy DRL-based construction heuristics for the Job Shop Scheduling Problem (JSSP) effectively under a fixed compute budget. It introduces $\delta$-sampling, which exponentiates the action logits with $\delta$ to bias the agent toward exploration ($\delta<1$) or exploitation ($\delta>1$); $\delta=1$ recovers the baseline stochastic sampling. A grid-search-like algorithm identifies the optimal $\delta^*$ for a given trained agent and sample size, and the results show that the best balance between exploration and exploitation depends strongly on the task and the budget. Experiments on 6x6, 15x15, 20x20, and 100x20 JSSP instances and on the Taillard benchmarks show that $\delta$-sampling can reduce the minimal makespan $C^{*}$ relative to deterministic and baseline stochastic sampling, sometimes matching the performance of much larger sample sizes at a fraction of the cost. The authors suggest that the approach applies broadly to other learned construction heuristics and could be combined with MCTS to further improve inference-time search under a computational budget.
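To make the idea of $\delta$-sampling concrete, here is a minimal sketch in Python. The summary does not give the exact formula, so this assumes that "exponentiating with $\delta$" means raising each action probability to the power $\delta$ and renormalizing, which is the same as multiplying the logits by $\delta$ before the softmax. The function names `delta_probs` and `sample_action` are hypothetical and not taken from the paper.

```python
import numpy as np


def delta_probs(probs, delta):
    """Reweight an action distribution as p_i**delta / sum_j p_j**delta.

    Assumed form of delta-sampling (not confirmed by the summary):
    delta < 1 flattens the distribution (more exploration),
    delta > 1 sharpens it (more exploitation),
    delta == 1 leaves it unchanged.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    p = np.asarray(probs, dtype=float)
    # Work in log space for stability; actions with probability 0
    # (e.g. masked, infeasible operations) keep probability 0.
    with np.errstate(divide="ignore"):
        logp = delta * np.log(p)
    logp -= logp.max()
    weights = np.exp(logp)
    return weights / weights.sum()


def sample_action(probs, delta, rng):
    """Draw one action index from the delta-reweighted distribution."""
    q = delta_probs(probs, delta)
    return int(rng.choice(len(q), p=q))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    agent_output = [0.5, 0.25, 0.25, 0.0]  # toy action probabilities
    for d in (0.5, 1.0, 2.0):
        print(d, delta_probs(agent_output, d).round(3))
    print("sampled action:", sample_action(agent_output, 2.0, rng))
```

Under this assumption, repeating `sample_action` at every construction step yields one schedule, and repeating the whole construction yields the set of sampled solutions from which the best makespan is taken.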

Abstract

Learned construction heuristics for scheduling problems have become increasingly competitive with established solvers and heuristics in recent years. In particular, significant improvements have been observed in solution approaches using deep reinforcement learning (DRL). While much attention has been paid to the design of network architectures and training algorithms to achieve state-of-the-art results, little research has investigated the optimal use of trained DRL agents during inference. Our work is based on the hypothesis that, similar to search algorithms, the utilization of trained DRL agents should be dependent on the acceptable computational budget. We propose a simple yet effective parameterization, called $\delta$-sampling, that manipulates the trained action vector to bias agent behavior towards exploration or exploitation during solution construction. By following this approach, we can achieve a more comprehensive coverage of the search space while still generating an acceptable number of solutions. In addition, we propose an algorithm for obtaining the optimal parameterization for such a given number of solutions and any given trained agent. Experiments extending existing training protocols for job shop scheduling problems with our inference method validate our hypothesis and result in the expected improvements of the generated solutions.
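The search for a good $\delta$ at a fixed number of solutions can be pictured as a grid search that is refined over a few iterations. The sketch below is one plausible reading, not the paper's algorithm: the bounds, the number of grid points, the iteration count, and the `evaluate` callback are all assumptions. In practice `evaluate(delta)` would run the trained agent with the chosen sample size on a set of instances and return the average best makespan, which is a noisy quantity; a smooth toy function stands in for it here.

```python
from typing import Callable, Tuple

import numpy as np


def search_delta(
    evaluate: Callable[[float], float],
    lo: float = 0.5,
    hi: float = 3.0,
    points: int = 5,
    iterations: int = 3,
) -> Tuple[float, float]:
    """Iteratively refined grid search for the delta with the lowest score.

    `evaluate` is a hypothetical callback: lower return values are better
    (e.g. mean best makespan for a fixed sample size).
    """
    best_delta, best_score = lo, float("inf")
    for _ in range(iterations):
        grid = np.linspace(lo, hi, points)
        scores = [evaluate(float(d)) for d in grid]
        i = int(np.argmin(scores))
        if scores[i] < best_score:
            best_delta, best_score = float(grid[i]), float(scores[i])
        # Zoom in on the interval spanned by the neighbours of the minimum.
        lo = float(grid[max(i - 1, 0)])
        hi = float(grid[min(i + 1, points - 1)])
    return best_delta, best_score


def toy_evaluate(delta: float) -> float:
    """Stand-in for a makespan evaluation, minimal at delta = 1.7."""
    return 100.0 + (delta - 1.7) ** 2


if __name__ == "__main__":
    delta_star, score = search_delta(toy_evaluate)
    print(f"delta* ~ {delta_star:.3f}, score {score:.4f}")
```

Because the search is run for one agent and one sample size, it would have to be repeated whenever either changes, which matches the claim that the best parameterization depends on the budget.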

Paper Structure

This paper contains 10 sections, 2 equations, 5 figures, 3 tables.

Introduction
Related Work
Method and Experiments
$\delta$-Sampling
Experimental Setup
Results
Hypothesis Validation
Optimal $\delta$-Values
Performance Improvements
Conclusion and Outlook

Figures (5)

Figure 1: Expected minimal makespans $C^{*}$ over sampling size for different sampling strategies
Figure 2: Three iterations of searching the optimal $\delta$ value
Figure 3: Minimal makespans over sampling size for different sampling strategies
Figure 4: Examples of results in the $\delta$ value search algorithm with found minima
Figure 5: Comparison of sampling methods over sample sizes
