Learning-Guided Rolling Horizon Optimization for Long-Horizon Flexible Job-Shop Scheduling

Sirui Li, Wenbin Ouyang, Yining Ma, Cathy Wu

TL;DR

This work tackles long-horizon combinatorial optimization problems (COPs), specifically the Flexible Job-Shop Scheduling Problem (FJSP), by introducing Learning-Guided Rolling Horizon Optimization (L-RHO), a framework that learns which overlapping decisions to fix across consecutive RHO iterations so that each subproblem shrinks. A Look-Ahead Oracle provides training labels for a neural network $f_\theta$ that predicts which overlapping operations should be fixed, yielding a restricted subproblem $\hat{P}_r$ that the CP-SAT-based subsolver can solve faster. Empirically, L-RHO achieves up to 54% reduction in solve time and up to 21% improvement in solution quality on canonical offline FJSP, and remains robust across online variants, congestion, noise, and breakdowns; a probabilistic analysis of FP/FN trade-offs supports these results. The approach offers a scalable, data-efficient path to accelerate long-horizon COPs with potential applicability beyond FJSP, and provides theoretical criteria to gauge when learning-based RHO will be beneficial.

Abstract

Long-horizon combinatorial optimization problems (COPs), such as the Flexible Job-Shop Scheduling Problem (FJSP), often involve complex, interdependent decisions over extended time frames, posing significant challenges for existing solvers. While Rolling Horizon Optimization (RHO) addresses this by decomposing problems into overlapping shorter-horizon subproblems, such overlap often involves redundant computations. In this paper, we present L-RHO, the first learning-guided RHO framework for COPs. L-RHO employs a neural network to intelligently fix variables that in hindsight did not need to be re-optimized, resulting in smaller and thus easier-to-solve subproblems. For FJSP, this means identifying operations with unchanged machine assignments between consecutive subproblems. Applied to FJSP, L-RHO accelerates RHO by up to 54% while significantly improving solution quality, outperforming other heuristic and learning-based baselines. We also provide in-depth discussions and verify the desirable adaptability and generalization of L-RHO across numerous FJSP variants, distributions, online scenarios and benchmark instances. Moreover, we provide a theoretical analysis to elucidate the conditions under which learning is beneficial.
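
As a minimal sketch of the idea that, for FJSP, the variables worth fixing are the operations whose machine assignment is unchanged between consecutive subproblems, the snippet below compares two assignment maps. The function name `unchanged_assignments` and the dictionary representation (operation id mapped to machine index) are assumptions made for illustration and are not taken from the paper.

```python
from typing import Dict, Hashable, Set

Op = Hashable
Assignment = Dict[Op, int]  # hypothetical representation: operation -> machine index


def unchanged_assignments(prev_assign: Assignment, new_assign: Assignment) -> Set[Op]:
    """Return the overlapping operations whose machine assignment did not change.

    Only operations present in both solutions (the overlap between two
    consecutive subproblems) are compared.
    """
    overlap = prev_assign.keys() & new_assign.keys()
    return {op for op in overlap if prev_assign[op] == new_assign[op]}


if __name__ == "__main__":
    prev = {"o1": 0, "o2": 1, "o3": 2}   # solution of subproblem r-1
    new = {"o2": 1, "o3": 0, "o4": 2}    # solution of subproblem r
    print(unchanged_assignments(prev, new))
```

In this toy case only `o2` appears in both solutions with the same machine, so it is the only operation that, in hindsight, did not need to be re-optimized.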

Paper Structure

This paper contains 41 sections, 1 theorem, 20 equations, 10 figures, 20 tables, 8 algorithms.

Key Result

Proposition 1

Under the linear-decrease assumption (Assump. assump:linear_decrease), the FN and FP errors for each method are given in closed form. Furthermore, ignoring the $\frac{m}{2}$ term in $\mathbb{E}[n_{fix}^*]$ for ease of exposition, the FP and FN rates of First $\sigma_F$ and Random $\sigma_R$ are $(\alpha_F, \beta_F) = (\frac{\sigma_F (1 - b + \frac{m}{2}\sigma_F)}{1 - b + \frac{m}{2}}, 1 - \frac{\sigma_F (b - \frac{m}{2}\sigma_F

Figures (10)

  • Figure 1: One iteration of our L-RHO pipeline. (a) Construct a subproblem $P_r$ with $H$ non-executed operations in $\mathcal{O}_{plan,r}$. This includes a set of overlapping operations $\mathcal{O}_{overlap,r} \subseteq \mathcal{O}_{plan, r}$, each associated with a solution from the previous iteration given by the assignment $m_{r-1}$ and the schedule $\pi_{r-1}$. (b) Identify the operations $\mathcal{O}_{fix,r} \subseteq \mathcal{O}_{overlap,r}$ whose assignments, in hindsight, did not need to be re-optimized; during training, a Look-Ahead Oracle determines $\mathcal{O}_{fix,r}$ by solving $P_r$ $Q$ times; during inference, our learned neural network selects it. (c) Create a restricted subproblem $\hat{P}_r$ by fixing the machine assignments for $\mathcal{O}_{fix,r}$. (d) Feed $\hat{P}_r$ to a subsolver and solve for up to $T$ seconds. (e) Obtain an updated solution $\Pi_r = (m_r, \pi_r)$. (f) Execute a subset of $S$ operations in $\mathcal{O}_{exec, r} \subseteq \mathcal{O}_{plan, r}$ based on the solution $\Pi_r$. We then repeat steps (a)-(f).
  • Figure 2: Our neural architecture $f_\theta$, which predicts, for each overlapping operation, the probability that its assignment should be fixed.
  • Figure 3: L-RHO under different FJSP variants. Left: Increased congestion level with more jobs $|\mathcal{T}| \in \{30, 32, 34\}$. We circle the baselines for these three settings with green, yellow and red ellipses, using different markers to represent each setting. L-RHO is plotted in pink. Middle: Multi-objective and noisy observation (online). We analyze FJSP $(30, 30, 24)$ with (i) the start delay objective, (ii) the start and end delay objective, and (iii) the start and end delay objective plus observation noise. The arrows illustrate the performance changes of each method across (i)-(ii)-(iii). Right: Machine breakdown (online). We simulate machine breakdowns during RHO's process, varying intensity (low, mid, high) by adjusting event frequency and machine availability.
  • Figure 4: Left: We analyze FN and FP errors of each RHO method relative to Oracle and interpret their impact on objective and solve time. Middle: We show the FN and FP errors for Random and First with $\sigma_R, \sigma_F \in [0, 1]$ under the linear-decrease assumption (Assump. assump:linear_decrease), fixing $b - \frac{m}{2} = 0.5$ while increasing the slope $m$. Higher $m$ enhances the performance of First relative to Random (with equal $\sigma$) by reducing both errors. We plot $\sigma_R, \sigma_F \in \{0, 0.1, ..., 1\}$ using circles and squares, with darker colors for smaller values. Right: For FJSP $(25, 25, 24)$ with the total start delay objective, we depict the FN and FP errors of Random, First, and L-RHO in the low FP region, using an empirically validated $p^*_{fix}(i)$ distribution. By reducing both $\mathbb{E}[n_{fp}]$ and $\mathbb{E}[n_{fn}]$, L-RHO empirically outperforms First with $\sigma_F \leq 60\%$ in solve time and all baselines in objective (pink region). Transforming the coordinates (right and top axes) further reveals the FP and FN rates L-RHO should achieve for learning to be effective.
  • Figure 5: The Large Neighborhood Search (LNS) pipeline from pacino2011large. (a) At the $r^{th}$ LNS iteration, we start with a complete solution $\Pi$ for the full FJSP. The solution at the first iteration is given by solving the full FJSP for a short duration (30s). (b) We construct an FJSP subproblem by selecting a subset of FJSP operations whose solution will be updated. We consider two subproblem selection methods, based on time-based and machine-based decomposition. When constructing the FJSP subproblem, we include additional constraints on each operation’s start time and each machine’s available time so that the subproblem’s solution is compatible with the solution of the non-selected FJSP operations. (c) We feed the FJSP subproblem into a subsolver to get a new solution $\Pi_r$. The old solution of the subproblem is used to warm-start the solve, as this empirically improves performance. (d) We update the complete solution $\Pi$ with the new subproblem’s solution $\Pi_r$, and repeat (a)-(d).
  • ...and 5 more figures
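
The steps (a)-(f) in the Figure 1 caption can be sketched as a plain rolling-horizon loop. This is an illustrative skeleton under stated assumptions, not the authors' implementation: operations are assumed to be pre-sorted into a list, a solution is reduced to a machine assignment (the schedule $\pi_r$ is omitted), and `rolling_horizon`, `toy_solve`, and `fix_all` are hypothetical names. The `solve` callback stands in for the subsolver and `select_fixed` for the Oracle or the learned network $f_\theta$.

```python
from typing import Callable, Dict, List, Set

Assignment = Dict[int, int]  # hypothetical: operation id -> machine index


def rolling_horizon(
    ops: List[int],
    H: int,
    S: int,
    solve: Callable[[List[int], Assignment], Assignment],
    select_fixed: Callable[[List[int], Assignment], Set[int]],
) -> Assignment:
    """Run RHO with planning window H and execution step S (1 <= S <= H)."""
    if not 1 <= S <= H:
        raise ValueError("require 1 <= S <= H")
    executed: Assignment = {}
    prev: Assignment = {}
    start = 0
    while start < len(ops):
        plan = ops[start:start + H]                 # (a) next H non-executed operations
        overlap = [o for o in plan if o in prev]    # operations solved in iteration r-1
        fixed = select_fixed(overlap, prev)         # (b) choose which assignments to keep
        restricted = {o: prev[o] for o in fixed}    # (c) restricted subproblem
        solution = solve(plan, restricted)          # (d), (e) call the subsolver
        for o in plan[:S]:                          # (f) execute the first S operations
            executed[o] = solution[o]
        prev = solution
        start += S
    return executed


def toy_solve(plan: List[int], fixed: Assignment) -> Assignment:
    """Stand-in subsolver: keeps fixed assignments, assigns the rest by op id mod 3."""
    return {o: fixed.get(o, o % 3) for o in plan}


def fix_all(overlap: List[int], prev: Assignment) -> Set[int]:
    """Stand-in selector: fix every overlapping operation."""
    return set(overlap)


if __name__ == "__main__":
    print(rolling_horizon(list(range(10)), H=4, S=2, solve=toy_solve, select_fixed=fix_all))
```

With $H = 4$ and $S = 2$, each window shares two operations with the previous one, and those are the candidates passed to `select_fixed`. Swapping `fix_all` for a learned predictor changes only step (b); the rest of the loop is unchanged.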

Theorems & Definitions (3)

  • Proposition 1
  • proof
  • proof
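
The FP and FN errors that appear in Proposition 1 and in the Figure 4 caption can be illustrated by comparing a method's fixed set against the Oracle's. The sketch below assumes that "positive" means "fix the assignment", so a false positive is an operation the method fixes but the Oracle does not, and a false negative is the reverse. This convention, the normalization of the rates, and the name `fp_fn_counts` are assumptions for illustration rather than definitions quoted from the paper.

```python
from typing import Hashable, Iterable, Tuple


def fp_fn_counts(
    predicted_fix: Iterable[Hashable],
    oracle_fix: Iterable[Hashable],
    overlap: Iterable[Hashable],
) -> Tuple[int, int, float, float]:
    """Return (n_fp, n_fn, fp_rate, fn_rate) over the overlapping operations.

    Assumed convention: fp_rate is normalized by the operations the Oracle
    leaves free, fn_rate by the operations the Oracle fixes.
    """
    overlap_set = set(overlap)
    pred = set(predicted_fix) & overlap_set
    oracle = set(oracle_fix) & overlap_set
    n_fp = len(pred - oracle)    # fixed by the method, left free by the Oracle
    n_fn = len(oracle - pred)    # left free by the method, fixed by the Oracle
    n_free = len(overlap_set - oracle)
    n_fixed = len(oracle)
    fp_rate = n_fp / n_free if n_free else 0.0
    fn_rate = n_fn / n_fixed if n_fixed else 0.0
    return n_fp, n_fn, fp_rate, fn_rate


if __name__ == "__main__":
    overlap = {1, 2, 3, 4, 5, 6}
    oracle = {1, 2, 3, 4}
    predicted = {1, 2, 5}
    print(fp_fn_counts(predicted, oracle, overlap))
```

In this toy case the method wrongly fixes operation 5 (one false positive out of two free operations) and misses operations 3 and 4 (two false negatives out of four Oracle-fixed operations), giving both rates a value of 0.5.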