Learning-Guided Rolling Horizon Optimization for Long-Horizon Flexible Job-Shop Scheduling
Sirui Li, Wenbin Ouyang, Yining Ma, Cathy Wu
TL;DR
This work tackles long-horizon combinatorial optimization problems (COPs), specifically the Flexible Job-Shop Scheduling Problem (FJSP), by introducing Learning-Guided Rolling Horizon Optimization (L-RHO), a framework that learns which overlapping decisions to fix across consecutive RHO iterations so that each subproblem shrinks. A Look-Ahead Oracle provides training labels for a neural network $f_\theta$ that predicts which overlapping operations should be fixed; fixing them yields a restricted subproblem $\hat{P}_r$, which accelerates the CP-SAT-based solving of FJSP subproblems. Empirically, L-RHO achieves up to a 54% reduction in solve time and up to a 21% improvement in solution quality on canonical offline FJSP, and it remains robust across online variants, congestion, noise, and breakdowns. A probabilistic analysis of the trade-off between false positives (FP) and false negatives (FN) supports these results and provides theoretical criteria for gauging when learning-based RHO will be beneficial. The approach offers a scalable, data-efficient path to accelerating long-horizon COPs, with potential applicability beyond FJSP.
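The labeling idea can be illustrated with a minimal sketch. It assumes the labels come from comparing machine assignments in two consecutive subproblem solutions, as the abstract describes for FJSP; the function name `lookahead_labels`, the dictionary representation, and the operation names are hypothetical and not taken from the paper.

```python
from typing import Dict


def lookahead_labels(prev_assign: Dict[str, int],
                     new_assign: Dict[str, int]) -> Dict[str, int]:
    """Label each overlapping operation 1 if its machine is unchanged, else 0.

    prev_assign / new_assign map operation name -> machine index in the
    previous and the current subproblem solution (hypothetical format).
    """
    # Only operations present in both solutions belong to the overlap.
    overlap = prev_assign.keys() & new_assign.keys()
    return {op: int(prev_assign[op] == new_assign[op]) for op in sorted(overlap)}


# Toy data: o2 keeps machine 1, o3 moves from machine 2 to machine 0.
prev_assign = {"o1": 0, "o2": 1, "o3": 2}
new_assign = {"o2": 1, "o3": 0, "o4": 2}
labels = lookahead_labels(prev_assign, new_assign)
print(labels)
```

In this toy setting, an operation labeled 1 is one that, in hindsight, did not need to be re-optimized, which is the kind of target a predictor such as $f_\theta$ would be trained on.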
Abstract
Long-horizon combinatorial optimization problems (COPs), such as the Flexible Job-Shop Scheduling Problem (FJSP), often involve complex, interdependent decisions over extended time frames, posing significant challenges for existing solvers. While Rolling Horizon Optimization (RHO) addresses this by decomposing problems into overlapping shorter-horizon subproblems, such overlap often involves redundant computations. In this paper, we present L-RHO, the first learning-guided RHO framework for COPs. L-RHO employs a neural network to intelligently fix variables that in hindsight did not need to be re-optimized, resulting in smaller and thus easier-to-solve subproblems. For FJSP, this means identifying operations whose machine assignments remain unchanged between consecutive subproblems. Applied to FJSP, L-RHO accelerates RHO by up to 54% while significantly improving solution quality, outperforming other heuristic and learning-based baselines. We also provide in-depth discussions and verify the desirable adaptability and generalization of L-RHO across numerous FJSP variants, distributions, online scenarios, and benchmark instances. Moreover, we provide a theoretical analysis to elucidate the conditions under which learning is beneficial.
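The rolling-horizon mechanism described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's implementation: `toy_solve` is a greedy stand-in for a real subproblem solver such as CP-SAT, `should_fix` is a hypothetical callable standing in for the learned predictor, and precedence and timing constraints are ignored so that only the window overlap and the fixing step are shown.

```python
from typing import Callable, Dict, List, Tuple


def toy_solve(ops: List[str], durations: Dict[str, List[int]],
              fixed: Dict[str, int], n_machines: int) -> Dict[str, int]:
    """Greedy stand-in solver: keep fixed assignments, place the rest
    on the machine that ends up with the smallest load."""
    load = [0] * n_machines
    assign = dict(fixed)
    for op, m in fixed.items():
        load[m] += durations[op][m]
    for op in ops:
        if op in assign:
            continue  # fixed operations are not re-optimized
        m = min(range(n_machines), key=lambda k: load[k] + durations[op][k])
        assign[op] = m
        load[m] += durations[op][m]
    return assign


def rolling_horizon(ops: List[str], durations: Dict[str, List[int]],
                    n_machines: int, window: int, step: int,
                    should_fix: Callable[[str], bool]
                    ) -> Tuple[Dict[str, int], List[int]]:
    """Solve overlapping windows; return the committed plan and the
    number of free (non-fixed) operations in each subproblem."""
    committed: Dict[str, int] = {}
    prev: Dict[str, int] = {}
    free_counts: List[int] = []
    for start in range(0, len(ops), step):
        sub = ops[start:start + window]
        # Overlapping operations were already assigned in the previous window.
        fixed = {op: prev[op] for op in sub if op in prev and should_fix(op)}
        sol = toy_solve(sub, durations, fixed, n_machines)
        free_counts.append(len(sub) - len(fixed))
        for op in sub[:step]:  # commit only the first `step` operations
            committed[op] = sol[op]
        prev = sol
    return committed, free_counts


ops = [f"o{i}" for i in range(6)]
durations = {op: [2 + i % 2, 3] for i, op in enumerate(ops)}
plan_all, free_all = rolling_horizon(ops, durations, 2, 4, 2, lambda op: True)
plan_none, free_none = rolling_horizon(ops, durations, 2, 4, 2, lambda op: False)
print(free_all, free_none)
```

Fixing every overlapping operation gives the smallest subproblems, while fixing none corresponds to plain RHO; a learned predictor would sit between these two extremes, fixing only the operations it expects to remain unchanged.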
