ReflecSched: Solving Dynamic Flexible Job-Shop Scheduling via LLM-Powered Hierarchical Reflection
Shijie Cao, Yuan Yuan
TL;DR
ReflecSched reframes the LLM from a purely reactive scheduler into a strategic analyst for Dynamic Flexible Job-Shop Scheduling (DFJSP), targeting three pitfalls of direct LLM scheduling: the long-context paradox, underutilization of expert heuristics, and myopic greed. Its Hierarchical Reflection module runs multi-level, heuristic-driven simulations and distills them into a concise Strategic Experience, which then informs a lean Experience-Guided Decision-Making step; the overall procedure is connected to Approximate Policy Iteration. Empirically, ReflecSched substantially outperforms direct LLM baselines and rivals an oracle-like strategy. Its best variants reach an average Relative Percent Deviation (RPD) of 6.04% and an average rank of 3.18, with a 71.35% Win Rate against the direct baseline and a 15.1% improvement in token efficiency on Normal-scale problems. The Simulate-Reflect-Refine paradigm and its Strategic Experience may extend to other LLM-based sequential decision tasks and dynamic optimization domains.
Abstract
The NP-hard Dynamic Flexible Job-Shop Scheduling (DFJSP) problem involves real-time events and complex routing. Traditional rules are efficient but rigid, while deep learning is opaque and requires feature engineering. Large Language Models (LLMs) promise adaptive reasoning without this engineering overhead, yet we find that their direct application is suboptimal. Baseline LLMs suffer from three key pitfalls: the long-context paradox, where crucial data is underutilized; the underutilization of expert heuristics; and myopic decision-making. To address these pitfalls, we propose ReflecSched, a framework that extends the LLM beyond a direct scheduler by equipping it with a strategic analysis capability. ReflecSched tasks the LLM with analyzing heuristic-driven simulations across multiple planning horizons and distilling them into a concise, natural-language summary termed "Strategic Experience". This summary is then integrated into the prompt of a final decision-making module, guiding it to produce non-myopic actions. Experiments demonstrate that ReflecSched achieves superior performance, with its best variants attaining an average RPD of 6.04% and an average rank of 3.18, significantly outperforming strong traditional and learning-based methods. It also decisively surpasses direct LLM baselines, with statistical significance, securing a 71.35% Win Rate while being, on average, 15.1% more token-efficient on Normal-scale problems. Ablation studies attribute this performance to a robust reflection mechanism that leverages high-quality, contrastive experience. This mechanism mitigates key LLM pitfalls such as myopic greed, enabling ReflecSched to outperform all evaluated heuristics. Ultimately, the framework's performance is statistically on par with an oracle-like strategy, showcasing its effectiveness and robustness.
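The headline metrics above (average RPD and Win Rate) can be made concrete with a small sketch. The abstract does not spell out the formulas, so the following assumes the common definition of RPD as the percentage gap to a per-instance reference value, takes the best value among the listed methods as that reference, and counts a win only when one method is strictly better. The method names, the data layout, and the numbers are hypothetical and are not results from the paper.

```python
from typing import Dict, List

# Hypothetical data layout: method name -> objective value per instance
# (lower is better). The numbers are made up for illustration only.
Results = Dict[str, List[float]]


def rpd(value: float, reference: float) -> float:
    """Relative Percent Deviation of `value` from `reference`, in percent."""
    return 100.0 * (value - reference) / reference


def mean_rpd(results: Results, method: str) -> float:
    """Average RPD of `method`, using the per-instance best across all
    listed methods as the reference (an assumption of this sketch)."""
    n = len(results[method])
    best = [min(vals[i] for vals in results.values()) for i in range(n)]
    return sum(rpd(v, b) for v, b in zip(results[method], best)) / n


def win_rate(results: Results, a: str, b: str) -> float:
    """Fraction of instances where `a` is strictly better than `b`
    (ties count as non-wins; the paper's tie handling may differ)."""
    wins = sum(1 for x, y in zip(results[a], results[b]) if x < y)
    return wins / len(results[a])


results: Results = {
    "direct_llm": [110.0, 205.0, 330.0, 400.0],
    "reflective": [100.0, 200.0, 300.0, 420.0],
}
```

On this toy data, `mean_rpd(results, "reflective")` evaluates to 1.25, `mean_rpd(results, "direct_llm")` to 5.625, and `win_rate(results, "reflective", "direct_llm")` to 0.75. A lower average RPD and a higher Win Rate both indicate the better method under these assumed definitions.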
