ReflecSched: Solving Dynamic Flexible Job-Shop Scheduling via LLM-Powered Hierarchical Reflection

Shijie Cao, Yuan Yuan

TL;DR

ReflecSched reframes LLMs from purely reactive schedulers to strategic analysts for Dynamic Flexible Job-Shop Scheduling (DFJSP), addressing the long-context, heuristic underutilization, and myopic greed pitfalls. It introduces Hierarchical Reflection to run multi-level, heuristic-driven simulations and distill a concise Strategic Experience that informs a lean, Experience-Guided Decision-Making step, connecting to Approximate Policy Iteration in practice. Empirical results show ReflecSched substantially outperforms direct LLM baselines, rivals oracle-like heuristics, and achieves strong token-efficiency on larger problems, with best variants reaching an average Relative Percent Deviation (RPD) of 6.04% and an average rank of 3.18, along with a 71.35% Win Rate against the direct baseline and a 15.1% improvement in token efficiency on Normal-scale problems. The work demonstrates a robust, generalizable planning framework for LLM-based sequential decision tasks, potentially extending to other dynamic optimization domains via its Simulate-Reflect-Refine paradigm and Strategic Experience.
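The headline metrics above can be made concrete with a small sketch. It assumes the common definition of RPD as the percentage gap between a method's makespan and a reference makespan, here taken to be the best makespan among the compared methods on each instance, and it counts a win only when the method's makespan is strictly lower than the baseline's. Both choices, along with the function names, are illustrative assumptions and not the paper's stated protocol.

```python
from statistics import mean
from typing import Dict, List


def rpd(makespan: float, reference: float) -> float:
    """Relative Percent Deviation of a makespan from a reference makespan."""
    if reference <= 0:
        raise ValueError("reference makespan must be positive")
    return 100.0 * (makespan - reference) / reference


def average_rpd(results: Dict[str, List[float]]) -> Dict[str, float]:
    """Average RPD per method; results[name][i] is the makespan on instance i.

    Assumption: the reference for instance i is the best (lowest) makespan
    achieved by any of the compared methods on that instance.
    """
    n = len(next(iter(results.values())))
    best = [min(runs[i] for runs in results.values()) for i in range(n)]
    return {
        name: mean(rpd(m, b) for m, b in zip(runs, best))
        for name, runs in results.items()
    }


def win_rate(method: List[float], baseline: List[float]) -> float:
    """Percentage of instances where the method is strictly better (assumed tie rule)."""
    wins = sum(m < b for m, b in zip(method, baseline))
    return 100.0 * wins / len(method)
```

For instance, with the two makespans shown in Figure 3.1, `rpd(7.51, 6.51)` is about 15.4, meaning the locally greedy schedule is roughly 15.4% longer than the globally aware one.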

Abstract

The NP-hard Dynamic Flexible Job-Shop Scheduling (DFJSP) problem involves real-time events and complex routing. While traditional rules are efficient but rigid, deep learning is opaque and requires feature engineering. Large Language Models (LLMs) promise adaptive reasoning without this engineering overhead, yet we find their direct application is suboptimal. Baseline LLMs suffer from three key pitfalls: the long-context paradox, where crucial data is underutilized; an underutilization of expert heuristics; and myopic decision-making. To address this, we propose ReflecSched, a framework that empowers the LLM beyond a direct scheduler by equipping it with a strategic analysis capability. ReflecSched tasks the LLM to analyze heuristic-driven simulations across multiple planning horizons and distill them into a concise, natural-language summary termed "Strategic Experience". This summary is then integrated into the prompt of a final decision-making module, guiding it to produce non-myopic actions. Experiments demonstrate ReflecSched achieves superior performance, with its best variants attaining an average RPD of 6.04% and rank of 3.18, significantly outperforming strong traditional and learning-based methods. It also statistically and decisively surpasses direct LLM baselines, securing a 71.35% Win Rate while being, on average, 15.1% more token-efficient on Normal-scale problems. Ablation studies attribute this performance to a robust reflection mechanism that leverages high-quality, contrastive experience. This mechanism mitigates key LLM pitfalls like myopic greed, enabling ReflecSched to outperform all evaluated heuristics. Ultimately, the framework's performance is statistically on par with an oracle-like strategy, showcasing its effectiveness and robustness.
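The simulate, reflect, then decide flow described in the abstract can be illustrated with a deliberately tiny sketch. Everything here is a hypothetical stand-in: the toy problem is identical parallel machines, not a full DFJSP; a "horizon" simply limits how many pending jobs a rollout sees; and `reflect_stub` and `decide_stub` replace the two LLM calls with deterministic functions so that the example runs offline. It is not the authors' implementation.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Record = Tuple[int, str, float]  # (horizon, heuristic name, simulated makespan)


def spt(times: Sequence[float]) -> List[int]:
    """Shortest Processing Time first: job indices in ascending duration."""
    return sorted(range(len(times)), key=lambda j: times[j])


def lpt(times: Sequence[float]) -> List[int]:
    """Longest Processing Time first: job indices in descending duration."""
    return sorted(range(len(times)), key=lambda j: -times[j])


HEURISTICS: Dict[str, Callable[[Sequence[float]], List[int]]] = {"SPT": spt, "LPT": lpt}


def rollout(times: Sequence[float], order: List[int], machines: int) -> float:
    """List-schedule jobs in the given order on identical machines; return the makespan."""
    loads = [0.0] * machines
    for j in order:
        k = loads.index(min(loads))  # next job goes to the least-loaded machine
        loads[k] += times[j]
    return max(loads)


def simulate(times: Sequence[float], machines: int, horizons: Sequence[int]) -> List[Record]:
    """Run every heuristic at every horizon (toy assumption: horizon h = first h jobs)."""
    records = []
    for h in horizons:
        window = times[:h]
        for name, rule in HEURISTICS.items():
            records.append((h, name, rollout(window, rule(window), machines)))
    return records


def reflect_stub(records: List[Record]) -> str:
    """Stand-in for the LLM reflection call: contrast best and worst long-horizon rollouts."""
    longest = max(h for h, _, _ in records)
    final = [(m, name) for h, name, m in records if h == longest]
    (best_m, best), (worst_m, worst) = min(final), max(final)
    return f"{best}: at horizon {longest} it reached makespan {best_m:g} versus {worst_m:g} for {worst}."


def decide_stub(experience: str, times: Sequence[float]) -> int:
    """Stand-in for the LLM decision call: follow the rule named at the start of the experience."""
    rule = HEURISTICS[experience.split(":", 1)[0]]
    return rule(times)[0]


def reflect_then_decide(times, machines, horizons, reflect=reflect_stub, decide=decide_stub):
    """One decision point: simulate, distill a short text experience, then pick the next job."""
    experience = reflect(simulate(times, machines, horizons))
    return decide(experience, times), experience


action, experience = reflect_then_decide([2.0, 1.0, 1.0, 4.0], machines=2, horizons=(2, 4))
```

In this toy run the two rules tie at the short horizon and only separate at the long one, so the distilled text favors LPT and the next job dispatched is the longest one (index 3). Passing `reflect` and `decide` as parameters is a design choice of the sketch that keeps the simulation logic testable independently of any model call.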

Paper Structure

This paper contains 76 sections, 1 theorem, 27 equations, 9 figures, 5 tables.

Key Result

Proposition 1

Let $\pi_{\mathcal{E}_t}$ be the policy defined above and $V^{\pi}(S_t)$ be the expected makespan from state $S_t$ under policy $\pi$. If the assumptions of Cost Function Approximation and Faithful Reflection hold, then for all states $S_t$, the expected performance of $\pi_{\mathcal{E}_t}$ is no wo

Figures (9)

  • Figure 3.1: An example of two valid schedules for the same problem instance. The Gantt charts illustrate how a locally optimal choice at an early stage can lead to a suboptimal final makespan (top, 7.51), whereas a more globally-aware decision sequence yields a better outcome (bottom, 6.51).
  • Figure 4.1: An empirical investigation of the Long-Context Paradox. (a) A breakdown of the prompt composition for the baseline model. (b) Box plots showing the ratio of makespans resulting from running the model with and without the static data portion of the prompt.
  • Figure 4.2: Analysis of heuristic utilization on the PDR-Bench dataset, with RPD measured against the known optimal heuristic for each instance (0% baseline). The baseline LLM struggles to apply the optimal heuristic even when explicitly prompted, whereas ReflecSched substantially improves performance.
  • Figure 4.3: Comparison of the Average Greedy Decision Ratio (GDR) between the LLM-Direct baseline and the ReflecSched framework. Results are presented for different models and are segmented by the Normal and Small problem scales.
  • Figure 5.1: The architecture of the ReflecSched framework, designed to decouple strategic planning from immediate execution. The main workflow is initiated at a decision point. The Hierarchical Reflection Module (blue) performs multi-level, heuristic-driven simulations to explore future state trajectories. It then leverages an LLM to analyze these outcomes and distill a concise Strategic Experience ($\mathcal{E}$). This experience is passed to the Experience-Guided Decision-Making Module (green), which constructs an experience-augmented prompt to guide the LLM in selecting a final, strategically-informed action. The right-hand panels provide concrete examples of the prompts and responses for both the reflection and decision-making stages, illustrating the flow of information from simulation data to strategic guidance, and finally to a specific action.
  • ...and 4 more figures

Theorems & Definitions (1)

  • Proposition 1: Conditional Policy Improvement