Minimizing the Weighted Number of Tardy Jobs: Data-Driven Heuristic for Single-Machine Scheduling

Nikolai Antonov, Přemysl Šůcha, Mikoláš Janota, Jan Hůla

TL;DR

This work tackles the strongly NP-hard single-machine scheduling problem $1|\tilde{d}_i|\sum w_i U_i$, reframing it as maximizing the total weight of early jobs under hard deadlines, and proposes a data-driven heuristic that guarantees feasibility. The approach combines an ML oracle that labels jobs as early or tardy, ILP-based refinement (via Baptiste’s reduction) of uncertain predictions, and an earliest-deadline-first (EDF) scheduling algorithm that produces feasible schedules. On the modeling side, it develops robust instance-aware features and compares globally and locally informed ML models; the multilayer perceptron (MLP) offers a favorable accuracy-speed trade-off. Empirical results across 15 diverse datasets show that the method yields substantially smaller optimality gaps and higher rates of optimal solutions than state-of-the-art heuristics while maintaining practical runtimes, making it a robust, scalable option for real-world scheduling tasks.

Abstract

Existing research on single-machine scheduling is largely focused on exact algorithms, which perform well on typical instances but can significantly deteriorate on certain regions of the problem space. In contrast, data-driven approaches provide strong and scalable performance when tailored to the structure of specific datasets. Leveraging this idea, we focus on a single-machine scheduling problem where each job is defined by its weight, duration, due date, and deadline, aiming to minimize the total weight of tardy jobs. We introduce a novel data-driven scheduling heuristic that combines machine learning with problem-specific characteristics, ensuring feasible solutions, which is a common challenge for ML-based algorithms. Experimental results demonstrate that our approach significantly outperforms the state-of-the-art in terms of optimality gap, number of optimal solutions, and adaptability across varied data scenarios, highlighting its flexibility for practical applications. In addition, we conduct a systematic exploration of ML models, addressing a common gap in similar studies by offering a detailed model selection process and providing insights into why the chosen model is the best fit.
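
To make the problem setting concrete, the following is a minimal sketch of how a given early/tardy labelling can be checked for feasibility with an earliest-deadline-first ordering. It is an illustration under stated assumptions, not the authors' implementation: the `Job` class, the function names `edf_schedule` and `tardy_weight`, and the three-job instance are all hypothetical, and the sketch assumes all jobs are available at time 0 and run without preemption. Jobs labelled early must finish by their due date, all other jobs must finish by their hard deadline, and sorting by that effective limit yields a feasible order whenever one exists.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Job:
    # Hypothetical container for the four job attributes named in the abstract.
    name: str
    weight: int
    duration: int
    due_date: int   # finishing after this makes the job tardy
    deadline: int   # hard limit that must never be exceeded


def edf_schedule(jobs: List[Job], early: Set[str]) -> Optional[List[str]]:
    """Return a feasible job order for the given early set, or None.

    Assumption: all jobs are released at time 0 and are not preempted.
    """
    def limit(job: Job) -> int:
        # Early jobs are bound by their due date, the rest by their deadline.
        return job.due_date if job.name in early else job.deadline

    order = sorted(jobs, key=limit)
    time = 0
    for job in order:
        time += job.duration
        if time > limit(job):
            return None  # the labelling cannot be realised by any order
    return [job.name for job in order]


def tardy_weight(jobs: List[Job], early: Set[str]) -> int:
    """Total weight of jobs not labelled early (the objective to minimize)."""
    return sum(job.weight for job in jobs if job.name not in early)


# Illustrative instance (made up for this sketch).
jobs = [
    Job("A", weight=3, duration=2, due_date=2, deadline=6),
    Job("B", weight=2, duration=3, due_date=4, deadline=8),
    Job("C", weight=1, duration=2, due_date=5, deadline=9),
]

print(edf_schedule(jobs, {"A", "C"}), tardy_weight(jobs, {"A", "C"}))
print(edf_schedule(jobs, {"B", "C"}))  # infeasible: A would miss its deadline
```

In this sketch a predicted early set that fails the check would have to be repaired, for example by relabelling some jobs as tardy; how the paper performs that repair is not specified in this summary.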

Paper Structure

This paper contains 28 sections, 3 theorems, 8 equations, 7 figures, 4 tables, 3 algorithms.

Key Result

Theorem 1

There exists a feasible schedule $s$ with an early set of jobs $E_s$ if and only if there exists a feasible schedule $s^\prime$ with an early set of jobs $E_{s^\prime} = E_s \setminus \{ j \}$ for the reduced problem.

Figures (7)

  • Figure 1: Overview of the proposed approach. We begin by using an ML-based oracle to predict jobs as early or tardy. Then, ILP is applied to refine some of these predictions. Finally, a feasibility framework generates a schedule based on the refined predictions.
  • Figure 2: Error frequency (left) and distribution of predicted probabilities (right).
  • Figure 3: Accuracy of MLP trained on three feature representations across selected datasets.
  • Figure 4: Comparison of average optimality gap (log scale): proposed method vs. best alternative across datasets.
  • Figure 5: Comparison of the average number of optimal solutions: proposed method vs. best alternative across datasets.
  • ...and 2 more figures

Theorems & Definitions (5)

  • Theorem 1: Reduction theorem
  • Proposition 1
  • proof
  • Remark 1
  • Theorem 2: Dominance rule