Minimizing the Weighted Number of Tardy Jobs: Data-Driven Heuristic for Single-Machine Scheduling
Nikolai Antonov, Přemysl Šůcha, Mikoláš Janota, Jan Hůla
TL;DR
This work tackles the strongly NP-hard single-machine scheduling problem $1|\tilde{d}_i|\sum w_i U_i$ by reframing it as maximizing the total weight of early jobs under hard deadlines, and proposes a data-driven heuristic that guarantees feasibility. The approach combines three components: an ML oracle that labels each job as early or tardy, an ILP-based refinement (via Baptiste’s reduction) for jobs whose predictions are uncertain, and an EDF-based scheduling algorithm that produces a feasible schedule. On the modeling side, it develops robust instance-aware features and compares globally and locally informed ML models; the multilayer perceptron (MLP) offers a favorable accuracy-speed trade-off. Across 15 diverse datasets, the method yields substantially smaller optimality gaps and higher rates of optimal solutions than state-of-the-art heuristics while maintaining practical runtimes, making it a robust, scalable option for real-world scheduling tasks and a reference point for future data-driven scheduling research.
Abstract
Existing research on single-machine scheduling focuses largely on exact algorithms, which perform well on typical instances but can deteriorate significantly in certain regions of the problem space. In contrast, data-driven approaches provide strong and scalable performance when tailored to the structure of specific datasets. Leveraging this idea, we study a single-machine scheduling problem in which each job is defined by its weight, duration, due date, and deadline, and the goal is to minimize the total weight of tardy jobs. We introduce a novel data-driven scheduling heuristic that combines machine learning with problem-specific characteristics and guarantees feasible solutions, a property that ML-based algorithms commonly struggle to provide. Experimental results demonstrate that our approach significantly outperforms the state of the art in terms of optimality gap, number of optimal solutions, and adaptability across varied data scenarios, highlighting its flexibility for practical applications. In addition, we conduct a systematic exploration of ML models, addressing a common gap in similar studies by detailing the model selection process and explaining why the chosen model is the best fit.
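To make the problem setting concrete, the following is a minimal Python sketch of how a set of early/tardy labels can be checked for feasibility and turned into a schedule with an earliest-deadline-first (EDF) rule. It is an illustration under stated assumptions, not the authors' implementation: the `Job` class, the `edf_schedule` function, and the toy instance are hypothetical names and data introduced here. The sketch relies on the standard fact that, on a single machine with all jobs available at time zero, sequencing jobs by nondecreasing time limit meets every limit whenever any sequence does. Each job labeled early receives its due date as its limit; every other job receives its hard deadline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Hypothetical job record: duration, weight, due date, hard deadline."""
    p: int         # processing time (duration)
    w: int         # weight
    due: int       # due date d_i (soft: missing it makes the job tardy)
    deadline: int  # deadline d~_i (hard: missing it makes the schedule infeasible)


def edf_schedule(jobs, early):
    """Sequence jobs by EDF for a given set of indices labeled 'early'.

    Returns (order, tardy_weight), or None if the labels cannot be realized,
    i.e. some job would miss its limit even in EDF order.
    """
    # Jobs labeled early must finish by their due date, the rest by their deadline.
    limit = [j.due if i in early else j.deadline for i, j in enumerate(jobs)]
    order = sorted(range(len(jobs)), key=lambda i: limit[i])

    t = 0
    tardy_weight = 0
    for i in order:
        t += jobs[i].p          # completion time of job i
        if t > limit[i]:
            return None         # labels are infeasible
        if t > jobs[i].due:
            tardy_weight += jobs[i].w
    return order, tardy_weight


# Toy instance (assumed data, for illustration only).
jobs = [
    Job(p=2, w=5, due=2, deadline=10),
    Job(p=3, w=4, due=4, deadline=6),
    Job(p=1, w=1, due=3, deadline=3),
]
print(edf_schedule(jobs, early={0, 2}))     # jobs 0 and 2 early, job 1 tardy
print(edf_schedule(jobs, early={0, 1, 2}))  # not all three can be early
```

In this sketch, a `None` result signals that the predicted labels are inconsistent with the deadlines, which is the kind of situation a refinement step would need to resolve before scheduling.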
