Efficient Multi-Processor Scheduling in Increasingly Realistic Models
Pál András Papp, Georg Anegg, Aikaterini Karanasiou, A. N. Yzelman
TL;DR
This work tackles the problem of scheduling general DAG workloads on multi-processor systems under a realistic BSP model extended with NUMA costs. It develops a unified framework that combines BSP-specific initialization heuristics, hill-climbing local search, multiple ILP formulations, and a multilevel coarsening strategy to minimize the total schedule cost, where each superstep $s$ contributes $C(s)=C_{work}(s)+g\cdot C_{comm}(s)+\ell$. Empirically, on a diverse DAG database, the framework substantially outperforms strong baselines: it reduces cost by $24\%$–$44\%$ on average without NUMA effects and by up to $2.5\times$ with NUMA effects, and the multilevel variant gains nearly $5\times$ in regimes dominated by very high communication costs. The results highlight the value of incorporating realistic communication and memory-access costs into scheduling decisions, and show that heavy optimization can yield practical performance benefits despite its higher runtime overhead. The work also provides a public DAG database and an extensible scheduling framework, offering a foundation for future improvements and broader application in areas such as machine learning and irregular workloads.
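As a rough illustration of the cost formula above, the following Python sketch evaluates it for a schedule given as a list of supersteps. It assumes the standard BSP reading: $C_{work}$ is the largest per-processor work in a superstep, $C_{comm}$ is the largest amount of data any processor sends or receives, and the total cost is the sum over supersteps. The names `Superstep`, `superstep_cost`, and `schedule_cost` are hypothetical and are not taken from the authors' framework.

```python
from dataclasses import dataclass


@dataclass
class Superstep:
    """Per-processor totals for one superstep (hypothetical container)."""
    work: list[int]      # work units executed by each processor
    sent: list[int]      # data units sent by each processor
    received: list[int]  # data units received by each processor


def superstep_cost(step: Superstep, g: int, ell: int) -> int:
    """C(s) = C_work(s) + g * C_comm(s) + ell, under the standard BSP reading."""
    c_work = max(step.work)
    # Assumption: communication cost is the largest send or receive volume.
    c_comm = max(max(s, r) for s, r in zip(step.sent, step.received))
    return c_work + g * c_comm + ell


def schedule_cost(steps: list[Superstep], g: int, ell: int) -> int:
    """Total cost, assumed here to be the sum of the superstep costs."""
    return sum(superstep_cost(step, g, ell) for step in steps)


# Two processors, two supersteps, g = 2, ell = 5.
steps = [
    Superstep(work=[4, 6], sent=[2, 0], received=[0, 2]),
    Superstep(work=[3, 1], sent=[0, 0], received=[0, 0]),
]
total = schedule_cost(steps, g=2, ell=5)  # (6 + 2*2 + 5) + (3 + 0 + 5) = 23
```

Note that this sketch charges $\ell$ for every superstep, including one without communication; whether the paper does the same is not stated in this summary.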
Abstract
We study the problem of efficiently scheduling a computational DAG on multiple processors. The majority of previous works have developed and compared algorithms for this problem in relatively simple models; in contrast to this, we analyze this problem in a more realistic model that captures many real-world aspects, such as communication costs, synchronization costs, and the hierarchical structure of modern processing architectures. For this we extend the well-established BSP model of parallel computing with non-uniform memory access (NUMA) effects. We then develop a range of new scheduling algorithms to minimize the scheduling cost in this more complex setting: several initialization heuristics, a hill-climbing local search method, and several approaches that formulate (and solve) the scheduling problem as an Integer Linear Program (ILP). We combine these algorithms into a single framework, and conduct experiments on a diverse set of real-world computational DAGs to show that the resulting scheduler significantly outperforms both academic and practical baselines. In particular, even without NUMA effects, our scheduler finds solutions of 24%-44% smaller cost on average than the baselines, and in case of NUMA effects, it achieves up to a factor $2.5\times$ improvement compared to the baselines. Finally, we also develop a multilevel scheduling algorithm, which provides up to almost a factor $5\times$ improvement in the special case when the problem is dominated by very high communication costs.
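One way to picture the NUMA extension mentioned in the abstract is to weight every message by a coefficient that depends on the pair of processors involved. The Python sketch below illustrates that idea only; the coefficient matrix `numa`, the message format, and the function name `numa_comm_cost` are assumptions made for this example and are not the paper's exact definitions.

```python
def numa_comm_cost(messages, numa, num_procs):
    """Largest NUMA-weighted send or receive volume over all processors.

    messages: list of (source, target, volume) tuples for one superstep.
    numa[p][q]: assumed cost multiplier for sending one unit from p to q.
    """
    send = [0] * num_procs
    recv = [0] * num_procs
    for src, dst, volume in messages:
        weighted = volume * numa[src][dst]
        send[src] += weighted
        recv[dst] += weighted
    # An empty superstep has zero communication cost.
    return max((max(s, r) for s, r in zip(send, recv)), default=0)


# Hypothetical machine: 4 processors on 2 sockets (0,1 and 2,3).
# Sending within a socket costs 1 per unit, across sockets costs 3.
def socket(p):
    return p // 2


numa = [
    [0 if p == q else (1 if socket(p) == socket(q) else 3) for q in range(4)]
    for p in range(4)
]
uniform = [[0 if p == q else 1 for q in range(4)] for p in range(4)]

messages = [(0, 1, 4), (0, 2, 2), (3, 1, 1)]
cost_numa = numa_comm_cost(messages, numa, 4)        # processor 0 sends 4*1 + 2*3 = 10
cost_uniform = numa_comm_cost(messages, uniform, 4)  # processor 0 sends 4 + 2 = 6
```

In this toy example the same set of messages costs 10 under the two-socket coefficients but only 6 under uniform coefficients, which is the kind of gap a NUMA-aware scheduler can exploit by keeping heavy traffic within a socket.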
