Efficient Multi-Processor Scheduling in Increasingly Realistic Models

Pál András Papp, Georg Anegg, Aikaterini Karanasiou, A. N. Yzelman

TL;DR

This work tackles the problem of scheduling general DAG workloads on multi-processor systems under a realistic BSP model extended with NUMA costs. It develops a unified framework that combines BSP-specific initialization heuristics, hill-climbing local search, multiple ILP formulations, and a multilevel coarsening strategy to minimize the total schedule cost, which sums the per-superstep cost $C(s)=C_{work}(s)+g\cdot C_{comm}(s)+\ell$ over all supersteps $s$. Empirically, the framework substantially outperforms strong baselines on a diverse DAG database: average cost reductions of $24\%$–$44\%$ without NUMA, improvements of up to $2.5\times$ with NUMA, and multilevel gains of nearly $5\times$ in regimes dominated by very high communication costs. The results highlight the value of incorporating realistic communication and memory-access costs in scheduling decisions, and show that heavy optimization can yield practical performance benefits despite higher runtime overheads. The work also provides a public DAG database and an extensible scheduling framework, offering a foundation for future improvements and broader application in areas such as machine learning and irregular workloads.

Abstract

We study the problem of efficiently scheduling a computational DAG on multiple processors. The majority of previous works have developed and compared algorithms for this problem in relatively simple models; in contrast to this, we analyze this problem in a more realistic model that captures many real-world aspects, such as communication costs, synchronization costs, and the hierarchical structure of modern processing architectures. For this we extend the well-established BSP model of parallel computing with non-uniform memory access (NUMA) effects. We then develop a range of new scheduling algorithms to minimize the scheduling cost in this more complex setting: several initialization heuristics, a hill-climbing local search method, and several approaches that formulate (and solve) the scheduling problem as an Integer Linear Program (ILP). We combine these algorithms into a single framework, and conduct experiments on a diverse set of real-world computational DAGs to show that the resulting scheduler significantly outperforms both academic and practical baselines. In particular, even without NUMA effects, our scheduler finds solutions of 24%-44% smaller cost on average than the baselines, and in case of NUMA effects, it achieves up to a factor $2.5\times$ improvement compared to the baselines. Finally, we also develop a multilevel scheduling algorithm, which provides up to almost a factor $5\times$ improvement in the special case when the problem is dominated by very high communication costs.
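To make the cost model concrete, the following is a minimal sketch of how a BSP schedule cost with NUMA multipliers could be evaluated. It assumes the standard BSP conventions: the work cost of a superstep is the maximum total work on any processor, the communication cost is the maximum over processors of the larger of the data sent and received, and the latency $\ell$ is charged once per superstep. The function name `bsp_cost`, the input layout, and the per-pair `numa` multiplier table are illustrative assumptions, not the interface of the authors' framework.

```python
from collections import defaultdict

def bsp_cost(work, proc, step, comm, g, ell, numa=None):
    """Sum of C_work(s) + g * C_comm(s) + ell over all supersteps s.

    work: {node: work weight}
    proc: {node: processor}, step: {node: superstep}
    comm: list of (volume, src_proc, dst_proc, superstep) transfers
    numa: optional {(src_proc, dst_proc): multiplier}, default 1 (assumed form)
    """
    numa = numa or {}
    load = defaultdict(int)      # (superstep, processor) -> total work
    sent = defaultdict(int)      # (superstep, processor) -> data sent
    received = defaultdict(int)  # (superstep, processor) -> data received

    for v, w in work.items():
        load[(step[v], proc[v])] += w
    for volume, src, dst, s in comm:
        cost = volume * numa.get((src, dst), 1)
        sent[(s, src)] += cost
        received[(s, dst)] += cost

    comm_keys = set(sent) | set(received)
    supersteps = {s for s, _ in load} | {s for s, _ in comm_keys}

    total = 0
    for s in supersteps:
        # Work cost: the most heavily loaded processor in this superstep.
        c_work = max((w for (t, _), w in load.items() if t == s), default=0)
        # Communication cost: largest send or receive volume on any processor.
        c_comm = max(
            (max(sent.get(k, 0), received.get(k, 0)) for k in comm_keys if k[0] == s),
            default=0,
        )
        total += c_work + g * c_comm + ell
    return total

# Toy DAG: a -> c and b -> c; a and c run on processor 0, b on processor 1.
work = {"a": 3, "b": 2, "c": 4}
proc = {"a": 0, "b": 1, "c": 0}
step = {"a": 0, "b": 0, "c": 1}
comm = [(1, 1, 0, 0)]  # b's output (volume 1) moves from processor 1 to 0 in superstep 0

uniform = bsp_cost(work, proc, step, comm, g=2, ell=5)
with_numa = bsp_cost(work, proc, step, comm, g=2, ell=5, numa={(1, 0): 3})
```

In this toy instance the uniform cost is $(3 + 2\cdot 1 + 5) + (4 + 0 + 5) = 19$, and tripling the cost of the transfer from processor 1 to processor 0 raises it to $23$, which illustrates why a scheduler that ignores NUMA effects can misjudge which assignment is cheaper.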

Paper Structure (38 sections, 5 equations, 7 figures, 14 tables, 2 algorithms)

Figures (7)

  • Figure 1: Example BSP scheduling of a DAG.
  • Figure 2: Coarse-grained and fine-grained DAG representation of a simple matrix-vector multiplication.
  • Figure 3: Summary of our scheduling framework.
  • Figure 4: Summary of our multilevel framework. Base scheduler refers to the pipeline in Figure 3 (without ILPcs).
  • Figure 5: Performance comparison of Cilk, HDagg and our scheduling algorithms without NUMA effects, for values $g \in \{1,3,5\}$.
  • ...and 2 more figures