The Merit of Simple Policies: Buying Performance With Parallelism and System Architecture
Mert Yildiz, Alexey Rolich, Andrea Baiocchi
TL;DR
The paper investigates how to minimize mean job response time in large-scale clusters under realistic, heterogeneous workloads by evaluating dispatching and scheduling strategies on traffic derived from Google's Borg traces. It introduces a two-stage dispatching architecture that separates short jobs from long ones and pairs this split with simple policies such as Round Robin, and it shows that this architectural approach can outperform single-stage configurations, and even more sophisticated size-based policies, under many conditions. A key finding is that, when the total computational budget is held fixed, mean response time is non-monotonic in the number of servers and attains a minimum at an intermediate cluster size. Parallelism and architectural design can therefore be more effective levers than policy complexity. For data-center design, the results suggest that careful system architecture and a controlled degree of parallelism can yield substantial performance improvements under real-world traffic without resorting to highly complex scheduling policies.
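To make the two-stage idea concrete, the following is a minimal Python sketch of one possible reading of it, not the paper's actual mechanism. It assumes that job sizes are known at dispatch time and that a single size threshold separates short from long jobs; the class name `TwoStageRoundRobin`, the `threshold` parameter, and the stage labels are all hypothetical. Within each stage, jobs are assigned by plain Round Robin.

```python
class TwoStageRoundRobin:
    """Hypothetical two-stage dispatcher.

    Assumption: a job's size is known on arrival and a fixed threshold
    decides whether it goes to the short-job or the long-job stage.
    Each stage then assigns jobs to its own servers by Round Robin.
    """

    def __init__(self, n_short, n_long, threshold):
        if n_short < 1 or n_long < 1:
            raise ValueError("each stage needs at least one server")
        self.n_servers = {"short": n_short, "long": n_long}
        self.next_server = {"short": 0, "long": 0}
        self.threshold = threshold

    def dispatch(self, job_size):
        """Return (stage, server index within that stage) for one job."""
        stage = "short" if job_size <= self.threshold else "long"
        server = self.next_server[stage]
        # Advance the per-stage Round Robin pointer cyclically.
        self.next_server[stage] = (server + 1) % self.n_servers[stage]
        return stage, server


# Usage: three servers for short jobs, two for long jobs, threshold 10.0
# (all values are illustrative, not taken from the paper).
dispatcher = TwoStageRoundRobin(n_short=3, n_long=2, threshold=10.0)
assignments = [dispatcher.dispatch(s) for s in [1.0, 50.0, 2.0, 3.0, 70.0, 4.0]]
```

The point of the sketch is that the dispatcher keeps only one counter per stage and never inspects server state, so any performance gain would come from the split itself rather than from the policy.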
Abstract
While scheduling and dispatching of computational workloads is a well-investigated subject, only recently has Google publicly released a vast, high-resolution measurement dataset of its cloud workloads. We revisit dispatching and scheduling algorithms fed by traffic workloads derived from those measurements. The main finding is that mean job response time attains a minimum as the number of servers of the computing cluster is varied, under the constraint that the overall computational budget is kept constant. Moreover, simple policies, such as Join Idle Queue, appear to attain the same performance as more complex, size-based policies for suitably high degrees of parallelism. Further, better performance, clearly exceeding that of size-based dispatching policies, is obtained by using multi-stage server clusters, even with very simple policies such as Round Robin. The takeaway is that parallelism and architecture of computing systems might be powerful knobs to control performance, even more than policies, under realistic workload traffic.
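For readers unfamiliar with the two simple policies named in the abstract, here is a minimal Python sketch of their textbook forms. It is an illustration only, not the authors' simulator: the function names are hypothetical, and the fallback of Join Idle Queue to a uniformly random server when no server is idle is an assumption based on the usual description of that policy.

```python
import random


def round_robin(n_servers):
    """Yield server indices 0, 1, ..., n_servers - 1 cyclically.

    Round Robin ignores both job sizes and server state.
    """
    i = 0
    while True:
        yield i
        i = (i + 1) % n_servers


def join_idle_queue(queue_lengths, rng):
    """Pick an idle server if one exists.

    Assumption: when every server is busy, fall back to a server
    chosen uniformly at random.
    """
    idle = [i for i, q in enumerate(queue_lengths) if q == 0]
    if idle:
        return rng.choice(idle)
    return rng.randrange(len(queue_lengths))


# Usage with illustrative values.
rr = round_robin(3)
rr_choices = [next(rr) for _ in range(5)]

rng = random.Random(0)
jiq_choice = join_idle_queue([2, 0, 1], rng)  # server 1 is the only idle one
```

Neither policy needs to know job sizes: Round Robin uses no state beyond a counter, and Join Idle Queue only needs to know which servers are idle. This is what makes them "simple" relative to size-based dispatching.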
