Eva: Cost-Efficient Cloud-Based Cluster Scheduling
Tzu-Tao Chang, Shivaram Venkataraman
TL;DR
This work tackles cost-efficient hosting of batch jobs on cloud-based clusters by jointly optimizing task placement and instance provisioning. It introduces Eva, a scheduler based on reservation prices that accounts for co-location interference and migration overhead via Full and Partial Reconfiguration, implemented as a modular master-worker system with an accompanying simulator. Results from physical AWS experiments and large-scale simulations on Alibaba traces show substantial cost reductions (up to 42%) with a modest increase in job completion time (JCT) of around 15%, relative to provisioning a separate instance for each task. This indicates that coordinated packing and provisioning can outperform isolated, per-task provisioning. The approach gives organizations hosting heterogeneous batch workloads in the cloud a scalable, interference-aware framework for dynamic cluster reconfiguration.
Abstract
Cloud computing offers flexibility in resource provisioning, allowing an organization to host its batch processing workloads cost-efficiently by dynamically scaling the size and composition of a cloud-based cluster (a collection of instances provisioned from the cloud). However, existing schedulers fail to minimize total cost due to suboptimal task and instance scheduling strategies, interference between co-located tasks, and instance provisioning overheads. We present Eva, a scheduler for cloud-based clusters that reduces the overall cost of hosting long-running batch jobs. Eva leverages the concept of reservation price from economics to derive the optimal set of instances to provision and the task-to-instance assignments. Eva also takes into account performance degradation when co-locating tasks, and it quantitatively evaluates the trade-off between short-term migration overhead and long-term provisioning savings when considering a change in cluster configuration. Experiments on AWS EC2 and large-scale trace-driven simulations demonstrate that Eva reduces costs by 42% while incurring only a 15% increase in job completion time (JCT), compared to provisioning a separate instance for each task.
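To make the two central ideas of the abstract concrete, the following is a minimal illustrative sketch, not Eva's actual algorithm. It assumes that a task's reservation price is the hourly cost of the cheapest instance type that can host the task on its own, and that packing a set of tasks onto one instance is worthwhile when the sum of their reservation prices covers the instance's hourly cost. The instance catalog, prices, and all function names are hypothetical, and co-location interference is ignored here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    cpus: int
    mem_gb: int
    hourly_cost: float  # hypothetical price, not a real cloud price


@dataclass(frozen=True)
class Task:
    name: str
    cpus: int
    mem_gb: int


def fits(tasks, inst):
    """True if the combined resource demand of `tasks` fits on `inst`."""
    return (sum(t.cpus for t in tasks) <= inst.cpus
            and sum(t.mem_gb for t in tasks) <= inst.mem_gb)


def reservation_price(task, catalog):
    """Assumed definition: cost of the cheapest instance that hosts the task alone."""
    costs = [i.hourly_cost for i in catalog if fits([task], i)]
    if not costs:
        raise ValueError(f"no instance type can host {task.name}")
    return min(costs)


def packing_is_cost_efficient(tasks, inst, catalog):
    """Assumed criterion: the tasks' reservation prices cover the instance cost."""
    total = sum(reservation_price(t, catalog) for t in tasks)
    return fits(tasks, inst) and total >= inst.hourly_cost


def reconfiguration_pays_off(hourly_saving, migration_cost, expected_hours):
    """Simplified trade-off: long-term savings must exceed one-off migration cost."""
    return hourly_saving * expected_hours > migration_cost


# Hypothetical catalog and tasks, for illustration only.
small = InstanceType("small", cpus=2, mem_gb=8, hourly_cost=0.10)
medium = InstanceType("medium", cpus=4, mem_gb=16, hourly_cost=0.20)
large = InstanceType("large", cpus=8, mem_gb=32, hourly_cost=0.35)
catalog = [small, medium, large]

a = Task("a", cpus=2, mem_gb=8)
b = Task("b", cpus=4, mem_gb=16)
c = Task("c", cpus=2, mem_gb=8)

two_task_packing = packing_is_cost_efficient([a, b], large, catalog)
three_task_packing = packing_is_cost_efficient([a, b, c], large, catalog)
```

With these made-up prices, packing tasks a and b onto the large instance is not cost-efficient, because their separate instances would be cheaper in total, whereas packing all three tasks is. A scheduler along the lines described in the abstract would additionally discount such packings for interference between co-located tasks and would adopt a new configuration only when a check like `reconfiguration_pays_off` favors it.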
