Learning to Score: Tuning Cluster Schedulers through Reinforcement Learning

Martin Asenov, Qiwen Deng, Gingfung Yeung, Adam Barker

TL;DR

This paper proposes a reinforcement learning approach for learning the weights in scheduler scoring algorithms with the overall objective of improving the end-to-end performance of jobs for a given cluster, based on percentage improvement reward, frame-stacking, and limiting domain information.

Abstract

Efficiently allocating incoming jobs to nodes in large-scale clusters can lead to substantial improvements in both cluster utilization and job performance. To allocate incoming jobs, cluster schedulers usually rely on a set of scoring functions to rank feasible nodes. Results from individual scoring functions are usually weighted equally, which can lead to sub-optimal deployments because a one-size-fits-all solution does not take into account the characteristics of each workload. Tuning the weights of scoring functions, however, requires expert knowledge and is computationally expensive. This paper proposes a reinforcement learning approach for learning the weights in scheduler scoring algorithms with the overall objective of improving the end-to-end performance of jobs for a given cluster. Our approach is based on percentage improvement reward, frame-stacking, and limiting domain information. We propose a percentage improvement reward to address the objective of multi-step parameter tuning. The inclusion of frame-stacking allows for carrying information across an optimization experiment. Limiting domain information prevents overfitting and improves performance in unseen clusters and workloads. The policy is trained on different combinations of workloads and cluster setups. We demonstrate that the proposed approach improves performance on average by 33% compared to fixed weights and by 12% compared to the best-performing baseline in a lab-based serverless scenario.
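
The filter-then-score step described in the abstract can be illustrated with a minimal sketch. This is not the paper's scheduler: the Node fields, the two scoring functions, and the pick_node helper are hypothetical names chosen for illustration. It only shows how changing the relative weights of the scoring functions can change which feasible node is selected.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Node:
    """Hypothetical node description; field names are assumptions."""
    name: str
    free_cpu: float
    free_mem: float


# Hypothetical scoring functions, each returning a value in [0, 1].
def cpu_score(node: Node) -> float:
    return node.free_cpu / 10.0


def mem_score(node: Node) -> float:
    return node.free_mem / 10.0


def pick_node(
    nodes: List[Node],
    cpu_req: float,
    mem_req: float,
    scorers: Sequence[Callable[[Node], float]],
    weights: Sequence[float],
) -> Optional[Node]:
    """Filter feasible nodes, then rank them by a weighted sum of scores."""
    # Step 1: filtering keeps only nodes that can host the job at all.
    feasible = [n for n in nodes if n.free_cpu >= cpu_req and n.free_mem >= mem_req]
    if not feasible:
        return None

    # Step 2: scoring combines the individual scores with weights w1..wk.
    def total(node: Node) -> float:
        return sum(w * score(node) for score, w in zip(scorers, weights))

    return max(feasible, key=total)


nodes = [Node("a", free_cpu=8.0, free_mem=2.0), Node("b", free_cpu=2.0, free_mem=8.0)]
scorers = [cpu_score, mem_score]
cpu_heavy = pick_node(nodes, 1.0, 1.0, scorers, weights=[2.0, 1.0])
mem_heavy = pick_node(nodes, 1.0, 1.0, scorers, weights=[1.0, 2.0])
```

With these made-up numbers, weighting the CPU score more heavily selects node a, while weighting the memory score more heavily selects node b. That sensitivity of the placement to the weights is the quantity the paper proposes to tune.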

Paper Structure (24 sections, 2 equations, 7 figures, 2 tables)

Figures (7)

  • Figure 1: Filtering and scoring steps in a job scheduler. Assigning pods to nodes in a cluster job scheduler is typically a two-step process of filtering feasible nodes, followed by scoring functions. In this work, we focus on optimizing the relative weighting (w1, w2, w3, ..., wk) of the different scoring functions in different cluster and workload scenarios, with the goal of optimizing a given metric.
  • Figure 2: Reinforcement learning for tuning weights of scoring functions. We pose the optimization of weights of scoring functions as a parameter tuning problem and propose a reinforcement learning based solution. In this work, we propose using percentage improvement reward, encoding past samples information through the use of frame stacking or recurrent policies, and limiting domain information to prevent overfitting. We develop an extensive gym wrapper, including the option for parallel environments, and demonstrate the capability of our approach in an example FaaS benchmark scenario.
  • Figure 3: Different heterogeneous cluster configurations used for training and evaluation. Distributions of the types of machines used for benchmark experiments. Only the cloud_cpu, cloud_gpu and edge_cloudlet cluster configurations are used during training. We use additional cluster configurations to evaluate how well the proposed approach adapts to unseen machine distributions.
  • Figure 4: Example network configurations within the cluster setup. We use two types of cluster connectivity across benchmark experiments.
  • Figure 5: Tuning weights of scoring functions on similar cluster and workload configurations. Example results for six experiments visualized across two columns. For each experiment the following three characteristics are described (from left to right): best score (as defined in the paper's score equation) from the set of explored weight configurations; mean and standard deviation, best weights selection from the reinforcement learning algorithm; short description of the experiment. We compare the proposed approach against four baselines, including fixed weights (Fix), random search (RS), Bayesian Optimization (BO), and Tree-structured Parzen Estimator (TPE). In each experiment, the fixed weight configuration was used as an initial sample (same as Fix), followed by four optimization steps. A total of eight scoring functions were used.
  • ...and 2 more figures
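
The percentage improvement reward and frame stacking named in the Figure 2 caption can be sketched as follows. The exact formulas are not given in this summary, so everything here is an assumption: the reward is taken to be the relative improvement of a lower-is-better metric over the fixed-weights baseline, and a frame is taken to be a flat list of numbers (for example, the weights tried and the resulting score). The names percentage_improvement and FrameStack are hypothetical.

```python
from collections import deque
from typing import Deque, List, Sequence


def percentage_improvement(baseline: float, current: float) -> float:
    """Reward as percent improvement over a baseline.

    Assumption: lower metric values are better and baseline > 0.
    """
    return 100.0 * (baseline - current) / baseline


class FrameStack:
    """Keeps the last k frames and exposes them as one flat observation."""

    def __init__(self, k: int, frame_size: int) -> None:
        self.k = k
        self.frame_size = frame_size
        # deque(maxlen=k) silently drops the oldest frame once full.
        self.frames: Deque[List[float]] = deque(maxlen=k)

    def push(self, frame: Sequence[float]) -> None:
        if len(frame) != self.frame_size:
            raise ValueError("frame has unexpected size")
        self.frames.append(list(frame))

    def observation(self) -> List[float]:
        # Zero-pad at the front until k frames have been seen, so the
        # policy always receives a vector of length k * frame_size.
        missing = self.k - len(self.frames)
        padding = [0.0] * (self.frame_size * missing)
        flat = [value for frame in self.frames for value in frame]
        return padding + flat


reward = percentage_improvement(baseline=200.0, current=150.0)

stack = FrameStack(k=2, frame_size=2)
stack.push([1.0, 0.5])
first_obs = stack.observation()
stack.push([2.0, 0.25])
stack.push([3.0, 0.1])
latest_obs = stack.observation()
```

Normalizing by the baseline makes rewards comparable across clusters and workloads whose raw metrics differ in scale, and stacking the most recent frames gives a feed-forward policy access to earlier samples from the same optimization experiment. Both points are a reading of the abstract, not a statement of the authors' implementation.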