Learning-based Two-tiered Online Optimization of Region-wide Datacenter Resource Allocation

Chang-Lin Chen; Hanhan Zhou; Jiayu Chen; Mohammad Pedramfar; Tian Lan; Zheqing Zhu; Chi Zhou; Pol Mauri Ruiz; Neeraj Kumar; Hongbo Dong; Vaneet Aggarwal

TL;DR

This paper presents a novel two-tiered online optimization approach that enables a learning-based Resource Allowance System (RAS). It also trains a decision tree model to explain the learned policies and to prune unreasonable corner cases at the low-level MILP solver, which further improves performance.

Abstract

Online optimization of resource management for large-scale data centers and infrastructures, so as to meet dynamic capacity reservation demands and various practical constraints (e.g., feasibility and robustness), is a very challenging problem. Mixed Integer Programming (MIP) approaches suffer from recognized limitations in such a dynamic environment, while learning-based approaches may face prohibitively large state/action spaces. To this end, this paper presents a novel two-tiered online optimization approach that enables a learning-based Resource Allowance System (RAS). To solve the optimal server-to-reservation assignment in RAS in an online fashion, the proposed solution uses a reinforcement learning (RL) agent to make high-level decisions, e.g., how much resource to select from the Main Switch Boards (MSBs), and then a low-level Mixed Integer Linear Programming (MILP) solver to generate the local server-to-reservation mapping, conditioned on the RL decisions. We take into account fault tolerance, server movement minimization, and network affinity requirements, and apply the proposed solution to large-scale RAS problems. To provide interpretability, we further train a decision tree model to explain the learned policies and to prune unreasonable corner cases at the low-level MILP solver, resulting in further performance improvement. Extensive evaluations show that our two-tiered solution outperforms baselines such as a pure MIP solver by over $15\%$ while delivering a $100\times$ speedup in computation.
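The two-tier split described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `convert_action` and `assign_servers`, the weight-based action format, and the data layout are all assumptions made for illustration, and the second function is a simple greedy stand-in for the low-level MILP, not an MILP solver. The sketch only shows how a high-level decision (how many servers to take from each MSB) can condition a separate low-level step that picks concrete servers while avoiding unnecessary server movement.

```python
from typing import Dict, List, Optional


def convert_action(weights: List[float], demand: int, available: List[int]) -> List[int]:
    """Hypothetical action converter: turn non-negative per-MSB weights into
    integer server counts that sum to `demand` and respect MSB availability."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    if demand > sum(available):
        raise ValueError("demand exceeds the servers available across MSBs")
    # Ideal (fractional) share of the demand for each MSB.
    ideal = [demand * w / total for w in weights]
    # Round down and clip to what each MSB can actually provide.
    counts = [min(int(x), cap) for x, cap in zip(ideal, available)]
    # Hand out the remaining servers one at a time to the MSB that is
    # furthest below its ideal share and still has spare capacity.
    remaining = demand - sum(counts)
    while remaining > 0:
        candidates = [i for i in range(len(counts)) if counts[i] < available[i]]
        best = max(candidates, key=lambda i: ideal[i] - counts[i])
        counts[best] += 1
        remaining -= 1
    return counts


def assign_servers(
    counts: List[int],
    servers_by_msb: List[List[str]],
    current_owner: Dict[str, Optional[str]],
    reservation: str,
) -> List[str]:
    """Greedy stand-in for the low-level MILP (an assumption, not the paper's
    solver): pick concrete servers in each MSB, preferring servers already
    held by the reservation so that fewer servers have to move."""
    chosen: List[str] = []
    for msb, need in enumerate(counts):
        pool = servers_by_msb[msb]
        keep = [s for s in pool if current_owner.get(s) == reservation]
        free = [s for s in pool if current_owner.get(s) is None]
        picked = (keep + free)[:need]
        if len(picked) < need:
            raise ValueError(f"MSB {msb} cannot supply {need} servers")
        chosen.extend(picked)
    return chosen
```

With weights `[5, 3, 2]`, a demand of 10 servers, and availability `[4, 10, 10]`, `convert_action` returns `[4, 4, 2]`: the first MSB is capped at its 4 available servers and the shortfall goes to the next MSB. A real low-level tier would replace `assign_servers` with an MILP that also encodes the spread, failure-domain, and network affinity terms listed in the paper outline.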

Paper Structure (30 sections, 15 equations, 9 figures, 4 tables, 5 algorithms)


Introduction
Summary of Contributions
System Model
Server Assignment Constraint
Server Movement Cost
Outside Rack and MSB Spread Goals
Cost of Largest Failure Domain
Capacity Guarantee
Network Affinity Requirements
Problem Formulation
Proposed Framework Design
Framework
PPO Agent
Action Converter
Integration with the Low-level MILP
...and 15 more sections

Figures (9)

Figure 1: Depiction of dynamic Resource Allowance System considered in this paper, as well as a brief comparison of our two-tier approach, MIP solver, and the complete RL approach. The capacity demand of a reservation is an aggregation of the capacity requests over time.
Figure 2: A description of our proposed framework, where the PPO agent makes a decision for every reservation, which the action converter then transforms into the number of servers to take from each MSB. Finally, the low-level MILP converts the number of servers to take from the MSBs for all reservations into a server-to-reservation mapping.
Figure 3: Performance comparison with regard to the objective value ($\mathcal{U}(t)$). Lower objective value and outside MSB goals are desired. The results show that our proposed algorithms outperform the other algorithms in the sense that (a) (b) they result in the lowest objective value across all percentiles, and (c) their average remains the lowest within an episode.
Figure 4: The empirical CDF comparison shows that (a) the proportional and uniform baselines induce the least server movement cost, (b) our proposed algorithms use the least resource outside the rack spread goal, and (c) all the algorithms perform equally well in maintaining the size of the largest MSB.
Figure 5: We conclude that the resource utilization of our proposed algorithms is the most efficient and reliable because (a) their capacity redundancy is the lowest across all percentiles, (b) their average capacity redundancy is the lowest for all time steps in an episode, and (c) they do not violate the redundancy constraint.
...and 4 more figures
