Towards VM Rescheduling Optimization Through Deep Reinforcement Learning
Xianzhong Ding, Yunkai Zhang, Binbin Chen, Donghao Ying, Tieying Zhang, Jianjun Chen, Lei Zhang, Alberto Cerpa, Wan Du
TL;DR
The paper tackles VM rescheduling in data centers under stringent inference-time constraints with an RL-based solution, VMR$^2$L, that combines a two-stage action decomposition, sparse attention for scalable relational state representations, and risk-seeking evaluation that lets users trade latency against solution quality. It shows that VMR$^2$L achieves a fragment rate (FR) close to that of a near-optimal MIP solution while delivering decisions in seconds, outperforming heuristic baselines and traditional optimization approaches in large-scale settings. The authors evaluate the system on real datasets, under multiple constraints, mixed objectives, and a range of generalization scenarios, and they release the datasets and an RL gym environment to facilitate further research. The practical result is a scalable, adaptable VM rescheduling framework that reduces fragmentation in industrial data centers while staying within latency budgets, with direct applicability to production environments and potential use in broader system optimization.
Abstract
Modern industry-scale data centers need to manage a large number of virtual machines (VMs). Due to the continual creation and release of VMs, many small resource fragments are scattered across physical machines (PMs). To handle these fragments, data centers periodically reschedule some VMs to alternative PMs, a practice commonly referred to as VM rescheduling. Despite the increasing importance of VM rescheduling as data centers grow in size, the problem remains understudied. We first show that, unlike most combinatorial optimization tasks, the inference time of VM rescheduling algorithms significantly influences their performance, due to dynamic VM state changes during this period. This causes existing methods to scale poorly. Therefore, we develop a reinforcement learning system for VM rescheduling, VMR$^2$L, which incorporates a set of customized techniques, such as a two-stage framework that accommodates diverse constraints and workload conditions, a feature extraction module that captures relational information specific to rescheduling, and a risk-seeking evaluation that enables users to optimize the trade-off between latency and accuracy. We conduct extensive experiments with data from an industry-scale data center. Our results show that VMR$^2$L can achieve performance comparable to the optimal solution with a running time of seconds. Code and datasets are open-sourced: https://github.com/zhykoties/VMR2L_eurosys, https://drive.google.com/drive/folders/1PfRo1cVwuhH30XhsE2Np3xqJn2GpX5qy.
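To make the two-stage framework and the risk-seeking evaluation more concrete, the toy sketch below may help. It is not the authors' implementation. It rests on several assumptions: fragments are measured as free CPU on a PM that cannot hold a hypothetical 16-core VM, a uniform random choice stands in for the learned policy, the only constraint is CPU capacity, and risk-seeking evaluation is read as sampling several trajectories and keeping the best one. All names (`fragment_rate`, `rollout`, `best_of_k`, `CHUNK`) are hypothetical.

```python
import random

CHUNK = 16  # assumed reference VM size in cores; illustrative only


def fragment_rate(free):
    """Share of free CPU stranded in pieces smaller than CHUNK (assumed metric)."""
    total = sum(free)
    return sum(f % CHUNK for f in free) / total if total else 0.0


def rollout(free, vms, steps, rng):
    """Sample one trajectory of two-stage actions on copies of the state.

    free: free cores per PM; vms: list of (cores, pm_index) tuples.
    """
    free, vms = list(free), list(vms)  # never mutate the caller's state
    plan = []
    for _ in range(steps):
        # Stage 1: pick the VM to migrate (random stand-in for a policy).
        v = rng.randrange(len(vms))
        cores, src = vms[v]
        # Stage 2: pick a destination PM, masking out infeasible ones.
        feasible = [p for p, f in enumerate(free) if p != src and f >= cores]
        if not feasible:
            continue  # no legal destination for this VM; skip the step
        dst = rng.choice(feasible)
        free[src] += cores
        free[dst] -= cores
        vms[v] = (cores, dst)
        plan.append((v, src, dst))
    return fragment_rate(free), plan


def best_of_k(free, vms, steps, k, seed=0):
    """Sample k trajectories and keep the one with the lowest fragment rate."""
    rng = random.Random(seed)
    return min((rollout(free, vms, steps, rng) for _ in range(k)),
               key=lambda result: result[0])


if __name__ == "__main__":
    free = [8, 8, 16]
    vms = [(8, 0), (8, 1), (4, 2)]
    print("initial FR:", fragment_rate(free))
    print("best of 8: ", best_of_k(free, vms, steps=2, k=8))
```

Choosing the VM first lets the second stage filter destinations against whatever constraints apply to that VM, which is one way a two-stage design can accommodate diverse constraints. Raising `k` spends more inference time in exchange for a better chance of a low-fragment plan, which is the latency versus quality trade-off in miniature.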
