Scalable Reinforcement Learning for Virtual Machine Scheduling
Junjie Sheng, Jiehao Wu, Haochuan Cui, Yiqiu Hu, Wenli Zhou, Lei Zhu, Qian Peng, Wenhao Li, Xiangfeng Wang
TL;DR
This work tackles the scalability barrier of applying reinforcement learning (RL) to virtual machine scheduling (VMS) in large cloud clusters. It introduces Cluster Value Decomposition Reinforcement Learning (CVD-RL), which decomposes the cluster value into components for each physical machine (PM), uses a look-ahead mechanism to simplify state representations, and constrains exploration with a Top-k action filter. The framework remains effective as the number of PMs grows (up to 50 PMs on Huawei Cloud data) and generalizes across different warm-starts, PM counts, and expansion scenarios, outperforming several state-of-the-art baselines on key metrics such as scheduled length, CPU utilization, and income. By enabling near-optimal, scalable RL for VMS in large-scale cloud environments, CVD-RL offers a practical path toward deploying RL-based resource management in real-world data centers, with potential applicability to other multi-dimensional scheduling problems.
Abstract
Recent advances in reinforcement learning (RL) have shown promise for optimizing virtual machine scheduling (VMS) in small-scale clusters. However, the application of RL to large-scale cloud computing scenarios remains notably constrained. This paper introduces a scalable RL framework, called Cluster Value Decomposition Reinforcement Learning (CVD-RL), to overcome the scalability hurdles inherent in large-scale VMS. The CVD-RL framework combines a decomposition operator with a look-ahead operator to manage representation complexity, complemented by a Top-$k$ filter operator that improves exploration efficiency. Unlike existing approaches, which are limited to clusters of $10$ or fewer physical machines (PMs), CVD-RL extends to environments with up to $50$ PMs. Furthermore, in empirical studies the CVD-RL framework demonstrates generalization capabilities that surpass contemporary state-of-the-art (SOTA) methods across a variety of scenarios. These results showcase the framework's scalability and performance and represent a significant step forward in the application of RL to VMS within complex, large-scale cloud infrastructures. The code is available at https://anonymous.4open.science/r/marl4sche-D0FE.
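To make the three operators named above more concrete, the following is a minimal illustrative sketch, not the authors' implementation. Everything in it is an assumption introduced for illustration: the two-resource PM model, the additive form of the decomposition (cluster value as a sum of per-PM values), the best-fit heuristic used by the Top-$k$ filter, and the hand-written `toy_pm_value`, which stands in for a learned per-PM value network.

```python
from typing import Callable, List, Optional, Tuple

PM = Tuple[float, float]  # (free_cpu, free_mem) of one physical machine (assumed model)
VM = Tuple[float, float]  # (cpu, mem) requested by one virtual machine (assumed model)


def fits(pm: PM, vm: VM) -> bool:
    """True if the PM has enough free CPU and memory for the VM."""
    return pm[0] >= vm[0] and pm[1] >= vm[1]


def look_ahead(pm: PM, vm: VM) -> PM:
    """Look-ahead: the PM's state after hypothetically placing the VM on it."""
    return (pm[0] - vm[0], pm[1] - vm[1])


def top_k_filter(pms: List[PM], vm: VM, k: int) -> List[int]:
    """Keep the k feasible PMs with the least leftover resources.

    The best-fit ranking is an assumed heuristic, not taken from the paper.
    """
    feasible = [i for i, pm in enumerate(pms) if fits(pm, vm)]
    feasible.sort(key=lambda i: sum(look_ahead(pms[i], vm)))
    return feasible[:k]


def toy_pm_value(pm: PM) -> float:
    """Hypothetical stand-in for a learned per-PM value.

    It penalizes imbalance between leftover CPU and memory.
    """
    return -abs(pm[0] - pm[1])


def select_pm(
    pms: List[PM],
    vm: VM,
    k: int,
    pm_value: Callable[[PM], float] = toy_pm_value,
) -> Optional[int]:
    """Pick the filtered PM whose placement most increases the cluster value."""
    candidates = top_k_filter(pms, vm, k)
    if not candidates:
        return None  # no PM can host this VM

    def gain(i: int) -> float:
        # Under the assumed additive decomposition, placing the VM changes
        # only the chosen PM's term, so the change in cluster value is local.
        return pm_value(look_ahead(pms[i], vm)) - pm_value(pms[i])

    return max(candidates, key=gain)


pms = [(8.0, 8.0), (4.0, 16.0), (2.0, 2.0), (16.0, 4.0)]
vm = (2.0, 4.0)
chosen = select_pm(pms, vm, k=2)
```

In this toy run, the filter keeps PMs 0 and 1, and PM 1 is chosen because placing the VM there reduces its CPU/memory imbalance. The point of the sketch is the cost structure: each decision evaluates at most $k$ per-PM values rather than a value over the joint state of all PMs, which is one plausible reading of why such a decomposition scales with cluster size.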
