Symmetry-Preserving Architecture for Multi-NUMA Environments (SPANE): A Deep Reinforcement Learning Approach for Dynamic VM Scheduling

Tin Ping Chan, Yunlong Cheng, Yizhan Zhu, Xiaofeng Gao, Guihai Chen

TL;DR

The paper tackles DVAMP, a dynamic VM allocation problem on multi-NUMA PMs. It formulates offline and online MILP models and derives a tight performance bound for greedy online algorithms, expressed as $\frac{m-1}{2m-1}(\mu-1)\cdot TR$ with $TR = \sum_{d=1}^{D} \sum_{j=1}^{n} r^d_j \, lt_j$. It then proposes SPANE, a symmetry-preserving DRL framework built on a three-module architecture (Embedding Generation, Value, Advantage) that exploits permutation invariance across PMs to improve learning efficiency and solution quality. Experiments on the Huawei-East-1 dataset show that SPANE reduces average VM wait time by 45% relative to baselines and generalizes across different PM counts without retraining. These results advance NUMA-aware cloud resource management by pairing rigorous theory with practical, scalable techniques for dynamic VM scheduling in real-world data centers.

Abstract

As cloud computing continues to evolve, the adoption of multi-NUMA (Non-Uniform Memory Access) architecture by cloud service providers has introduced new challenges in virtual machine (VM) scheduling. To address these challenges and more accurately reflect the complexities faced by modern cloud environments, we introduce the Dynamic VM Allocation problem in Multi-NUMA PM (DVAMP). We formally define both offline and online versions of DVAMP as mixed-integer linear programming problems, providing a rigorous mathematical foundation for analysis. A tight performance bound for greedy online algorithms is derived, offering insights into the worst-case optimality gap as a function of the number of physical machines and VM lifetime variability. To address the challenges posed by DVAMP, we propose SPANE (Symmetry-Preserving Architecture for Multi-NUMA Environments), a novel deep reinforcement learning approach that exploits the problem's inherent symmetries. SPANE produces invariant results under arbitrary permutations of physical machine states, enhancing learning efficiency and solution quality. Extensive experiments conducted on the Huawei-East-1 dataset demonstrate that SPANE outperforms existing baselines, reducing average VM wait time by 45%. Our work contributes to the field of cloud resource management by providing both theoretical insights and practical solutions for VM scheduling in multi-NUMA environments, addressing a critical gap in the literature and offering improved performance for real-world cloud systems.
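To make the bound quoted in the TL;DR concrete, the following minimal sketch evaluates $TR$ and the expression $\frac{m-1}{2m-1}(\mu-1)\cdot TR$ for a toy workload. The summary does not define every symbol, so the reading used here is an assumption drawn from the abstract's wording: $m$ is taken as the number of PMs, $\mu$ as a VM lifetime-variability ratio (here, longest lifetime divided by shortest), $r^d_j$ as the demand of VM $j$ in resource dimension $d$, and $lt_j$ as its lifetime. The function names and the sample numbers are hypothetical, and the paper's own definitions take precedence.

```python
from fractions import Fraction


def total_resource_time(demands, lifetimes):
    """TR = sum over dimensions d and VMs j of r_j^d * lt_j.

    demands[j] holds the D per-dimension demands of VM j (assumed layout).
    """
    return sum(r * lt for per_vm, lt in zip(demands, lifetimes) for r in per_vm)


def greedy_bound(m, mu, tr):
    """Evaluate (m - 1) / (2m - 1) * (mu - 1) * TR exactly."""
    return Fraction(m - 1, 2 * m - 1) * (mu - 1) * tr


# Toy workload: 3 VMs, D = 2 dimensions (e.g. CPU and memory), made-up values.
demands = [(2, 4), (1, 2), (4, 8)]
lifetimes = [3, 1, 2]

tr = total_resource_time(demands, lifetimes)   # (2+4)*3 + (1+2)*1 + (4+8)*2
mu = Fraction(max(lifetimes), min(lifetimes))  # assumed reading of mu

for m in (2, 4, 16):
    print(m, greedy_bound(m, mu, tr))
```

Two properties of the expression are visible directly: it vanishes when $\mu = 1$ (no lifetime variability) or $m = 1$, and the coefficient $\frac{m-1}{2m-1}$ increases toward $\frac{1}{2}$ as $m$ grows.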

Paper Structure

This paper contains 21 sections, 20 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Illustration of Multi-NUMA VM scheduling. Each PM consists of two NUMA nodes, with VMs deployed in one of three scenarios: (1) allocated entirely to a single NUMA node (e.g., $v_j$ on NUMA 0), (2) evenly distributed across both NUMA nodes within the same PM (e.g., $v_{j+1}$), or (3) delayed due to insufficient resources (e.g., $v_{j+2}$).
  • Figure 2: VM scheduling illustration for a greedy online algorithm. The algorithm initially deploys all VMs arriving at $t=0$, leading to $m-1$ VMs waiting when new VMs arrive at $t=1$.
  • Figure 3: VM scheduling illustration for the optimal algorithm. By strategically placing VMs active after $t=1$ on the first NUMA node, the algorithm achieves immediate deployment of new VMs at $t=1$, minimizing wait time.
  • Figure 4: Illustration of a symmetry-preserving policy in DVAMP. The upper row depicts the original resource utilization of NUMA nodes in $p_1$ and $p_2$, with corresponding policy probabilities. The lower row shows the scenario after swapping the states of $p_1$ and $p_2$. A symmetry-preserving policy ensures that the output probabilities for $p_1$ and $p_2$ are correspondingly swapped, maintaining consistency under PM permutations.
  • Figure 5: Symmetry-preserving network architecture for DQN in the DVAMP problem.
  • ...and 2 more figures
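The symmetry property described for Figures 4 and 5 can be illustrated with a small numerical sketch. It is not the paper's network: it only mirrors the three named modules (embedding generation, value, advantage) with a shared per-PM embedding, an order-independent mean pool, and a dueling-style combination of value and advantage, and it assumes one action per PM. All sizes, weights, and the `q_values` function are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
FEATS, HIDDEN = 4, 8  # hypothetical per-PM feature and embedding sizes

# Hypothetical random weights standing in for trained parameters.
W_embed = rng.normal(size=(FEATS, HIDDEN))
w_value = rng.normal(size=HIDDEN)
w_adv = rng.normal(size=2 * HIDDEN)


def q_values(pm_states):
    """Map (num_pms, FEATS) PM states to one Q-value per PM."""
    # Embedding module: the same weights are applied to every PM.
    emb = np.tanh(pm_states @ W_embed)
    # Mean pooling gives a summary that ignores the order of PMs.
    pooled = emb.mean(axis=0)
    # Value module: a single scalar for the whole state.
    value = pooled @ w_value
    # Advantage module: each PM sees its own embedding plus the summary.
    ctx = np.concatenate([emb, np.broadcast_to(pooled, emb.shape)], axis=1)
    adv = ctx @ w_adv
    # Dueling-style combination of value and centered advantages.
    return value + adv - adv.mean()


states = rng.normal(size=(5, FEATS))
perm = rng.permutation(5)

# Permuting the PM states permutes the Q-values in the same way.
assert np.allclose(q_values(states)[perm], q_values(states[perm]))

# No weight depends on the number of PMs, so other PM counts also work.
assert q_values(rng.normal(size=(9, FEATS))).shape == (9,)
```

Because every operation is either applied per PM with shared weights or is a symmetric reduction over PMs, swapping two PM states swaps their outputs, which is the behavior Figure 4 depicts. The same structure is one plausible reason a model of this kind can be applied to a different number of PMs without retraining.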

Theorems & Definitions (6)

  • Definition 1: VM Requests
  • Definition 2: Server Infrastructure
  • Definition 3: Offline DVAMP Problem
  • Remark 1
  • Definition 4: Online DVAMP Problem
  • Remark 2