Adaptive Two-Stage Cloud Resource Scaling via Hierarchical Multi-Indicator Forecasting and Bayesian Decision-Making

Yang Luo, Shiyu Wang, Zhemeng Yu, Wei Lu, Xiaofeng Gao, Lintao Ma, Guihai Chen

TL;DR

Cloud data centers face inefficiencies due to hierarchical indicator structures, non-Gaussian workloads, and uncertain scaling decisions. The paper presents HARMONY, a unified framework that jointly performs hierarchical multi-indicator distribution forecasting using a temporal-hierarchical encoder and a Real-NVP-based normalizing flow, followed by a Bayesian decision-maker that leverages full predictive distributions to optimize resource allocation under SLA constraints. The approach delivers state-of-the-art predictive accuracy across four large-scale cloud datasets and achieves substantial real-world impact, including significant GPU-hour savings and cost reductions in a month-long deployment. This work offers a scalable, uncertainty-aware solution for adaptive resource scaling in large-scale cloud environments.
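
To make the Real-NVP step concrete, here is a minimal sketch of a single affine coupling layer, the building block of Real-NVP flows. It is not HARMONY's implementation: the linear "networks" `w_s` and `w_t`, the `tanh` bound on the log-scale, and the dimension are illustrative assumptions. The sketch only shows how Gaussian samples can be mapped invertibly to a non-Gaussian distribution with a tractable log-determinant.

```python
import numpy as np

def coupling_forward(z, w_s, w_t):
    """One affine coupling layer: first half passes through, second half is transformed."""
    d = z.shape[-1] // 2
    z_a, z_b = z[..., :d], z[..., d:]
    s = np.tanh(z_a @ w_s)        # log-scale (hypothetical linear map, bounded by tanh)
    t = z_a @ w_t                 # shift (hypothetical linear map)
    x_b = z_b * np.exp(s) + t
    log_det = s.sum(axis=-1)      # log |det Jacobian| of the layer
    return np.concatenate([z_a, x_b], axis=-1), log_det

def coupling_inverse(x, w_s, w_t):
    """Exact inverse: s and t depend only on the untouched half."""
    d = x.shape[-1] // 2
    x_a, x_b = x[..., :d], x[..., d:]
    s = np.tanh(x_a @ w_s)
    t = x_a @ w_t
    z_b = (x_b - t) * np.exp(-s)
    return np.concatenate([x_a, z_b], axis=-1)

rng = np.random.default_rng(0)
dim = 4                                            # assumed (even) indicator dimension
w_s = rng.normal(scale=0.5, size=(dim // 2, dim // 2))
w_t = rng.normal(scale=0.5, size=(dim // 2, dim // 2))

z = rng.normal(size=(1000, dim))                   # Gaussian base samples
x, log_det = coupling_forward(z, w_s, w_t)         # non-Gaussian samples
z_rec = coupling_inverse(x, w_s, w_t)              # recovers z up to float error
```

A full flow would stack several such layers, alternating which half is transformed, and would learn `s` and `t` with neural networks rather than fixed random matrices.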

Abstract

The surging demand for cloud computing resources, driven by the rapid growth of sophisticated large-scale models and data centers, underscores the critical importance of efficient and adaptive resource allocation. As major tech enterprises deploy massive infrastructures with thousands of GPUs, existing cloud platforms still struggle with low resource utilization due to three key challenges: capturing hierarchical indicator structures, modeling non-Gaussian distributions, and making decisions under uncertainty. To address these challenges, we propose HARMONY, an adaptive Hierarchical Attention-based Resource Modeling and Decision-Making System. HARMONY combines hierarchical multi-indicator distribution forecasting and uncertainty-aware Bayesian decision-making. It introduces a novel hierarchical attention mechanism that comprehensively models complex inter-indicator dependencies, enabling accurate predictions that can adapt to evolving environment states. It then transforms Gaussian projections into adaptive non-Gaussian distributions via Normalizing Flows. Crucially, HARMONY leverages the full predictive distributions in an adaptive Bayesian process, proactively incorporating uncertainties to optimize resource allocation while robustly meeting SLA constraints under varying conditions. Extensive evaluations across four large-scale cloud datasets demonstrate HARMONY's state-of-the-art performance, significantly outperforming nine established methods. A month-long real-world deployment validated HARMONY's substantial practical impact, realizing over 35,000 GPU hours in savings and translating to $100K+ in cost reduction, showcasing its remarkable economic value through adaptive, uncertainty-aware scaling. Our code is available at https://github.com/Floating-LY/HARMONY1.
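
The idea of deciding on the full predictive distribution rather than a point forecast can be sketched as follows. This is a simplified Monte Carlo search, not the paper's Bayesian algorithm: the cost model (a per-unit cost plus a penalty weighted by the violation probability), the function name `choose_units`, and every parameter value are assumptions made for illustration. The search range mirrors the Figure 3 caption, which describes exploring $N \in [N_{low}, N_{high}]$ to minimize total cost.

```python
import numpy as np

def choose_units(demand_samples, n_low, n_high, unit_capacity,
                 unit_cost, sla_penalty, max_violation_prob):
    """Pick the unit count in [n_low, n_high] with the lowest estimated total cost.

    demand_samples are draws from a predictive distribution of resource demand.
    Candidates whose estimated SLA-violation probability exceeds
    max_violation_prob are rejected. Returns (None, inf) if none is feasible.
    """
    best_n, best_cost = None, np.inf
    for n in range(n_low, n_high + 1):
        # Fraction of sampled futures in which demand exceeds provisioned capacity.
        violation_prob = np.mean(demand_samples > n * unit_capacity)
        if violation_prob > max_violation_prob:
            continue
        cost = unit_cost * n + sla_penalty * violation_prob  # assumed cost model
        if cost < best_cost:
            best_n, best_cost = n, cost
    return best_n, best_cost

rng = np.random.default_rng(0)
# Skewed (non-Gaussian) demand samples standing in for a learned forecast.
demand = rng.lognormal(mean=3.0, sigma=0.4, size=10_000)

n_units, total_cost = choose_units(
    demand, n_low=1, n_high=20, unit_capacity=5.0,
    unit_cost=1.0, sla_penalty=50.0, max_violation_prob=0.05,
)
```

Because the violation probability is estimated from samples, a heavy right tail in the forecast pushes the chosen unit count up, which a mean-only forecast would miss.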

Paper Structure (27 sections, 10 equations, 5 figures, 3 tables, 2 algorithms)


Figures (5)

  • Figure 1: Hierarchical Architecture for Cloud Services. The figure illustrates the hierarchical relationship from service requests, through resource consumption (CPU, GPU, Memory), to service quality (Resource Utilization, Response Time). Service providers monitor service quality to promptly adjust the resource consumption of cloud services.
  • Figure 2: Forecasting Framework. Multiple indicators of cloud services first pass through indicator embedding, followed by hierarchical attention to capture structural relationships and projection into Gaussian distributions. Finally, Normalizing Flows transform the Gaussian distribution into a distribution that conforms to the indicators.
  • Figure 3: Bayesian Decision-Making Algorithm Design. The figure demonstrates the exploration of $N \in [N_{low}, N_{high}]$ to find the optimal value that minimizes the total cost.
  • Figure 4: Ablation Study. Comparison of prediction accuracy between HARMONY and its variant in terms of MSE and CRPS.
  • Figure 5: Comparison of Decision Results. For the same GPU service, the left and right subfigures show the CPU and GPU resources, respectively, allocated by different methods over timestamps (X-axis). The Z-axis represents allocated resources. "Consumption" denotes actual service resource usage.
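
Figure 4 reports CRPS alongside MSE. As a reminder of what CRPS measures, the sketch below uses the standard sample-based estimator, CRPS ≈ E|X − y| − ½·E|X − X′|, where X and X′ are independent draws from the forecast and y is the observed value. The function name and the toy forecasts are illustrative assumptions and are unrelated to the paper's evaluation code.

```python
import numpy as np

def crps_from_samples(samples, observed):
    """Sample-based CRPS estimate for one scalar observation (lower is better)."""
    samples = np.asarray(samples, dtype=float)
    # Average distance between forecast samples and the observation.
    term_obs = np.mean(np.abs(samples - observed))
    # Average pairwise distance between forecast samples (forecast spread).
    term_spread = np.mean(np.abs(samples[:, None] - samples[None, :]))
    return term_obs - 0.5 * term_spread

rng = np.random.default_rng(0)
sharp = rng.normal(loc=0.0, scale=1.0, size=1000)   # confident, well-centered forecast
wide = rng.normal(loc=0.0, scale=5.0, size=1000)    # same center, much less certain

crps_sharp = crps_from_samples(sharp, observed=0.0)
crps_wide = crps_from_samples(wide, observed=0.0)
crps_point = crps_from_samples([2.0, 2.0, 2.0], observed=5.0)  # degenerate forecast
```

For a degenerate forecast the spread term vanishes and CRPS reduces to the absolute error. Unlike MSE on a point forecast, CRPS scores the whole predictive distribution, so of two forecasts centered on the truth it favors the sharper one.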

Theorems & Definitions (5)

  • Definition 1: Historical Indicators
  • Definition 2: Level Information
  • Definition 3: Forecasting
  • Definition 4: Computational Unit
  • Definition 5: Decision