Humas: A Heterogeneity- and Upgrade-aware Microservice Auto-scaling Framework in Large-scale Data Centers

Qin Hua, Dingyu Yang, Shiyou Qian, Jian Cao, Guangtao Xue, Minglu Li

TL;DR

Humas addresses auto-scaling for large-scale microservices by explicitly modeling hardware heterogeneity and upgrade-induced pattern drift. It introduces three components (Resource Usage Normalizer, LSDD-based Pattern Drift Detector, and Capacity Adjuster), coupled with KAE-Informer workload forecasting and GRF-based pattern modeling, to produce accurate capacity plans. An evaluation on 50 real microservices with over 11,000 containers shows gains of approximately 30.4% in resource efficiency and 48.0% in performance stability over state-of-the-art approaches, along with fewer QoS violations. For production data centers, the practical benefit is that QoS is maintained under dynamic workloads and heterogeneous hardware, even when services are upgraded frequently.

Abstract

An effective auto-scaling framework is essential for microservices to ensure performance stability and resource efficiency under dynamic workloads. As many prior studies have shown, the key to efficient auto-scaling in data-driven schemes lies in accurately learning performance patterns, i.e., the relationship between performance metrics and workloads. However, we observe two significant challenges in characterizing performance patterns for large-scale microservices. First, different microservices exhibit varying sensitivities to heterogeneous machines, which makes it difficult to quantify the performance difference in a fixed manner. Second, frequent version upgrades of microservices cause uncertain changes in performance patterns, known as pattern drifts, which lead to imprecise resource capacity estimates. To address these challenges, we propose Humas, a heterogeneity- and upgrade-aware auto-scaling framework for large-scale microservices. First, Humas quantifies online the difference in resource efficiency among heterogeneous machines for various microservices and normalizes their resources into standard units. Second, Humas develops a least squares density-difference (LSDD) based algorithm to identify pattern drifts caused by upgrades. Finally, Humas generates capacity adjustment plans for microservices based on the latest performance patterns and predicted workloads. Experiments on 50 real microservices with over 11,000 containers demonstrate that Humas improves resource efficiency and performance stability by approximately 30.4% and 48.0%, respectively, compared to state-of-the-art approaches.
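
To make the LSDD idea in the abstract concrete, the following is a minimal sketch of the standard closed-form Gaussian-kernel LSDD estimator on one-dimensional samples. It is not the Humas detector: the abstract does not specify the features, windows, or thresholds Humas uses, so the function names (`solve`, `lsdd_1d`, `score`), the parameters `sigma` and `lam`, the choice of kernel centers, and the synthetic "before upgrade" and "after upgrade" samples are all assumptions made for illustration.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def lsdd_1d(before, after, centers, sigma=1.0, lam=0.1):
    """Estimate the squared L2 distance between two 1-D sample densities.

    sigma (kernel width) and lam (ridge regularizer) are illustrative
    defaults, not values taken from the paper.
    """
    n = len(centers)
    # H[l][m]: integral of the product of two Gaussian kernels (closed form in 1-D).
    scale = math.sqrt(math.pi) * sigma
    H = [[scale * math.exp(-((cl - cm) ** 2) / (4 * sigma ** 2)) for cm in centers]
         for cl in centers]

    def mean_kernel(samples, c):
        return sum(math.exp(-((s - c) ** 2) / (2 * sigma ** 2)) for s in samples) / len(samples)

    # h[l]: difference of the empirical kernel means of the two samples.
    h = [mean_kernel(before, c) - mean_kernel(after, c) for c in centers]
    # theta = (H + lam * I)^{-1} h is the regularized least-squares solution.
    A = [[H[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    theta = solve(A, h)
    theta_h = sum(t * v for t, v in zip(theta, h))
    theta_H_theta = sum(theta[i] * H[i][j] * theta[j] for i in range(n) for j in range(n))
    return 2.0 * theta_h - theta_H_theta


def score(a, b):
    """Hypothetical helper: use every 20th pooled sample as a kernel center."""
    centers = (a + b)[::20]
    return lsdd_1d(a, b, centers)


# Synthetic stand-ins for a per-container metric before and after an upgrade.
rng = random.Random(0)
old = [rng.gauss(0.0, 1.0) for _ in range(200)]
stable = [rng.gauss(0.0, 1.0) for _ in range(200)]   # same pattern after upgrade
drifted = [rng.gauss(3.0, 1.0) for _ in range(200)]  # shifted pattern after upgrade

no_drift_score = score(old, stable)
drift_score = score(old, drifted)
print(no_drift_score, drift_score)
```

A detector built on this score would flag a drift when the score exceeds some threshold. How that threshold is chosen is left open here, since the abstract does not say how Humas sets it.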

Paper Structure (42 sections, 14 equations, 11 figures, 11 tables, 2 algorithms)

Figures (11)

  • Figure 1: CPU usage and utilization of $MS_0$ with dynamic workload
  • Figure 2: Importance analysis of hardware configurations
  • Figure 3: CDF of work efficiency ratio between $816X$ and $826X$
  • Figure 4: Average container CPU usage of four microservices during peak load
  • Figure 5: The unpredictability of version upgrades
  • ...and 6 more figures