MAS-H2: A Hierarchical Multi-Agent System for Holistic Cloud-Native Autoscaling

Hamed Hamzeh, Parisa Vahdatian

TL;DR

In a volatile Chaotic Flash Sale scenario, MAS-H2 planned proactively, filtering transient noise and deploying more replicas than the native HPA. It also performed a zero-downtime strategic migration between two cost- and performance-optimised infrastructures.

Abstract

Autoscaling in cloud-native platforms such as Kubernetes is reactive and metric-driven, which leads to a strategic void: higher-level business policies are decoupled from lower-level resource provisioning. This void, coupled with fragmented coordination of pod and node scaling, can cause significant resource waste and performance degradation under dynamic workloads. In this paper, we present MAS-H2, a new hierarchical multi-agent system that addresses the challenges of autonomic cloud resource management with a complete end-to-end solution. MAS-H2 decomposes the control problem into three layers: a Strategic Agent that formalises business policies (e.g., cost vs. performance) into a global utility function; Planning Agents that use time-series forecasting to produce a joint, proactive scaling plan for pods and nodes; and Execution Agents that carry out the scaling plan. We built a MAS-H2 prototype as a Kubernetes Operator on Google Kubernetes Engine (GKE) and benchmarked it against the native Horizontal Pod Autoscaler (HPA) and Cluster Autoscaler (CA) baselines under two realistic, spiky, stress-inducing workload scenarios. For predictable Heartbeat workloads, MAS-H2 kept application CPU usage under 40%, which amounts to over 50% less sustained CPU stress than the native HPA baseline, which typically operated above 80%. In the volatile Chaotic Flash Sale scenario, MAS-H2 planned proactively, filtering transient noise and deploying more replicas than HPA, and reduced peak CPU load by 55% without under-provisioning. Beyond performance, MAS-H2 performed a zero-downtime strategic migration between two cost- and performance-optimised infrastructures.
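
The layered decomposition described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the `Policy` weights, the linear form of `utility`, the default 40% CPU target, the per-replica capacity and cost, and the use of exponential smoothing as a stand-in for the paper's time-series forecaster. The execution layer (applying the plan to Kubernetes) is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical business policy: relative weight of performance vs. cost."""
    w_perf: float
    w_cost: float


def utility(policy, cpu_util, replicas, target_util=0.4, cost_per_replica=1.0):
    """Strategic layer (assumed form): higher is better.

    Penalises CPU utilisation above the target and the cost of running replicas.
    """
    overload = max(0.0, cpu_util - target_util)
    return -(policy.w_perf * overload + policy.w_cost * cost_per_replica * replicas)


def forecast_demand(history, alpha=0.5):
    """Planning layer, step 1: exponential smoothing of past demand (CPU cores).

    Smoothing damps one-off spikes, a simple form of transient-noise filtering.
    """
    level = history[0]
    for observed in history[1:]:
        level = alpha * observed + (1 - alpha) * level
    return level


def plan_replicas(policy, history, cores_per_replica=1.0, max_replicas=20):
    """Planning layer, step 2: pick the replica count that maximises utility."""
    demand = forecast_demand(history)

    def score(n):
        # Utilisation is capped at 100% of the provisioned capacity.
        cpu_util = min(1.0, demand / (n * cores_per_replica))
        return utility(policy, cpu_util, n)

    return max(range(1, max_replicas + 1), key=score)


if __name__ == "__main__":
    steady = [2.0, 2.0, 2.0, 2.0]
    print(plan_replicas(Policy(w_perf=100.0, w_cost=1.0), steady))  # performance-first
    print(plan_replicas(Policy(w_perf=1.0, w_cost=1.0), steady))    # cost-first
```

With a steady demand of 2 cores, the performance-first policy in this sketch selects 5 replicas (the smallest count that brings utilisation down to the 40% target), while the cost-first policy selects 1. The same planner therefore yields different plans purely from a change in the strategic weights, which is the kind of policy-to-provisioning coupling the abstract argues is missing from reactive autoscalers.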

Paper Structure (31 sections, 4 equations, 6 figures, 2 tables, 1 algorithm)

Figures (6)

  • Figure 1: High-level diagram of the MAS-H² architecture.
  • Figure 2: Prototype implementation pipeline
  • Figure 3: Comprehensive performance dashboard comparing the MAS-H² agent (orange) against the HPA Baseline (blue) in the Heartbeat scenario. The plots detail pod and infrastructure scaling, application performance, resource efficiency, and cost metrics over the duration of the experiment.
  • Figure 4: Comprehensive performance dashboard comparing the MAS-H² agent (orange) against the HPA Baseline (blue) in the Flash Sale scenario. The plots detail pod and infrastructure scaling, application performance, resource efficiency, and cost metrics over the duration of the experiment.
  • Figure 5: Pairwise relationships of key metrics in the Chaotic Flash Sale scenario. The diagonal shows the distribution of each metric, while the off-diagonal scatter plots reveal correlations. The MAS-H² agent (orange) explores a much wider operational space and navigates efficiency trade-offs more effectively than the static HPA baseline (blue).
  • ...and 1 more figure