Analytically-Driven Resource Management for Cloud-Native Microservices

Yanqi Zhang, Zhuangzhuang Zhou, Sameh Elnikety, Christina Delimitrou

TL;DR

This work addresses resource management for cloud-native microservices whose requests span multiple classes and priorities with different SLAs. It proposes Ursa, an analytical framework that decomposes end-to-end latency into per-service components and maps them to per-microservice resource thresholds via a mixed-integer program (MIP). By identifying backpressure-free operating zones, Ursa can treat microservices independently, which reduces modeling complexity and enables fast, threshold-based scaling. An exploration procedure profiles per-service latency versus load, and an optimization engine built on the MIP selects allocations that satisfy the SLA with minimal resource use. The system is implemented on Kubernetes with a Dapr-based microservice stack and validated on DeathStarBench-like benchmarks that use both RPCs and message queues (MQs). Compared to ML-based approaches, Ursa shortens data collection by more than 128x and runs its control plane 43x faster, while lowering SLA violations, reducing CPU allocations, and adapting to changes in service logic and request mix. Overall, Ursa provides a practical, scalable alternative to ML-driven resource management in complex microservice topologies with diverse SLAs and communication patterns.

Abstract

Resource management for cloud-native microservices has attracted a lot of recent attention. Previous work has shown that machine learning (ML)-driven approaches outperform traditional techniques, such as autoscaling, in terms of both SLA maintenance and resource efficiency. However, ML-driven approaches also face challenges, including lengthy data collection processes and limited scalability. We present Ursa, a lightweight resource management system for cloud-native microservices that addresses these challenges. Ursa uses an analytical model that decomposes the end-to-end SLA into per-service SLAs, and maps each per-service SLA to individual resource allocations per microservice tier. To speed up the exploration process and avoid prolonged SLA violations, Ursa explores each microservice individually, and swiftly stops exploration if latency exceeds its SLA. We evaluate Ursa on a set of representative end-to-end microservice topologies, including a social network, a media service, and a video processing pipeline, each consisting of multiple classes and priorities of requests with different SLAs, and compare it against two representative ML-driven systems, Sinan and Firm. Compared to these ML-driven approaches, Ursa provides significant advantages: it shortens the data collection process by more than 128x, and its control plane is 43x faster. At the same time, Ursa does not sacrifice resource efficiency or SLAs. During online deployment, Ursa reduces the SLA violation rate by 9.0% to 49.9%, and reduces CPU allocation by up to 86.2% compared to ML-driven approaches.
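
To make the decomposition idea concrete, the following is a minimal sketch, not Ursa's actual formulation. It assumes each service has already been profiled into a few (CPU, latency) options, that the services sit on one sequential chain so their latencies simply add up, and it replaces the mixed-integer program with brute-force enumeration. The service names, the `PROFILES` table, and the function `cheapest_allocation` are hypothetical, and every number is invented for illustration.

```python
from itertools import product

# Hypothetical profiling output: for each service, a list of
# (cpu_cores, latency_ms) options. All numbers are invented.
PROFILES = {
    "frontend": [(1, 40.0), (2, 25.0), (4, 15.0)],
    "compose": [(1, 60.0), (2, 35.0), (4, 20.0)],
    "storage": [(1, 50.0), (2, 30.0), (4, 18.0)],
}


def cheapest_allocation(profiles, sla_ms):
    """Pick one (cpu, latency) option per service so that the summed latency
    stays within sla_ms and the summed CPU is minimal.

    Assumes a sequential chain (latencies add). Exhaustive search stands in
    for a MIP solver, so this only scales to tiny examples.
    Returns (total_cpu, {service: (cpu, latency_ms)}) or None if infeasible.
    """
    names = list(profiles)
    best = None
    for combo in product(*(profiles[name] for name in names)):
        total_latency = sum(latency for _, latency in combo)
        total_cpu = sum(cpu for cpu, _ in combo)
        if total_latency <= sla_ms and (best is None or total_cpu < best[0]):
            best = (total_cpu, dict(zip(names, combo)))
    return best


if __name__ == "__main__":
    print(cheapest_allocation(PROFILES, sla_ms=100.0))
```

Each chosen latency acts as that service's share of the end-to-end budget. A real deployment would hand the same constraints to a MIP solver rather than enumerate combinations.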

Paper Structure

This paper contains 16 sections, 3 equations, 14 figures, 6 tables, 1 algorithm.

Figures (14)

  • Figure 1: Inter-service communication methods.
  • Figure 2: Backpressure effects in a service chain.
  • Figure 3: Backpressure profiling engine architecture.
  • Figure 4: Identifying backpressure-free CPU thresholds in a service mesh. We incrementally decrease the amount of resources allocated to the tested microservice, until we observe an increase in the latency of the proxy. Given the proxy's lack of computation activity, this increase signals the presence of backpressure. We use this threshold as the utilization the tested service should not exceed to avoid introducing backpressure to its parent tiers.
  • Figure 5: System architecture of Ursa.
  • ...and 9 more figures
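
The threshold search described in the Figure 4 caption can be sketched roughly as follows. This is an illustrative reading of the caption, not the paper's implementation: the `measure` callback, the 10% `tolerance` used to decide that proxy latency has "increased", and the synthetic service model in `fake_measure` are all assumptions.

```python
def find_backpressure_free_threshold(measure, cpu_steps, tolerance=1.10):
    """Step the tested service's CPU allocation downward and return the last
    utilization seen before the proxy's latency rose above its baseline.

    measure(cpu) -> (proxy_latency_ms, service_utilization); hypothetical hook
    into a profiling harness.
    cpu_steps: CPU allocations in decreasing order; the first is the baseline.
    tolerance: assumed multiplicative slack over baseline proxy latency.
    If latency never rises, the utilization at the smallest step is returned.
    """
    baseline = None
    threshold = None
    for cpu in cpu_steps:
        proxy_latency, utilization = measure(cpu)
        if baseline is None:
            baseline = proxy_latency
        elif proxy_latency > tolerance * baseline:
            break  # proxy does no computation, so this signals backpressure
        threshold = utilization
    return threshold


def fake_measure(cpu, demand_cores=2.0):
    """Synthetic stand-in for a real measurement (invented model): proxy
    latency is flat until the service passes 80% utilization, then grows."""
    utilization = min(1.0, demand_cores / cpu)
    proxy_latency = 1.0 + max(0.0, utilization - 0.8) * 20.0
    return proxy_latency, utilization


if __name__ == "__main__":
    print(find_backpressure_free_threshold(fake_measure, [8, 4, 3, 2]))
```

The returned utilization is the cap the tested service should stay under to avoid pushing backpressure to its parent tiers. Real measurements are noisy, so in practice the comparison would likely use repeated samples per step.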