Autothrottle: A Practical Bi-Level Approach to Resource Management for SLO-Targeted Microservices

Zibo Wang; Pinghe Li; Chieh-Jan Mike Liang; Feng Wu; Francis Y. Yan

TL;DR

Autothrottle addresses the challenge of meeting end-to-end latency SLOs in microservice deployments with a bi-level resource management framework that decouples application-level SLO feedback from per-service resource control. An application-level controller, Tower, uses contextual bandits to set CPU throttle-ratio targets, which lightweight per-service controllers, Captains, attain by adjusting Linux CPU quotas in real time. This split lets the system react quickly to workload changes without maintaining a global model of inter-service dependencies, saving CPU and reducing SLO violations across multiple real-world workloads. The approach is practical to deploy on Kubernetes: it saves up to 26.21% CPU over the best-performing baseline, and a 21-day real-workload study shows notable reductions in SLO violations.
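To make the Captain's role concrete, here is a minimal Python sketch of one per-service control loop. It borrows only the rule names from the paper's outline (multiplicative scale-up, instantaneous scale-down, rollback after scaling down); the class name `CaptainSketch`, its constants, and the exact update rules are hypothetical assumptions, not the authors' implementation. Quotas are expressed in CPU cores.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CaptainSketch:
    """Hypothetical per-service controller; rules and constants are illustrative only."""
    quota: float                 # current CPU quota, in cores
    target_ratio: float          # throttle-ratio target received from the application-level controller
    up_factor: float = 1.5       # assumed multiplicative scale-up factor
    headroom: float = 1.1        # assumed margin over observed usage when scaling down
    min_quota: float = 0.1
    max_quota: float = 16.0
    # Quota to restore if the most recent scale-down immediately causes throttling.
    _pre_scale_down: Optional[float] = field(default=None, init=False, repr=False)

    def step(self, throttle_ratio: float, usage_cores: float) -> float:
        """Consume one observation window and return the new quota."""
        if throttle_ratio > self.target_ratio:
            if self._pre_scale_down is not None:
                # Rollback: the previous window was a scale-down that led to throttling.
                self.quota = self._pre_scale_down
            else:
                # Multiplicative scale-up, capped at max_quota.
                self.quota = min(self.max_quota, self.quota * self.up_factor)
            self._pre_scale_down = None
        elif throttle_ratio < self.target_ratio:
            # Instantaneous scale-down: jump straight to observed usage plus headroom.
            new_quota = max(self.min_quota, usage_cores * self.headroom)
            if new_quota < self.quota:
                self._pre_scale_down = self.quota
                self.quota = new_quota
            else:
                self._pre_scale_down = None
        else:
            # Exactly on target: keep the quota and forget any pending rollback.
            self._pre_scale_down = None
        return self.quota


captain = CaptainSketch(quota=2.0, target_ratio=0.1)
captain.step(throttle_ratio=0.3, usage_cores=2.0)  # over target: scale up to 3.0 cores
captain.step(throttle_ratio=0.0, usage_cores=1.0)  # under target: drop to about 1.1 cores
captain.step(throttle_ratio=0.5, usage_cores=1.1)  # throttled right after scale-down: roll back to 3.0
```

In this sketch, a throttle ratio above target grows the quota geometrically, one below target snaps it down to recent usage plus headroom, and throttling in the window right after a scale-down restores the previous quota instead of growing from the reduced value.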

Abstract

Achieving resource efficiency while preserving end-user experience is non-trivial for cloud application operators. As cloud applications progressively adopt microservices, resource managers are faced with two distinct levels of system behavior: end-to-end application latency and per-service resource usage. Translating between the two levels, however, is challenging because user requests traverse heterogeneous services that collectively (but unevenly) contribute to the end-to-end latency. We present Autothrottle, a bi-level resource management framework for microservices with latency SLOs (service-level objectives). It architecturally decouples application SLO feedback from service resource control, and bridges them through the notion of performance targets. Specifically, an application-wide learning-based controller is employed to periodically set performance targets -- expressed as CPU throttle ratios -- for per-service heuristic controllers to attain. We evaluate Autothrottle on three microservice applications, with workload traces from production scenarios. Results show superior CPU savings, up to 26.21% over the best-performing baseline and up to 93.84% over all baselines.
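The abstract expresses performance targets as CPU throttle ratios. On Linux with cgroup v2, one natural way to measure such a ratio is from the `nr_periods` and `nr_throttled` counters in a cgroup's `cpu.stat` file, and a quota is applied by writing `"<quota_us> <period_us>"` to `cpu.max`. The sketch below assumes the throttle ratio is the fraction of CFS enforcement periods in a window during which the cgroup was throttled; the helper names are hypothetical, and the paper's exact definition may differ.

```python
from typing import Dict


def parse_cpu_stat(text: str) -> Dict[str, int]:
    """Parse the 'key value' lines of a cgroup v2 cpu.stat file into integers."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttle_ratio(before: Dict[str, int], after: Dict[str, int]) -> float:
    """Fraction of CFS periods between two snapshots in which the cgroup was throttled."""
    periods = after["nr_periods"] - before["nr_periods"]
    throttled = after["nr_throttled"] - before["nr_throttled"]
    if periods <= 0:
        return 0.0  # no quota-enforced periods elapsed (e.g. idle, or no quota set)
    return throttled / periods


def to_cpu_max(cores: float, period_us: int = 100_000) -> str:
    """Format a quota given in cores as the string written to cgroup v2 cpu.max."""
    return f"{round(cores * period_us)} {period_us}"


snap0 = parse_cpu_stat("usage_usec 1000\nnr_periods 100\nnr_throttled 5\n")
snap1 = parse_cpu_stat("usage_usec 5000\nnr_periods 200\nnr_throttled 25\n")
print(throttle_ratio(snap0, snap1))  # 20 throttled out of 100 periods
print(to_cpu_max(0.5))               # half a core with the default 100 ms period
```

On a live host the two snapshots would come from reading the service's `cpu.stat` at the start and end of a control window; the literal strings above stand in for those reads.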

Paper Structure (33 sections, 12 figures, 4 tables, 2 algorithms)

Introduction
Background and Motivation
Implications of microservices
Service execution dependencies
Delayed end-to-end performance feedback
A practical approach
The Autothrottle Framework
Overview
Per-service controllers: Captains
Resource metrics and knobs
Multiplicative scale-up
Instantaneous scale-down
Rollback mechanism after scaling down
Application-level controller: Tower
Primer on contextual bandits
...and 18 more sections

Figures (12)

Figure 1: Individual microservices (bottom two panels) can exhibit vastly different resource usage patterns and short-term fluctuations. In addition, they do not necessarily have a strong correlation with the end-to-end application-level measurements (top two panels).
Figure 2: Autothrottle features bi-level resource management: The application-level learning-based controller (Tower), observing end-to-end latencies and workloads, periodically sets performance targets, expressed as CPU throttle ratios, for per-service heuristic controllers (Captains) to meet.
Figure 3: Our workload traces capture common patterns of RPS (requests per second) on an hourly basis. These patterns have been observed in real-world scenarios: Puffer streaming requests [puffer], Google cluster usage [google_cluster_trace], and Twitter tweets [twitter_api]. We also recorded a full 21-day workload trace from a global cloud provider for long-term evaluation. We scale these traces accordingly for each benchmark application to saturate the cluster (see the Appendix).
Figure 4: Application latency vs. CPU allocation as we vary the two baselines' CPU utilization thresholds for Social-Network under the diurnal workload trace. The dashed red line marks the 200 ms SLO. Autothrottle maintains the SLO with the lowest CPU allocation.
Figure 5: Autothrottle tailors CPU allocations to each microservice's resource usage. The figure shows the 15 microservices with the highest CPU usage in Train-Ticket under the diurnal workload trace.
...and 7 more figures
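Figure 2 describes Tower as a learning-based controller that observes end-to-end latencies and workload and periodically sets throttle-ratio targets. As a rough illustration only, the sketch below swaps in a simple tabular epsilon-greedy contextual bandit: the context is a discretized workload level (for example an RPS bucket), each arm is one candidate throttle-ratio target shared by all services, and the reward trades CPU allocation against a penalty for exceeding a 200 ms latency SLO (the Social-Network value from Figure 4). `TowerSketch`, the reward shape, and the penalty weight are hypothetical assumptions and not the paper's model.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


class TowerSketch:
    """Hypothetical epsilon-greedy contextual bandit over discrete throttle-ratio targets."""

    def __init__(self, arms: List[float], epsilon: float = 0.1, seed: int = 0):
        self.arms = arms
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # Running mean reward and pull count for each (context, arm) pair.
        self.mean: Dict[Tuple[int, float], float] = defaultdict(float)
        self.count: Dict[Tuple[int, float], int] = defaultdict(int)

    def choose(self, context: int) -> float:
        """Pick a throttle-ratio target for the given workload context."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)  # explore uniformly
        # Exploit: the arm with the highest mean reward seen under this context.
        return max(self.arms, key=lambda a: self.mean[(context, a)])

    def update(self, context: int, arm: float, reward: float) -> None:
        """Fold one observed reward into the incremental mean for (context, arm)."""
        key = (context, arm)
        self.count[key] += 1
        self.mean[key] += (reward - self.mean[key]) / self.count[key]


def reward(cpu_cores: float, latency_ms: float, slo_ms: float = 200.0,
           penalty: float = 100.0) -> float:
    """Assumed reward: prefer less CPU, but pay a large penalty when the SLO is missed."""
    return -cpu_cores - (penalty if latency_ms > slo_ms else 0.0)


tower = TowerSketch(arms=[0.05, 0.1, 0.2], epsilon=0.0)
tower.update(3, 0.05, reward(8.0, 150.0))  # safe but CPU-hungry
tower.update(3, 0.1, reward(6.0, 180.0))   # meets the SLO with less CPU
tower.update(3, 0.2, reward(4.0, 250.0))   # cheapest, but violates the SLO
print(tower.choose(3))                     # the middle target wins under this reward
```

Because untried (context, arm) pairs start with a mean reward of 0 while real rewards are negative, the greedy step tries each arm once per context before settling, which acts as a crude form of optimistic initialization.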
