Autothrottle: A Practical Bi-Level Approach to Resource Management for SLO-Targeted Microservices
Zibo Wang, Pinghe Li, Chieh-Jan Mike Liang, Feng Wu, Francis Y. Yan
TL;DR
Autothrottle addresses the challenge of meeting end-to-end latency SLOs in microservice deployments with a bi-level resource management framework that decouples application-level SLO feedback from per-service resource control. The Tower controller uses contextual bandits to set CPU throttle-ratio targets, which Captains (lightweight per-service controllers) try to meet by adjusting Linux CPU quotas in real time. This split lets the system respond quickly to workload changes without maintaining a global view of inter-service dependencies. Across multiple real-world workloads, it yields significant CPU savings and fewer SLO violations. The approach is practical to deploy on Kubernetes: it saves up to 26.21% CPU over the best-performing baseline, and a 21-day real-workload study shows notable reductions in SLO violations.
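To make the Captain's role concrete, here is a minimal sketch of a per-service control step in Python. It assumes a cgroup v2 host, where `cpu.stat` reports `nr_periods` and `nr_throttled` and `cpu.max` takes `"<quota_us> <period_us>"`. It also takes the throttle ratio to be the fraction of CFS periods in a window during which the service was throttled. The class `CaptainSketch`, its multiplicative step sizes, its bounds, and its tolerance band are hypothetical illustrations, not the paper's actual Captain algorithm. File I/O is left out so the logic can be exercised on sample `cpu.stat` text.

```python
from dataclasses import dataclass


def parse_cpu_stat(text: str) -> dict:
    """Parse cgroup v2 cpu.stat content ("key value" per line) into a dict of ints."""
    stats = {}
    for line in text.strip().splitlines():
        key, value = line.split()
        stats[key] = int(value)
    return stats


def throttle_ratio(prev: dict, curr: dict) -> float:
    """Fraction of CFS periods between two cpu.stat snapshots in which the cgroup was throttled."""
    periods = curr["nr_periods"] - prev["nr_periods"]
    throttled = curr["nr_throttled"] - prev["nr_throttled"]
    return throttled / periods if periods > 0 else 0.0


@dataclass
class CaptainSketch:
    """Hypothetical per-service controller that nudges the CPU quota so the
    observed throttle ratio tracks the target handed down by the Tower."""
    quota_cores: float        # current CPU quota, in cores
    min_cores: float = 0.1    # assumed lower bound on the quota
    max_cores: float = 16.0   # assumed upper bound on the quota
    step_up: float = 1.2      # assumed growth factor when throttled too often
    step_down: float = 0.95   # assumed shrink factor when throttled too rarely
    tolerance: float = 0.01   # dead band around the target to avoid oscillation

    def update(self, target_ratio: float, observed_ratio: float) -> float:
        if observed_ratio > target_ratio + self.tolerance:
            self.quota_cores *= self.step_up      # too much throttling: grant more CPU
        elif observed_ratio < target_ratio - self.tolerance:
            self.quota_cores *= self.step_down    # headroom to spare: reclaim CPU
        # Keep the quota within the assumed bounds.
        self.quota_cores = min(self.max_cores, max(self.min_cores, self.quota_cores))
        return self.quota_cores


def to_cpu_max(quota_cores: float, period_us: int = 100_000) -> str:
    """Format a quota in cores as a cgroup v2 cpu.max value: "<quota_us> <period_us>"."""
    return f"{round(quota_cores * period_us)} {period_us}"
```

In a real deployment, each control interval would read the service's `cpu.stat`, call `update` with the target received from the Tower, and write the `to_cpu_max` result to the service's `cpu.max`. Multiplicative steps are used here only because they scale naturally with the size of the quota.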
Abstract
Achieving resource efficiency while preserving end-user experience is non-trivial for cloud application operators. As cloud applications increasingly adopt microservices, resource managers face two distinct levels of system behavior: end-to-end application latency and per-service resource usage. Translating between the two levels, however, is challenging because user requests traverse heterogeneous services that collectively (but unevenly) contribute to the end-to-end latency. We present Autothrottle, a bi-level resource management framework for microservices with latency SLOs (service-level objectives). It architecturally decouples application SLO feedback from service resource control, and bridges them through the notion of performance targets. Specifically, an application-wide learning-based controller periodically sets performance targets, expressed as CPU throttle ratios, for per-service heuristic controllers to attain. We evaluate Autothrottle on three microservice applications, using workload traces from production scenarios. Results show superior CPU savings: up to 26.21% over the best-performing baseline and up to 93.84% over all baselines.
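The application-level side can be sketched as a contextual bandit whose arms are candidate throttle-ratio targets. The version below is a deliberately simple epsilon-greedy learner over discretized request-rate contexts. The class name `TowerSketch`, the arm set, the context bucketing, and the reward are all assumptions for illustration. The reward here is negative CPU usage minus a fixed penalty when P99 latency exceeds the SLO. The paper's Tower may use a different learning model, context, and reward.

```python
import random
from collections import defaultdict


class TowerSketch:
    """Hypothetical epsilon-greedy contextual bandit that picks a CPU
    throttle-ratio target (an "arm") for each discretized workload context."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)              # candidate throttle-ratio targets
        self.epsilon = epsilon              # exploration probability
        self.rng = random.Random(seed)      # seeded for reproducibility
        self.counts = defaultdict(int)      # (context, arm) -> times chosen
        self.values = defaultdict(float)    # (context, arm) -> mean reward so far

    def select(self, context):
        """Explore with probability epsilon, otherwise exploit the best-known arm."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        # max() returns the first arm among ties, so the order of self.arms matters.
        return max(self.arms, key=lambda arm: self.values[(context, arm)])

    def update(self, context, arm, reward):
        """Fold an observed reward into the running mean for (context, arm)."""
        key = (context, arm)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]


def context_of(requests_per_second, bucket_size=100):
    """Assumed context: the request rate, bucketed into coarse bins."""
    return int(requests_per_second // bucket_size)


def reward(cpu_cores_used, p99_latency_ms, slo_ms, violation_penalty=100.0):
    """Assumed reward: use less CPU, with a large penalty for missing the SLO."""
    return -cpu_cores_used - (violation_penalty if p99_latency_ms > slo_ms else 0.0)
```

At each control interval, the Tower would call `select` with the current context and send the chosen target to the Captains. After the interval, it would call `update` with the observed reward. Every reward here is non-positive, so untried arms (default value 0.0) are chosen before known ones. This acts as a crude form of optimistic initialization.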
