CPU-Limits kill Performance: Time to rethink Resource Control

Chirag Shetty, Sarthak Chakraborty, Hubertus Franke, Larisa Shwartz, Chandra Narayanaswami, Indranil Gupta, Saurabh Jha

TL;DR

This paper challenges the conventional reliance on CPU-Limits ($c.lim$) for latency-sensitive cloud workloads, presenting empirical evidence that throttling degrades tail latency and can inflate costs. It argues that CPU-Requests ($c.req$) suffice to guarantee CPU share under CFS fairness, and that the standard limit-based control paradigm hurts performance and reliability. The authors propose a limit-free design with redesigned autoscalers and a performance-based billing model, exemplified by the Yet Another Autoscaler (YAAS) prototype, which achieves substantial resource savings and more predictable performance. They also provide a pragmatic view on when $c.lim$ might still be useful (e.g., background jobs) and offer a roadmap for practical deployment and further research into limit-free resource control.

Abstract

Research in compute resource management for cloud-native applications is dominated by the problem of setting optimal CPU limits -- a fundamental OS mechanism that strictly restricts a container's CPU usage to its specified CPU-limits. Rightsizing and autoscaling works have innovated on allocation/scaling policies assuming the ubiquity and necessity of CPU-limits. We question this. Practical experiences of cloud users indicate that CPU-limits harms application performance and costs more than it helps. These observations contradict the conventional wisdom presented in both academic research and industry best practices. We argue that this indiscriminate adoption of CPU-limits is driven by the erroneous belief that CPU-limits is essential for operational and safety purposes. We provide empirical evidence making a case for eschewing CPU-limits completely from latency-sensitive applications. This prompts a fundamental rethinking of auto-scaling and billing paradigms and opens new research avenues. Finally, we highlight specific scenarios where CPU-limits can be beneficial if used in a well-reasoned way (e.g., background jobs).

Paper Structure

This paper contains 12 sections, 5 figures, 1 table.

Figures (5)

  • Figure 1: c.limits are either harmful or unnecessary: Summary of the motivation section (sec:motivation).
  • Figure 2: (a) Throttling of single-threaded and multi-threaded processes. (b) Formation of queues: Pods have c.requests = 300 millicores (300m), i.e., 30ms of CPU time per 100ms. In Row-4, a c.limit of 300 millicores is applied on both. We assume fair scheduling (with no preemption, for the sake of simplicity). (c) Impact of c.limits on cost: Percentage increase in CPU required to meet the SLO when c.limits are specified vs. just c.requests (SN app).
  • Figure 3: (a) Impact on latency with c.limits of 1× & 1.1× the CPU util (HR). (b) [top] Queuing with and without c.limits. X axis is % utilization of pod's allocated c.req/c.lim. [bottom] c.req protects app A against bursting app B. (c) No. of scaling actions & CPU required to meet SLO on increasing load by 25% at different scaling thresholds (60%, 70%, 90%), with & without c.limits (SN). (HR = HotelReservation, SN = SocialNetwork dtsb.)
  • Figure 4: Savings on removing c.lim & with YAAS (HR app)
  • Figure 5: YAAS's scaling policies.
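
The arithmetic behind the Figure 2 caption (300m corresponds to 30ms of CPU time per 100ms) and the throttling it describes can be sketched as follows. This is a simplified model, not the authors' code: the helper names `quota_ms` and `completion_time_ms` are hypothetical, the 100ms period is the Linux CFS bandwidth default, and the model assumes the work starts at a period boundary, that all threads run in parallel on otherwise idle cores, and that quota is handed out in one lump per period.

```python
CFS_PERIOD_MS = 100  # default CFS bandwidth period on Linux (assumption of this sketch)


def quota_ms(millicores, period_ms=CFS_PERIOD_MS):
    """CPU time (ms) a container may consume per period under a given limit."""
    return millicores * period_ms / 1000


def completion_time_ms(work_ms, limit_millicores=None, threads=1,
                       period_ms=CFS_PERIOD_MS):
    """Wall-clock time (ms) to finish `work_ms` of total CPU work.

    With no limit, the threads simply split the work. With a limit, the
    container is throttled for the rest of each period once it has used
    its quota, and resumes when the next period begins.
    """
    if limit_millicores is None:
        return work_ms / threads
    if limit_millicores <= 0:
        raise ValueError("limit must be positive")
    # CPU-ms usable per period: capped by the quota and by what the
    # threads could physically consume in one period.
    per_period = min(quota_ms(limit_millicores, period_ms), threads * period_ms)
    elapsed = 0.0
    remaining = work_ms
    while remaining > per_period:
        remaining -= per_period   # quota exhausted, throttled until next period
        elapsed += period_ms
    return elapsed + remaining / threads


if __name__ == "__main__":
    print(quota_ms(300))                           # quota for a 300m limit
    print(completion_time_ms(60))                  # 60ms of work, no limit
    print(completion_time_ms(60, 300))             # same work, 300m limit
    print(completion_time_ms(60, 300, threads=2))  # two threads share the quota
```

Under these assumptions, a request needing 60ms of CPU finishes in 60ms without a limit but takes 130ms with a 300m limit, because it sits throttled for 70ms of the first period. Adding a second thread does not avoid the stall, since both threads draw from the same quota and exhaust it sooner.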