Taming Cold Starts: Proactive Serverless Scheduling with Model Predictive Control

Chanh Nguyen, Monowar Bhuyan, Erik Elmroth

TL;DR

This work tackles cold-start latency in serverless platforms by introducing a Model Predictive Control (MPC) based scheduler that proactively prewarms containers and shapes requests using Fourier-based invocation forecasts. The MPC optimizes across a prediction horizon to balance end-to-end latency and resource usage, incorporating penalties for cold starts, queueing, and overprovisioning while encouraging smooth provisioning. The approach is implemented as a middleware layer on Apache OpenWhisk running atop Kubernetes and evaluated with real Azure traces and synthetic workloads, showing up to 85% reductions in tail latency and 34% reductions in warm-container keep-alive costs compared to the OpenWhisk default policy. The results demonstrate that predictive shaping, combined with joint provisioning and dispatch decisions, yields significant latency improvements with modest control overhead, offering practical benefits for reactive and bursty serverless workloads.
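As a rough illustration of the trade-off described above, a finite-horizon objective of the kind an MPC scheduler might minimize can be written as follows. This is a generic sketch, not the paper's formulation: the symbols $H$ (horizon length), $c_k$ (cold starts in step $k$), $q_k$ (queued requests), $o_k$ (idle warm containers), $u_k$ (number of containers provisioned), and the weights $\alpha, \beta, \gamma, \delta$ are all assumptions introduced here for illustration.

```latex
\min_{u_0, \dots, u_{H-1}} \; \sum_{k=0}^{H-1}
  \Big( \alpha\, c_k \;+\; \beta\, q_k \;+\; \gamma\, o_k \;+\; \delta\, (u_k - u_{k-1})^2 \Big)
```

The first three terms penalize cold starts, queueing, and overprovisioning respectively, and the last term discourages abrupt changes in provisioning between consecutive steps. In a receding-horizon scheme, only the first decision $u_0$ would be applied before the problem is re-solved with an updated forecast.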

Abstract

Serverless computing has transformed cloud application deployment by introducing a fine-grained, event-driven execution model that abstracts away infrastructure management. Its on-demand nature makes it especially appealing for latency-sensitive and bursty workloads. However, the cold start problem, in which the platform incurs significant delay when provisioning new containers, remains the Achilles' heel of such platforms. This paper presents a predictive serverless scheduling framework based on Model Predictive Control to proactively mitigate cold starts, thereby improving end-to-end response time. By forecasting future invocations, the controller jointly optimizes container prewarming and request dispatching, improving latency while minimizing resource overhead. We implement our approach on Apache OpenWhisk, deployed on a Kubernetes-based testbed. Experimental results using real-world function traces and synthetic workloads demonstrate that our method significantly outperforms state-of-the-art baselines, achieving up to 85% lower tail latency and a 34% reduction in resource usage.
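To make the idea of Fourier-based invocation forecasting concrete, the following minimal sketch extrapolates a series of per-interval invocation counts by keeping its strongest frequency components. It is an assumption-laden illustration, not the authors' forecaster: the function name `fourier_forecast`, the `n_harmonics` parameter, and the choice to clamp predictions at zero are all hypothetical.

```python
import cmath
import math


def fourier_forecast(history, horizon, n_harmonics=3):
    """Extrapolate a count series by keeping its strongest DFT harmonics.

    Hypothetical helper for illustration; not taken from the paper.
    """
    n = len(history)
    mean = sum(history) / n
    centered = [x - mean for x in history]

    # Naive O(n^2) DFT over the positive frequencies 1..n//2.
    coeffs = []
    for k in range(1, n // 2 + 1):
        c = sum(
            centered[t] * cmath.exp(-2j * math.pi * k * t / n)
            for t in range(n)
        )
        coeffs.append((k, c))

    # Keep only the harmonics with the largest magnitude.
    coeffs.sort(key=lambda kc: abs(kc[1]), reverse=True)
    top = coeffs[:n_harmonics]

    forecast = []
    for t in range(n, n + horizon):
        value = mean
        for k, c in top:
            # Each positive frequency stands for a conjugate pair,
            # except the Nyquist term when n is even.
            scale = 1.0 if (n % 2 == 0 and k == n // 2) else 2.0
            value += scale * (c * cmath.exp(2j * math.pi * k * t / n)).real / n
        # Invocation counts cannot be negative (an assumption of this sketch).
        forecast.append(max(0.0, value))
    return forecast
```

A forecast produced this way simply repeats the dominant periodic patterns of the history window, so it suits workloads with stable cycles; a controller would typically refit it as new observations arrive.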

Paper Structure

This paper contains 20 sections, 11 equations, 8 figures, 1 table, 2 algorithms.

Figures (8)

  • Figure 1: (a) Response time per request (in seconds). (b) Number of warm containers over time during 50 function invocations.
  • Figure 2: Unnecessary cold start due to lack of short-term request shaping.
  • Figure 3: MPC-based proactive serverless scheduling architecture.
  • Figure 4: Forecast error of Fourier and ARIMA models in two experiments: (a) with Microsoft Azure Functions and (b) with synthetic data.
  • Figure 5: Percentage improvement in total response time (average, 90th, and 95th percentiles) over OpenWhisk. (a) with Microsoft Azure Functions; (b) with synthetic data.
  • ...and 3 more figures