Non-Stationary Gradient Descent for Optimal Auto-Scaling in Serverless Platforms

Jonatan Anselmi, Bruno Gaujal, Louis-Sébastien Rebuffi

TL;DR

The paper tackles autoscaling in serverless platforms under the scale-per-request pattern by learning the optimal surplus number of servers $\theta^*$ to spawn when an arriving request finds no idle server. It models the system as a Markov chain parameterized by $\theta$ and solves the resulting optimization problem online with a non-stationary Kiefer–Wolfowitz (KW) algorithm, which uses near-stationary samples obtained from windows of transient observations tied to the chain's mixing time. The key contributions are a generalized CTMC parameterized by $\theta$; a non-stationary KW scheme with guaranteed convergence at rate $O(n^{-2/3})$, where $n$ is the number of iterations; a smooth, truncated scale-up rule that ensures differentiability of the invariant distribution; and numerical evidence of energy-performance gains of roughly 5–8% over the baseline. The result is a principled, provably convergent online learning method for autoscaling in serverless systems that handles non-stationary sampling in Markovian dynamics and improves both latency and energy usage. The convergence rate depends on the underlying mixing time through the parameter $\rho$ in the Assumptions.

Abstract

To efficiently manage serverless computing platforms, a key aspect is the auto-scaling of services, i.e., the set of computational resources allocated to a service adapts over time as a function of the traffic demand. The objective is to find a compromise between user-perceived performance and energy consumption. In this paper, we consider the scale-per-request auto-scaling pattern and investigate how many function instances (or servers) should be spawned each time an unfortunate job arrives, i.e., a job that finds all servers busy upon its arrival. We address this problem by following a stochastic optimization approach: we develop a stochastic gradient descent scheme of the Kiefer-Wolfowitz type that applies over a single run of the state evolution. At each iteration, the proposed scheme computes an estimate of the number of servers to spawn each time an unfortunate job arrives to minimize some cost function. Under natural assumptions, we show that the sequence of estimates produced by our scheme is asymptotically optimal almost surely. In addition, we prove that its convergence rate is $O(n^{-2/3})$ where $n$ is the number of iterations. From a mathematical point of view, the stochastic optimization framework induced by auto-scaling exhibits non-standard aspects that we approach from a general point of view. We consider the setting where a controller can only get samples of the transient, rather than stationary, behavior of the underlying stochastic system. To handle this difficulty, we develop arguments that exploit properties of the mixing time of the underlying Markov chain. By means of numerical simulations, we validate the proposed approach and quantify its gain with respect to common existing scale-up rules.
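
To make the Kiefer-Wolfowitz idea concrete, the following is a minimal sketch of the classical finite-difference scheme on a toy problem. It is not the paper's algorithm: the names `kiefer_wolfowitz` and `toy_cost`, the gain sequences $a_n = a/n$ and $c_n = c/n^{1/3}$, and the quadratic cost are all assumptions chosen for illustration, and the noisy evaluations here are independent draws rather than transient samples from a single run of a Markov chain.

```python
import random


def kiefer_wolfowitz(noisy_cost, theta0, lower, upper, n_iters,
                     a=1.0, c=1.0, seed=0):
    """Classical Kiefer-Wolfowitz iteration with projection onto [lower, upper].

    noisy_cost(theta, rng) must return a noisy evaluation of the cost at theta.
    The gains a_n = a / n and c_n = c / n**(1/3) are illustrative textbook
    choices, not the parametrization used in the paper.
    """
    rng = random.Random(seed)
    theta = theta0
    for n in range(1, n_iters + 1):
        a_n = a / n                      # step size, decreasing to zero
        c_n = c / n ** (1.0 / 3.0)       # finite-difference half-width
        # Two-sided finite-difference estimate of the gradient at theta.
        grad = (noisy_cost(theta + c_n, rng)
                - noisy_cost(theta - c_n, rng)) / (2.0 * c_n)
        # Gradient step followed by truncation to the feasible interval.
        theta = min(upper, max(lower, theta - a_n * grad))
    return theta


def toy_cost(theta, rng):
    """Hypothetical convex cost with minimizer 3.0, observed with Gaussian noise."""
    return (theta - 3.0) ** 2 + rng.gauss(0.0, 0.1)


if __name__ == "__main__":
    estimate = kiefer_wolfowitz(toy_cost, theta0=0.0, lower=0.0, upper=10.0,
                                n_iters=5000)
    print(f"estimated minimizer: {estimate:.3f}")
```

The projection step mirrors the truncation to an interval $[0, M]$ mentioned in the figure captions. The difficulty the abstract describes, that each cost evaluation comes from a chain that has not yet reached stationarity, is deliberately left out of this sketch.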

Paper Structure

This paper contains 23 sections, 7 theorems, 45 equations, 6 figures, 2 algorithms.

Key Result

Theorem 1

Let $(\theta_n)_n$ be the sequence of random variables generated by Algorithm \ref{algo:kw} with parametrization \eqref{eq:parameter1}-\eqref{eq:parameter2}. Under Assumptions \ref{as:reg}, \ref{as:mixing}, and \ref{as:uniqueness}, we have

Figures (6)

  • Figure 1: State transitions for each server.
  • Figure 2: The smoothed parameter function $\theta_{\varepsilon,M}$ compared with $\theta$.
  • Figure 3: Plots of the cost function $f(\theta)$ (left) and of the sequences produced by Algorithm \ref{algo:kw} (right). The red line corresponds to the truncation to the interval $[0,M]$ over which the function is convex.
  • Figure 4: Scenario 1: Plots of the sequences produced by Algorithm \ref{algo:kw} when the underlying Markov chain is simulated for $\tau_n=100$ (left) and $\tau_n=1000$ (right) steps. In simulation time, the $x$-axis corresponds exactly to the one in Figure \ref{fig:numerical}.
  • Figure 5: A zoom of the plots of Figure \ref{fig:comparison1} obtained by truncation of the $x$-axis.
  • ...and 1 more figure

Theorems & Definitions (16)

  • Remark 1
  • Remark 2
  • Theorem 1
  • Theorem 2
  • Definition 1: Cline68
  • Lemma 1: Theorem 4.3, Golub73
  • Proposition 1
  • proof
  • Remark 3
  • Remark 4
  • ...and 6 more