Characterising resource management performance in Kubernetes

Víctor Medel, Rafael Tolosana-Calasanz, José Ángel Bañares, Unai Arronategui, Omer F. Rana

TL;DR

The paper addresses elastic cloud resource provisioning in Kubernetes and introduces a Petri net-based framework (Reference Nets) to model pod and container lifecycles and the overheads of deployment and termination. By benchmarking on an eight-node cluster, it parameterizes the model with observed timings, defines metrics such as $T_d$, $T_{down}$, and $T_t$, and analyzes the overhead of the pod abstraction across CPU, IO, and network workloads. It contributes rules for choosing the pod-to-container ratio $\rho$ (each pod holds $\frac{1}{\rho}$ containers) to optimize performance, and it supports capacity planning for elastic Kubernetes deployments. The work lets designers reason about scheduling, resource sharing, and configuration choices to improve the responsiveness and efficiency of containerized cloud applications.

Abstract

A key challenge in supporting elastic behaviour in cloud systems is achieving good performance in the automated (de-)provisioning and scheduling of computing resources. One significant factor is the overhead associated with deploying, terminating and maintaining resources. Because of their lower start-up and termination overhead, containers are rapidly replacing Virtual Machines (VMs) as the computation instance of choice in many cloud deployments. Kubernetes is a container management system for distributed cluster environments. In this paper, we analyse the performance of Kubernetes through a Petri net-based performance model. Our model can be characterised using data from a Kubernetes deployment, and can be exploited to support capacity planning and the design of Kubernetes-based elastic applications.

Paper Structure (13 sections, 2 equations, 7 figures, 8 tables)

This paper contains 13 sections, 2 equations, 7 figures, 8 tables.

Figures (7)

  • Figure 1: Model of the life cycle of pods in Kubernetes.
  • Figure 2: Model of the life cycle of containers inside a pod. $r$ models the restart policy of a container: Always = 0, OnFailure = 1, Never = 2.
  • Figure 3: Total deployment time ($T_d$) vs. number of deployed containers ($C$). Each graph shows the mean time and the confidence interval for the mean, for a varying number of machines in the cluster, $n$. The results are grouped by the number of containers inside a pod, $\frac{1}{\rho}$.
  • Figure 4: Time to create a single container ($T_c$ function) vs. number of deployed containers ($C$). Each graph shows the mean time and the confidence interval for the mean, for a varying number of machines in the cluster, $n$. The results are grouped by the number of containers inside a pod, $\frac{1}{\rho}$.
  • Figure 5: $T_t$ vs. $C$. Each graph shows the mean time and the confidence interval for the mean, for a varying number of machines in the cluster, $n$. The results are grouped by the number of containers inside a pod, $\frac{1}{\rho}$. The container image (1.225 GB) is not present on the machines.
  • ...and 2 more figures
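
The figure captions report a mean time with a confidence interval, grouped by the number of containers inside a pod, $\frac{1}{\rho}$. The following is a minimal sketch of how such a summary could be computed from raw timing samples. It is not the authors' analysis code: the function names, the normal-approximation interval (z = 1.96), the rounding up of the pod count, and the sample timings are all assumptions made for illustration.

```python
import math
import statistics


def pods_needed(containers: int, containers_per_pod: int) -> int:
    """Pods required to host `containers` when each pod holds
    `containers_per_pod` containers (that is, 1/rho).
    Rounding up is an assumption for counts that do not divide evenly."""
    return math.ceil(containers / containers_per_pod)


def mean_with_ci(samples, z=1.96):
    """Return (mean, half_width) for a normal-approximation confidence
    interval of the mean. The choice of interval is an assumption; the
    source does not state how its intervals were computed."""
    mean = statistics.mean(samples)
    half_width = z * statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, half_width


# Made-up timings in seconds, for illustration only (not data from the paper).
example_times = [10.0, 12.0, 11.0, 13.0, 14.0]
mean, half_width = mean_with_ci(example_times)

print(f"pods for C=8 with 4 containers per pod: {pods_needed(8, 4)}")
print(f"mean = {mean:.2f} s, CI = [{mean - half_width:.2f}, {mean + half_width:.2f}] s")
```

With few repetitions per configuration, a Student's t interval would be wider than this normal approximation and may be the more careful choice.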