Distributed Tracing for Cascading Changes of Objects in the Kubernetes Control Plane

Tomoyuki Ehira, Daisuke Kotani, Yasuo Okabe

TL;DR

This paper proposes a system that automatically traces changes to objects in the Kubernetes control plane. It adds one identifier, a Change Propagation ID (CPID), to the metadata of each object; a controller that observes an object change propagates the CPID of that object to the objects it updates.

Abstract

Kubernetes is a container orchestration system that employs a declarative configuration management approach. In Kubernetes, each desired and actual state is represented by an "object", and multiple controllers autonomously monitor related objects and update their objects towards the desired state in the control plane. Because of this design, changes to one object propagate to other objects in a chain. Cluster operators need to know the time required for these cascading changes to complete, as it directly affects the quality of service of applications running on the cluster. However, there is no practical way to observe this kind of cascading change, including a breakdown of the time taken by each change. Distributed tracing techniques are commonly used in the microservices architecture to monitor application performance, but they are not directly applicable to the control plane of Kubernetes: the microservices architecture relies on explicitly calling APIs on other services, whereas Kubernetes controllers only monitor objects to know when to start processing and never call functions on other controllers directly. In this paper, we propose a system that automatically traces changes to objects in the control plane. Our method adds one identifier, a Change Propagation ID (CPID), to the metadata of an object, and the controller that observes an object change propagates its CPID to the objects that the controller updates. When multiple changes need to be merged on an object, a new CPID is generated, and the relationship between the original CPIDs and the new CPID is sent to the external trace server. We confirmed that change propagation can be visualized and the required time measured. We also showed that this system's overhead is not significant.
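
The propagation and merge rules described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the annotation key `example.org/cpid`, the classes `K8sObject` and `TraceServer`, and the functions `propagate` and `origins` are hypothetical names chosen for illustration. The sketch also assumes that a merge is needed whenever the updated object already carries a different CPID, which simplifies whatever condition the paper actually uses.

```python
import uuid
from dataclasses import dataclass, field

# Hypothetical annotation key; the source only says that the trace
# context is written in the object's annotation field.
CPID_ANNOTATION = "example.org/cpid"


@dataclass
class K8sObject:
    """Simplified stand-in for a Kubernetes object (name and annotations only)."""
    name: str
    annotations: dict = field(default_factory=dict)


@dataclass
class TraceServer:
    """Stand-in for the external trace server that stores mergelogs."""
    mergelogs: list = field(default_factory=list)

    def record_merge(self, source_cpids, new_cpid):
        # A mergelog relates the original CPIDs to the newly generated one.
        self.mergelogs.append((tuple(source_cpids), new_cpid))

    def origins(self, cpid):
        """Follow mergelogs backwards to the CPIDs that were never merged."""
        for sources, merged in self.mergelogs:
            if merged == cpid:
                found = set()
                for source in sources:
                    found |= self.origins(source)
                return found
        return {cpid}


def propagate(observed, target, server):
    """Copy the CPID of the observed object onto the object being updated.

    Assumption: if the target already carries a different CPID, the two
    changes are merged under a fresh CPID and a mergelog is sent.
    """
    incoming = observed.annotations.get(CPID_ANNOTATION)
    if incoming is None:
        return  # nothing to trace
    existing = target.annotations.get(CPID_ANNOTATION)
    if existing is None or existing == incoming:
        target.annotations[CPID_ANNOTATION] = incoming
        return
    merged = uuid.uuid4().hex
    server.record_merge([existing, incoming], merged)
    target.annotations[CPID_ANNOTATION] = merged


# Usage: two independent changes reach the same ReplicaSet-like object.
server = TraceServer()
deployment = K8sObject("web", {CPID_ANNOTATION: "cpid-a"})
other_source = K8sObject("web-config", {CPID_ANNOTATION: "cpid-b"})
replicaset = K8sObject("web-rs")

propagate(deployment, replicaset, server)    # plain propagation of cpid-a
propagate(other_source, replicaset, server)  # second change triggers a merge
print(server.origins(replicaset.annotations[CPID_ANNOTATION]))
```

Because each object carries only a single identifier, the history of merges has to live somewhere else; in this sketch, as in the abstract, that is the trace server, which can reconstruct the original changes by walking the mergelogs backwards.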

Paper Structure

This paper contains 36 sections, 12 figures, 4 tables, 1 algorithm.

Figures (12)

  • Figure 1: Simplified Kubernetes architecture
  • Figure 2: Example of cascading changes: Starting with the creation of the Deployment object, the Deployment controller, ReplicaSet controller, Scheduler, and kubelet observe and process the object changes, and finally the container is created.
  • Figure 3: Object with CPID. The trace context is written in the annotation field.
  • Figure 4: How the controllers work in the proposed method. The CPID is put on the object. When changes are merged, the controller sends a mergelog to the trace server and updates the object with the new CPID. The controller outputs logs and spans along with the CPID that caused the processing.
  • Figure 5: Proposed system propagating CPIDs and mergelog and merging graphs at each moment
  • ...and 7 more figures