Employing Software Diversity in Cloud Microservices to Engineer Reliable and Performant Systems

Nazanin Akhtarian, Hamzeh Khazaei, Marin Litoiu

TL;DR

The paper tackles the challenge of maintaining both reliability and performance in evolving cloud microservice deployments by introducing a reliability engine that leverages software diversity. It formalizes a reliability score based on monitored metrics, allocates replicas accordingly, and integrates a diversity factor to preserve version variation through a diversity-aware autoscaling framework. The approach is implemented in a Kubernetes setting with a two-component system (Load Balancer and Scaling Engine) and validated via Chaos Mesh experiments on the Online Boutique application, demonstrating improvements in reliability and performance across varying workloads. The work offers practical mechanisms for running multi-version containers with adaptive load balancing and autoscaling, contributing to more resilient and scalable microservice architectures in practice.

Abstract

In the ever-shifting landscape of software engineering, systems must adapt and evolve to remain dependable. Because each software iteration can introduce new challenges, from unforeseen bugs to performance anomalies, understanding and addressing these issues is essential to ensure robust operation throughout a system's lifetime. This work proposes employing software diversity to enhance system reliability and performance simultaneously. A cornerstone of our work is the derivation of a reliability metric that captures the reliability and performance of each software version under adverse conditions. Using the calculated reliability score, we implemented a dynamic controller that adjusts the population of each software version. The goal is to maintain a higher replica count for more reliable versions while preserving the diversity of versions as much as possible. This balance is crucial for ensuring both the reliability and the performance of the system against a spectrum of potential failures. In addition, we designed and implemented a diversity-aware autoscaling algorithm that maintains the system's reliability and performance simultaneously at any scale. Our extensive experiments on realistic cloud microservice-based applications show the effectiveness of the proposed approach in promoting both reliability and performance.
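The allocation goal stated in the abstract (more replicas for more reliable versions, with every version kept alive) can be illustrated with a small sketch. This is not the paper's controller or its reliability formula, which this summary does not reproduce. It assumes a simple proportional split with largest-remainder rounding, and the names `allocate_replicas` and `min_per_version` are hypothetical.

```python
from typing import Dict


def allocate_replicas(
    scores: Dict[str, float], total: int, min_per_version: int = 1
) -> Dict[str, int]:
    """Split `total` replicas across versions in proportion to their scores,
    keeping at least `min_per_version` replicas of every version.

    Assumption: proportional allocation with largest-remainder rounding.
    The paper's actual allocation rule may differ.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    if any(s < 0 for s in scores.values()):
        raise ValueError("scores must be non-negative")
    reserved = min_per_version * len(scores)
    if total < reserved:
        raise ValueError("not enough replicas to keep every version alive")

    # Diversity floor: every version first receives its guaranteed minimum.
    allocation = {version: min_per_version for version in scores}
    spare = total - reserved

    # If all scores are zero, fall back to equal weights.
    score_sum = sum(scores.values())
    weights = scores if score_sum > 0 else {v: 1.0 for v in scores}
    weight_sum = sum(weights.values())

    # Ideal (fractional) share of the spare replicas for each version.
    shares = {v: spare * w / weight_sum for v, w in weights.items()}
    for version, share in shares.items():
        allocation[version] += int(share)

    # Hand out replicas lost to rounding, largest fractional part first.
    leftover = total - sum(allocation.values())
    by_remainder = sorted(
        shares, key=lambda v: shares[v] - int(shares[v]), reverse=True
    )
    for version in by_remainder[:leftover]:
        allocation[version] += 1
    return allocation


if __name__ == "__main__":
    # Illustrative scores only; not taken from the paper's experiments.
    print(allocate_replicas({"v1": 3.0, "v2": 2.0, "v3": 1.0}, total=9))
```

The floor of `min_per_version` plays the role of a diversity guarantee: even a poorly scoring version keeps running, so a failure that is specific to the top-scoring version does not take out every replica.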

Paper Structure (40 sections, 3 equations, 11 figures, 3 algorithms)


Figures (11)

  • Figure 1: System architecture. The diagram illustrates the structure of the proposed solution, detailing component interactions and data flow paths.
  • Figure 2: This visualization showcases the layout and interconnections of various microservices in the Online Boutique application.
  • Figure 3: Number of users. This chart presents the number of users accessing the system during the first experiment.
  • Figure 4: Application's response time during the first experiment.
  • Figure 5: Restart count of frontend microservice versions. This chart illustrates the frequency and patterns of system restarts over a specific period.
  • ...and 6 more figures