SoK: Microservice Architectures from a Dependability Perspective
Dāvis Kažemaks, Jérémie Decouchant
TL;DR
This paper surveys runtime faults and vulnerabilities in microservice architectures (MSAs), emphasizing runtime detection and recovery over offline mitigation. It systematically maps the literature from 2019–2024, using a structured methodology and a threat-modeling lens to classify faults into performance, architecture, component, and security categories. The meta-analysis shows that the field is growing rapidly and focuses more on detection than on recovery. It highlights the promise of multi-modal, graph-based fault localization approaches. It also notes two gaps: limited coverage of service registry and monitoring faults, and the lack of standardized datasets. The work offers a practical overview for risk analysis and a roadmap toward future empirical comparisons and more comprehensive runtime resilience, guiding both researchers and practitioners toward more robust MSAs.
Abstract
The microservice software architecture splits large monolithic applications into multiple smaller services that interact through lightweight communication mechanisms. While the microservice architecture has proven able to support modern business applications, it also introduces new potential weak points into a system. Several scientific surveys have already addressed fault tolerance or security concerns, but most of them lack an analysis of the faults and vulnerabilities that microservice architectures introduce. We explore the known faults and vulnerabilities that microservice architectures may suffer from, and the recent scientific literature that addresses them. To limit the scope of this document, we focus on runtime detection and recovery mechanisms rather than offline prevention and mitigation mechanisms.
