Root Cause Analysis for Microservice System based on Causal Inference: How Far Are We?

Luan Pham, Huong Ha, Hongyu Zhang

TL;DR

A comprehensive evaluation of nine causal discovery methods and twenty-one root cause analysis methods for microservice systems indicates that no method stands out in all situations; each method tends to fall short in effectiveness or efficiency, or to be sensitive to specific parameters.

Abstract

Microservice architecture has become popular and is adopted by many cloud applications. However, identifying the root cause of a failure in microservice systems is still a challenging and time-consuming task. In recent years, researchers have introduced various causal inference-based root cause analysis methods to assist engineers in identifying root causes. To gain a better understanding of the current status of causal inference-based root cause analysis techniques for microservice systems, we conduct a comprehensive evaluation of nine causal discovery methods and twenty-one root cause analysis methods. Our evaluation aims to understand both the effectiveness and efficiency of causal inference-based root cause analysis methods, as well as other factors that affect their performance. Our experimental results and analyses indicate that no method stands out in all situations; each method tends to fall short in effectiveness or efficiency, or to be sensitive to specific parameters. Notably, the performance of root cause analysis methods on synthetic datasets may not accurately reflect their performance in real systems. There is still substantial room for improvement. We also suggest possible future work based on our findings.

Paper Structure (36 sections, 2 equations, 4 figures, 6 tables)

This paper contains 36 sections, 2 equations, 4 figures, 6 tables.

Figures (4)

  • Figure 1: Overview of the causal inference-based root cause analysis for microservice systems using metrics data.
  • Figure 2: Overview of our setup for microservice systems.
  • Figure 3: Performance of seven causal discovery methods on six synthetic datasets with different data lengths.
  • Figure 4: Performance of fourteen RCA methods on eight datasets with different data lengths.