RCAEval: A Benchmark for Root Cause Analysis of Microservice Systems with Telemetry Data

Luan Pham, Hongyu Zhang, Huong Ha, Flora Salim, Xiuzhen Zhang

TL;DR

RCAEval addresses the lack of standard benchmarks for root cause analysis (RCA) in microservice systems by providing three datasets totaling 735 failure cases across 11 fault types, together with an open-source evaluation framework that includes 15 baselines. It supports metric-based, trace-based, and multi-source RCA using telemetry (metrics, logs, traces) collected from three microservice systems, and it covers code-level faults. Preliminary experiments on the Train Ticket data show only moderate baseline performance, leaving room for improvement and pointing to the need for holistic RCA methods. The benchmark is openly available to accelerate research and practical progress in robust RCA for complex microservice environments.

Abstract

Root cause analysis (RCA) for microservice systems has gained significant attention in recent years. However, there is still no standard benchmark that includes large-scale datasets and supports comprehensive evaluation environments. In this paper, we introduce RCAEval, an open-source benchmark that provides datasets and an evaluation environment for RCA in microservice systems. First, we introduce three comprehensive datasets comprising 735 failure cases collected from three microservice systems, covering various fault types observed in real-world failures. Second, we present a comprehensive evaluation framework that includes fifteen reproducible baselines covering a wide range of RCA approaches, with the ability to evaluate both coarse-grained and fine-grained RCA. We hope that this ready-to-use benchmark will enable researchers and practitioners to conduct extensive analysis and pave the way for robust new solutions for RCA of microservice systems.

Paper Structure

This paper contains 12 sections, 2 figures, and 6 tables.

Figures (2)

  • Figure 1: Overview of the RCAEval benchmark.
  • Figure 2: Illustration of our data collection setup.