PORCA: Root Cause Analysis with Partially Observed Data

Chang Gong, Di Yao, Jin Wang, Wenbin Li, Lanting Fang, Yongtao Xie, Kaiyu Feng, Peng Han, Jingping Bi

TL;DR

PORCA addresses Root Cause Analysis under partial observation by explicitly modeling unobserved confounders and unobserved heterogeneity. It introduces Magnified Score-Based Causal Discovery to learn a magnified ADMG, a Heterogeneity-Aware Scheduling mechanism to adaptively weight samples, and a Deconfounded Root Cause Localization process to rank root causes via a deconfounded random walk and anomaly-based scoring. Theoretical results establish identifiability of the magnified ADMG and optimality properties of the scheduling scheme, while experiments on synthetic and real-world data (CRACs, SWaT) show PORCA outperforms baselines in RCA and causal discovery, with robust performance under varying degrees of missingness and heterogeneity. The work offers a practical, scalable solution for reliable fault localization in complex, partially observed systems, with implications for AIOps and industrial monitoring.

Abstract

Root Cause Analysis (RCA) aims to identify the underlying causes of system faults by uncovering and analyzing the causal structure of complex systems. It has been widely used in many application domains. Reliable diagnostic conclusions are of great importance in mitigating system failures and financial losses. However, previous studies implicitly assume full observation of the system and thus neglect the effects of partial observation (i.e., missing nodes and latent malfunctions). As a result, they fail to derive reliable RCA results. In this paper, we unveil the issues of unobserved confounders and unobserved heterogeneity under partial observation and formulate a new problem: root cause analysis with partially observed data. To address it, we propose PORCA, a novel RCA framework that can identify reliable root causes under both unobserved confounders and unobserved heterogeneity. PORCA leverages magnified score-based causal discovery to efficiently optimize an acyclic directed mixed graph under unobserved confounders. In addition, we develop a heterogeneity-aware scheduling strategy to provide adaptive sample weights. Extensive experimental results on one synthetic and two real-world datasets demonstrate the effectiveness and superiority of the proposed framework.

Paper Structure

This paper contains 31 sections, 18 equations, 6 figures, 3 tables, and 1 algorithm.

Figures (6)

  • Figure 1: The motivation of PORCA. (a) The physical world is only partially observed; for example, micro actuators and latent malfunctions are neglected. (b) Spurious edges caused by unobserved confounders and unobserved heterogeneity are shown in orange and red, respectively.
  • Figure 2: The overview of PORCA.
  • Figure 3: Robustness of PORCA.
  • Figure 4: Parameter analysis of PORCA.
  • Figure 5: Learned causal structures from the SWaT dataset. Subgraphs for stage 3 are illustrated.
  • ...and 1 more figure