Detecting and Ranking Causal Anomalies in End-to-End Complex System
Ching Chang, Wen-Chih Peng
TL;DR
This paper tackles the detection of causal anomalies in end-to-end, multivariate factory systems. It introduces RCAE2E, a framework that first builds state-aware machine profiles with TICC_GTC and then ranks causal anomalies with RCA_SCC. Together, these two steps address both the diversity of machine states and the need to treat correlations at different time lags separately. The two main contributions are the TICC_GTC method for robust state-aware profiling and the RCA_SCC method for time-lag-aware causal ranking. The framework also includes steps for comparing profiles and for converting RCA results into the form used by RCA_SCC. Experiments on synthetic and real-world factory data show that RCAE2E outperforms baseline methods in precision, recall, and ranking metrics, which highlights its practical value for industrial monitoring and root-cause analysis.
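Why it can matter to treat time lags separately is easy to see on two sensors where one responds to the other after a fixed delay. The following is a minimal sketch under stated assumptions. It uses plain Pearson correlation at each lag, and `lagged_correlations` is a hypothetical helper. It is not the authors' RCA_SCC method. It only shows that a per-lag view reveals *which* delay carries the dependency, whereas a single pooled model would blur that information.

```python
import numpy as np


def lagged_correlations(x, y, max_lag):
    """Pearson correlation between x[t - lag] and y[t] for each lag in 0..max_lag.

    Keeping one value per lag (instead of one pooled model) shows at which
    delay x is associated with y. Hypothetical helper for illustration only.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    result = {}
    for lag in range(max_lag + 1):
        # Pair x shifted forward by `lag` steps with y.
        x_part = x[: len(x) - lag]
        y_part = y[lag:]
        result[lag] = float(np.corrcoef(x_part, y_part)[0, 1])
    return result


rng = np.random.default_rng(0)
x = rng.normal(size=500)
# Assumed toy sensor: y follows x with a 3-step delay plus small noise.
y = np.roll(x, 3) + 0.1 * rng.normal(size=500)

corr = lagged_correlations(x, y, max_lag=5)
best_lag = max(corr, key=lambda k: abs(corr[k]))
print(best_lag)  # → 3
```

Only the lag-3 entry is strongly correlated here. The other lags stay near zero, so they do not dilute the lag-3 relationship.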
Abstract
With the rapid development of technology, automated monitoring systems are becoming increasingly important in large-scale factories. Because these systems collect a large amount of machine sensor data, there are many ways to detect anomalies. We believe, however, that the real core value of an automated monitoring system lies in identifying and tracing the cause of a problem. The best-known method for finding causal anomalies is RCA, but it has several problems that cannot be ignored. RCA uses the AutoRegressive eXogenous (ARX) model to build a time-invariant correlation network as a machine profile, and then uses this profile to trace causal anomalies through a process called fault propagation. Describing a machine's behavior with an ARX-based correlation network has two major problems: (1) it does not take into account the diversity of machine states, and (2) it does not consider correlations at different time lags separately. To address these problems, we propose a framework called Ranking Causal Anomalies in End-to-End System (RCAE2E), which resolves both of them. In our experiments, we use synthetic data and real-world data from a large-scale photoelectric factory to verify the correctness of our method and the validity of its underlying hypotheses.
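The ARX-based profile described above is a network of pairwise relationships ("invariants") between sensors. Anomalies appear as invariants that stop holding. The code below is a minimal sketch of that idea under several assumptions:

- the ARX orders `n` and `m`
- the fitness score, a normalized-residual form commonly used for invariant networks and assumed here
- the 0.8 edge threshold and the 1.5× residual tolerance
- all helper names (`arx_design`, `fit_arx`, `build_invariant_network`, `broken_edges`, `rank_sensors`)

All of these are hypothetical choices made for illustration. The final ranking step is a deliberately simple broken-edge ratio. It is not RCA's fault-propagation algorithm, and nothing here models state diversity or per-lag correlations.

```python
import numpy as np


def arx_design(x, y, n, m):
    """Regressor rows [y[t-1..t-n], x[t..t-m], 1] and targets y[t] for an ARX(n, m) model."""
    p = max(n, m)
    rows = [[y[t - i] for i in range(1, n + 1)]
            + [x[t - j] for j in range(m + 1)]
            + [1.0]
            for t in range(p, len(y))]
    return np.array(rows), y[p:]


def fit_arx(x, y, n=2, m=2):
    """Least-squares ARX fit from input sensor x to output sensor y.

    Returns (theta, fitness, max_abs_residual) with
    fitness = 1 - ||residual|| / ||y - mean(y)||, so 1.0 is a perfect fit (assumed score).
    """
    A, target = arx_design(x, y, n, m)
    theta, *_ = np.linalg.lstsq(A, target, rcond=None)
    resid = target - A @ theta
    fitness = 1.0 - np.linalg.norm(resid) / np.linalg.norm(target - target.mean())
    return theta, fitness, np.abs(resid).max()


def build_invariant_network(data, threshold=0.8, n=2, m=2):
    """Keep a directed edge (i, j) when an ARX model from sensor i to sensor j fits well."""
    edges = {}
    k = data.shape[1]
    for i in range(k):
        for j in range(k):
            if i != j:
                theta, fitness, max_res = fit_arx(data[:, i], data[:, j], n, m)
                if fitness > threshold:
                    edges[(i, j)] = (theta, max_res)
    return edges


def broken_edges(edges, data, n=2, m=2, tolerance=1.5):
    """An edge breaks when its residual on new data exceeds tolerance * its max training residual."""
    broken = set()
    for (i, j), (theta, max_res) in edges.items():
        A, target = arx_design(data[:, i], data[:, j], n, m)
        if np.abs(target - A @ theta).max() > tolerance * max_res:
            broken.add((i, j))
    return broken


def rank_sensors(edges, broken, k):
    """Score each sensor by the fraction of its invariants that broke (simple heuristic, not RCA)."""
    scores = []
    for s in range(k):
        touching = [e for e in edges if s in e]
        scores.append(sum(e in broken for e in touching) / len(touching) if touching else 0.0)
    return sorted(range(k), key=lambda s: -scores[s]), scores


# Usage: three synthetic sensors with known relationships.
rng = np.random.default_rng(1)
T = 400
s0 = rng.normal(size=T)                     # upstream sensor
s1 = np.zeros(T)
for t in range(1, T):                       # s1 follows s0 with first-order dynamics
    s1[t] = 0.5 * s1[t - 1] + s0[t] + 0.01 * rng.normal()
s2 = 2.0 * s0 + 0.01 * rng.normal(size=T)   # s2 is a noisy scaled copy of s0
train = np.column_stack([s0, s1, s2])

edges = build_invariant_network(train)
test = train.copy()
test[200:, 1] += 5.0                        # inject a level-shift fault into sensor 1
ranking, scores = rank_sensors(edges, broken_edges(edges, test), k=3)
print(ranking)  # → [1, 0, 2]
```

In this toy example, the fault injected into sensor 1 breaks every invariant that involves it, so sensor 1 scores 1.0. Sensors 0 and 2 keep their mutual invariants and each score 0.5. Note that the profile is time-invariant: there is a single ARX fit per sensor pair, regardless of machine state, and all lags are pooled into one model. These are the two limitations that RCAE2E targets.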
