RealTCD: Temporal Causal Discovery from Interventional Data with Large Language Model
Peiwen Li, Xin Wang, Zeyang Zhang, Yuan Meng, Fang Shen, Yue Li, Jialong Wang, Yang Li, Wenwu Zhu
TL;DR
This work tackles temporal causal discovery in industrial settings where interventional targets are unavailable and much of the system knowledge is encoded in text. It introduces RealTCD, a two-module framework combining a score-based temporal causal discovery method with an LLM-guided meta-initialization to inject domain knowledge from textual data. The optimization uses an augmented Lagrangian with learnable masks to handle unknown interventions and enforce acyclicity, enabling joint learning of the temporal graph and intervention targets. Across synthetic SVAR-like data and a real data-center dataset, RealTCD outperforms strong baselines on both structural metrics and domain-relevant causal relations, demonstrating practical potential for root-cause analysis, anomaly detection, and IT operations optimization.
Abstract
In the field of Artificial Intelligence for Information Technology Operations, causal discovery is pivotal for constructing operation and maintenance graphs, which facilitates downstream industrial tasks such as root cause analysis. Temporal causal discovery, as an emerging method, aims to identify temporal causal relationships between variables directly from observations by utilizing interventional data. However, existing methods mainly focus on synthetic datasets, rely heavily on interventional targets, and ignore the textual information hidden in real-world systems, so they fail to conduct causal discovery in real industrial scenarios. To tackle this problem, we investigate temporal causal discovery in industrial scenarios, which faces two critical challenges: 1) how to discover causal relationships without the interventional targets that are costly to obtain in practice, and 2) how to discover causal relations by leveraging the textual information in systems, which can be complex yet abundant in industrial contexts. To address these challenges, we propose the RealTCD framework, which leverages domain knowledge to discover temporal causal relationships without interventional targets. Specifically, we first develop a score-based temporal causal discovery method that discovers causal relations for root cause analysis without relying on interventional targets, through strategic masking and regularization. Furthermore, by employing Large Language Models (LLMs) to handle texts and integrate domain knowledge, we introduce LLM-guided meta-initialization, which extracts meta-knowledge from the textual information hidden in systems to improve the quality of discovery. We conduct extensive experiments on simulation and real-world datasets to show the superiority of our proposed RealTCD framework over existing baselines in discovering temporal causal structures.
