CAnDOIT: Causal Discovery with Observational and Interventional Data from Time-Series

Luca Castri; Sariah Mghames; Marc Hanheide; Nicola Bellotto

TL;DR

This paper proposes CAnDOIT, a causal discovery method that reconstructs causal models from both observational and interventional time-series data, and demonstrates that the approach can effectively handle data from interventions and exploit them to improve the accuracy of the causal analysis.

Abstract

The study of cause-and-effect is of the utmost importance in many branches of science, but also for many practical applications of intelligent systems. In particular, identifying causal relationships in situations that include hidden factors is a major challenge for methods that rely solely on observational data for building causal models. This paper proposes CAnDOIT, a causal discovery method to reconstruct causal models using both observational and interventional time-series data. The use of interventional data in the causal analysis is crucial for real-world applications, such as robotics, where the scenario is highly complex and observational data alone are often insufficient to uncover the correct causal structure. Validation of the method is performed initially on randomly generated synthetic models and subsequently on a well-known benchmark for causal structure learning in a robotic manipulation environment. The experiments demonstrate that the approach can effectively handle data from interventions and exploit them to enhance the accuracy of the causal analysis. A Python implementation of CAnDOIT has also been developed and is publicly available on GitHub: https://github.com/lcastri/causalflow.

Paper Structure (20 sections, 6 equations, 10 figures, 4 tables, 1 algorithm)

Introduction
Related Work
Observation-based Causal Discovery
Observation and Intervention-based Causal Discovery
Causal Discovery Based on Observational and Interventional Data
LPCMCI
Interventions through Context Variables
Faithfulness Assumption
CAnDOIT Algorithm
Evaluation on Random Synthetic Models
Random Synthetic Models
Evaluation Setting
Evaluation Metrics
Experimental Results on Synthetic Models
Evaluation on Robotic Scenario
...and 5 more sections

Figures (10)

Figure 1: CAnDOIT effectively employs context variables to handle observational and interventional data, resulting in a unified causal structure (right) that accommodates both types of data. In contrast, analysing these data separately leads to different causal structures for observations (left) and interventions (center).
Figure 2: CAnDOIT’s block scheme representation. CAnDOIT processes observational and interventional data; the context block adds context variables ($C\!Z$), each linked to the corresponding intervention variable ($Z$) by an instantaneous link ($C\!Z \rightarrow Z$); finally, the LPCMCI block completes the causal discovery process.
Figure 3: Causal models randomly generated with their corresponding systems of equations on the bottom. (a) An example of a linear system for the $S_1$ evaluation strategy, which has no hidden confounders; (b) A random linear system for the $S_2$ evaluation strategy, including two hidden confounders ($H_0,~H_1$); (c) An example of a five-variable random linear system with two hidden confounders ($H_0,~H_1$) for the $S_3$ evaluation strategy. (d) and (e) present nonlinear counterparts of (b) and (c), respectively. For ease of reading, we present only examples with a maximum time lag of 3, 8 observable variables, and 2 hidden confounders. However, in the $S_1,~S_2,~S_4$ evaluation strategies, the maximum number of variables was 12, and the maximum number of hidden confounders was 3.
Figure 4: LPCMCI (red dotted line), CAnDOIT_mean (green dashed line) and CAnDOIT_best (blue) in $S_1$ analysis: linear systems with a number of observable variables ranging from 5 to 12 and no hidden confounders. (a) False Positive Rate (FPR); (b) Structural Hamming Distance (SHD); (c) Uncertainty; (d) $F_1$-Score; (e) PAG Size (reported in logarithmic scale); (f) Time (expressed in seconds).
Figure 5: LPCMCI (red dotted line), CAnDOIT_mean (green dashed line) and CAnDOIT_best (blue) in $S_2$ analysis: linear systems with a number of observable variables ranging from 5 to 12 and a random number of hidden confounders (from 1 to 3). (a) False Positive Rate (FPR); (b) Structural Hamming Distance (SHD); (c) Uncertainty; (d) $F_1$-Score; (e) PAG Size (reported in logarithmic scale); (f) Time (expressed in seconds).
...and 5 more figures
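
The context block described in the Figure 2 caption can be illustrated with a small data-preparation sketch. This is not the CAnDOIT implementation: the function name `append_context_column` is hypothetical, and the encoding of the context variable as a 0/1 regime indicator (0 for observational samples, 1 for interventional samples) is an assumption made only for illustration. The sketch shows how observational and interventional samples might be merged into one dataset with an extra column playing the role of $C\!Z$, so that a single discovery run could see both regimes.

```python
import numpy as np


def append_context_column(observational, interventional):
    """Stack two regimes and append a context column (hypothetical helper).

    Both inputs are arrays of shape (time steps, variables) over the same
    variables. The returned array has one extra, final column: an assumed
    0/1 indicator that is 0 for observational rows and 1 for interventional
    rows.
    """
    obs = np.asarray(observational, dtype=float)
    itv = np.asarray(interventional, dtype=float)
    if obs.ndim != 2 or itv.ndim != 2 or obs.shape[1] != itv.shape[1]:
        raise ValueError("both inputs must be 2-D with the same number of variables")

    # Assumed encoding of the context variable: a regime indicator.
    context = np.concatenate([np.zeros(len(obs)), np.ones(len(itv))])
    stacked = np.vstack([obs, itv])
    return np.column_stack([stacked, context])


# Toy data: 3 variables, with column 1 standing in for the intervened variable Z.
rng = np.random.default_rng(0)
obs = rng.normal(size=(100, 3))
itv = rng.normal(size=(40, 3))
itv[:, 1] = 2.0  # hypothetical hard intervention: Z is held at a fixed value

data = append_context_column(obs, itv)
print(data.shape)  # (140, 4)
```

Note that simply concatenating two time series introduces a discontinuity at the regime boundary, which a lagged analysis would need to account for; how CAnDOIT handles this is not stated in the text above.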
