What If: Causal Analysis with Graph Databases

Amedeo Pachera, Mattia Palmiotto, Angela Bonifati, Andrea Mauri

TL;DR

This work tackles the problem of making graph databases causally aware by proposing a Causal Direct Acyclic Hypergraph (cDAH) that embeds causal semantics into property graphs. It formalizes causal graphs with structural equations and do-calculus, and introduces hypernodes to encode subgraphs and causal variables, so that causal knowledge can be extracted, maintained, and transported within the graph database itself. The paper also extends graph query capabilities with constructs such as EXTRACT and PROBABILITY, and demonstrates a three-level causal analysis (Association, Intervention, Counterfactual) implemented through path-based queries. It further discusses how to transport and merge cDAGs across heterogeneous data sources, and outlines maintenance strategies based on graph triggers and incremental updates. The proposed framework aims to support data-driven personalized decision-making by unifying causal inference with graph data management across scientific domains, and it highlights open challenges in scalability, DSL design, and integration with advanced AI techniques.
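The three levels of causal analysis named above can be illustrated on a toy structural causal model. The sketch below is plain Python and does not use the paper's EXTRACT or PROBABILITY constructs; the variables `Z`, `X`, `Y`, the structural equations, and the noise distributions are all hypothetical choices made for illustration. It enumerates every noise assignment exactly, so the three quantities can be compared without sampling error.

```python
from fractions import Fraction
from itertools import product

# Hypothetical exogenous noise distributions: value -> probability.
NOISE = {
    "u_z": {0: Fraction(1, 2), 1: Fraction(1, 2)},
    "u_x": {0: Fraction(4, 5), 1: Fraction(1, 5)},
    "u_y": {0: Fraction(1, 2), 1: Fraction(1, 2)},
}


def solve(u_z, u_x, u_y, do_x=None):
    """Evaluate the (assumed) structural equations.

    Z confounds X and Y. Passing do_x replaces the equation for X,
    which is how an intervention do(X = x) is modelled.
    """
    z = u_z
    x = (z ^ u_x) if do_x is None else do_x
    y = 2 * x + 3 * z + u_y
    return {"Z": z, "X": x, "Y": y}


def worlds():
    """Yield (noise assignment, probability) for every noise combination."""
    names = list(NOISE)
    for values in product(*(NOISE[n] for n in names)):
        prob = Fraction(1)
        for name, value in zip(names, values):
            prob *= NOISE[name][value]
        yield dict(zip(names, values)), prob


def expected_y(condition=None, do_x=None):
    """E[Y | condition] in the observational or intervened model."""
    num = den = Fraction(0)
    for u, p in worlds():
        state = solve(**u, do_x=do_x)
        if condition is None or all(state[k] == v for k, v in condition.items()):
            num += p * state["Y"]
            den += p
    return num / den


def counterfactual_y(evidence, do_x):
    """Abduction (keep worlds matching the evidence), action (set X),
    prediction (recompute Y with the same noise)."""
    num = den = Fraction(0)
    for u, p in worlds():
        if solve(**u) == evidence:
            num += p * solve(**u, do_x=do_x)["Y"]
            den += p
    return num / den


# Level 1, association: E[Y | X = 1], biased by the confounder Z.
association = expected_y(condition={"X": 1})
# Level 2, intervention: E[Y | do(X = 1)], with the Z -> X edge cut.
intervention = expected_y(do_x=1)
# Level 3, counterfactual: we observed Z=1, X=1, Y=6; what if X had been 0?
counterfactual = counterfactual_y({"Z": 1, "X": 1, "Y": 6}, do_x=0)

print(association, intervention, counterfactual)
```

In this toy model the associational and interventional answers differ because conditioning on `X = 1` also shifts the distribution of the confounder `Z`, whereas `do(X = 1)` leaves it unchanged.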

Abstract

Graphs are expressive abstractions that represent relationships in data more effectively and enable data science tasks. They are also a widely adopted paradigm in causal inference, which focuses on causal directed acyclic graphs (DAGs). Causal DAGs are manually curated by domain experts, but they are never validated, stored, and integrated as data artifacts in a graph data management system. In this paper, we delineate our vision to align these two paradigms, namely causal analysis and property graphs, the latter being the cornerstone of modern graph databases. Articulating this vision requires a paradigm shift: rethinking property graph data models with hypernodes and structural equations, rethinking graph query semantics and query constructs, and defining graph views that account for causality operators. Moreover, several research problems and challenges arise: automatically extracting causal models from the underlying graph observational data, aligning and integrating disparate causal graph models into unified ones, and maintaining them as the underlying data changes. This vision will make graph databases aware of causal knowledge and pave the way to data-driven personalized decision-making in several scientific fields.

Paper Structure

This paper contains 13 sections, 6 equations, 6 figures.

Figures (6)

  • Figure 1: Path extraction from causal DAGs and graph databases.
  • Figure 2: DAGs for conditional independence.
  • Figure 3: SCM without (a) and with (b) intervention.
  • Figure 4: Example of a Causal Direct Acyclic Hypergraph model (variables of the causal DAG, while the property graph data is in white).
  • Figure 5: Example of merging of different nodes representing the same causal variable.
  • ...and 1 more figure