Causal Deep Learning

Jeroen Berrevoets; Krzysztof Kacprzyk; Zhaozhi Qian; Mihaela van der Schaar

TL;DR

Causal Deep Learning (CDL) proposes a pragmatic framework for integrating causality with deep learning by explicitly modeling partial causal knowledge, functional forms, and temporal dynamics. It introduces a three-dimensional CDL scale (structural, parametric, and temporal) that organizes assumptions and guides the design of model pipelines, including transitions between knowledge states and the construction of cascaded models. The paper uses treatment effect estimation, exemplified by the IHDP dataset, to illustrate how a CDL perspective separates a priori causal structure from learned posteriors and clarifies which assumptions are testable. By providing a map on which methods, pipelines, and data requirements can be placed, CDL aims to increase real-world impact in domains such as healthcare, economics, environmental science, and education, while offering practical guidelines for reporting assumptions and results.
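The separation between a priori causal structure and learned components can be illustrated with a minimal treatment-effect sketch. It uses synthetic data rather than IHDP, and it assumes (untestably, from this data alone) the DAG X -> T, X -> Y, T -> Y with X as the only confounder. Under that assumption the structure tells us to adjust for X, while the two outcome regressions are learned. The estimator is a simple two-model (T-learner) approach with linear least squares, swapped in for illustration; it is not the paper's method, and the function names `fit_linear` and `t_learner_ate` are hypothetical.

```python
import numpy as np


def fit_linear(x, y):
    """Least-squares fit of y = a + b * x; returns the array [a, b]."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def t_learner_ate(x, t, y):
    """Average treatment effect from two learned outcome models (a T-learner).

    A priori (not testable from this data): X is the only confounder, i.e. the
    DAG is X -> T, X -> Y, T -> Y with no hidden confounding, so adjusting for
    X (together with overlap) identifies the effect.
    """
    a1, b1 = fit_linear(x[t == 1], y[t == 1])  # learned model of E[Y | X, T=1]
    a0, b0 = fit_linear(x[t == 0], y[t == 0])  # learned model of E[Y | X, T=0]
    # Average the predicted individual effects over all units.
    return float(np.mean((a1 + b1 * x) - (a0 + b0 * x)))


# Synthetic data (not IHDP) generated from the assumed DAG; true effect = 3.
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)                                        # confounder
t = (rng.random(n) < 1 / (1 + np.exp(-2 * x))).astype(int)    # X -> T
y = 1.0 + 2.0 * x + 3.0 * t + rng.normal(scale=0.5, size=n)  # X -> Y, T -> Y

naive = y[t == 1].mean() - y[t == 0].mean()  # ignores the structure
adjusted = t_learner_ate(x, t, y)
print(f"naive difference: {naive:.2f}, adjusted ATE: {adjusted:.2f}")
```

The naive difference in means is biased here because treated units tend to have larger X. If the assumed structure were itself wrong (for example, a hidden confounder of T and Y), the adjusted estimate would also be biased, and nothing in this data would reveal it.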

Abstract

Causality has the potential to truly transform the way we solve a large number of real-world problems. Yet, so far, its potential largely remains to be unlocked, as causality often requires crucial assumptions that cannot be tested in practice. To address this challenge, we propose a new way of thinking about causality, which we call causal deep learning. Our causal deep learning framework spans three dimensions: (1) a structural dimension, which incorporates partial yet testable causal knowledge rather than assuming either complete or no causal knowledge among the variables of interest; (2) a parametric dimension, which encompasses parametric forms that capture the type of relationships among the variables of interest; and (3) a temporal dimension, which captures exposure times or how the variables of interest interact (possibly causally) over time. Causal deep learning enables us to make progress on a variety of real-world problems by leveraging partial causal knowledge (including independencies among variables) and quantitatively characterising causal relationships among variables of interest (possibly over time). Our framework clearly identifies which assumptions are testable and which are not, so that the resulting solutions can be judiciously adopted in practice. Using our formulation, we can combine or chain together causal representations to solve specific problems without losing track of which assumptions are required to build these solutions, pushing the real-world impact of causal deep learning in healthcare, economics and business, environmental sciences, and education.
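Chaining models while keeping track of their assumptions can be sketched as a small bookkeeping routine. This is a hypothetical illustration: the three-level discretisation of the structural and parametric scales, the names `Position`, `Model`, and `build_pipeline`, and the example models and assumptions are all made up here, not taken from the paper. The temporal axis is omitted for brevity. Each model declares the minimum knowledge it requires as input and the knowledge its representation provides; a pipeline is valid only if each output meets the next model's requirement.

```python
from dataclasses import dataclass, field

# Hypothetical coarse discretisation of two CDL axes; the paper's scales are
# finer-grained. A higher index means more knowledge is assumed.
STRUCTURAL = ["no_structure", "partial_structure", "causal_dag"]
PARAMETRIC = ["nonparametric", "additive_noise", "known_model"]


@dataclass(frozen=True)
class Position:
    """A point on the structural x parametric plane."""
    structural: str
    parametric: str

    def at_least(self, other: "Position") -> bool:
        """True if this position assumes at least as much as `other` on both axes."""
        return (STRUCTURAL.index(self.structural) >= STRUCTURAL.index(other.structural)
                and PARAMETRIC.index(self.parametric) >= PARAMETRIC.index(other.parametric))


@dataclass
class Model:
    name: str
    requires: Position                             # minimum knowledge needed as input
    produces: Position                             # knowledge provided by its representation
    assumptions: set = field(default_factory=set)  # assumptions this step adds


def build_pipeline(start, models):
    """Cascade models in order; return the final position and all assumptions used."""
    current, used = start, set()
    for m in models:
        if not current.at_least(m.requires):
            raise ValueError(f"{m.name} needs {m.requires}, pipeline provides {current}")
        used |= m.assumptions
        current = m.produces
    return current, used


# Hypothetical two-step pipeline: discover a DAG, then generate data from it.
discover = Model("structure discovery",
                 requires=Position("no_structure", "additive_noise"),
                 produces=Position("causal_dag", "additive_noise"),
                 assumptions={"causal sufficiency", "additive noise"})
generate = Model("causal generator",
                 requires=Position("causal_dag", "nonparametric"),
                 produces=Position("causal_dag", "nonparametric"),
                 assumptions={"DAG is correct"})

out, used = build_pipeline(Position("no_structure", "additive_noise"), [discover, generate])
print(out)
print(sorted(used))
```

Starting from no structural knowledge, the pipeline succeeds only because the discovery step contributes its own assumptions, which then show up in `used`; passing the generator a start position with no structure raises a `ValueError` instead of silently proceeding.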

Paper Structure

This paper contains 23 sections, 9 equations, 11 figures, and 2 tables.

Introduction
CDL in 3 dimensions: Structural, Parametric, and Temporal
Preamble: distribution factorization
The structural scale and rung 1.5
The parametric scale
The temporal scale
Characterizing CDL models using the scale
Constructing and testing model pipelines
Transition in the knowledge
Building CDL model pipelines
Non-matching assumptions
Treatment effects: An illustrative example using CDL
Treatment effects: the CDL description
Conclusion and guidelines for CDL papers
Real-world applications

Figures (11)

Figure 1: The structural scale. The first axis on the map of causal deep learning is the structural scale, which categorises different types of structure. The two extremes are a non-informative structure and a complete causal graph. Note that the structural scale makes no parametric assumptions about these structures (parametric assumptions are the focus of the parametric scale in Figure 2).
Figure 2: The parametric scale. The parametric scale is the second axis on the map of causal deep learning. It records the type of assumptions made on the factors of the assumed distribution or model. The extremes are no assumptions at all (leaving the factors completely non-parametric) and a fully known model. The parametric scale further distinguishes assumptions on how noise ($\epsilon$) interacts in the system from assumptions on the functional shape of the system's factorisation.
Figure 3: Confounding over time. We borrow the DAG from Bica et al. (2020) and show it over two time steps (Robins et al., 2000). This figure shows how time complicates matters: because of time, the same variable ($X$) has different causal edges depending on $t$. For example, $X_{t=1}$ causes $U_{t=1}$ but not $U_{t=2}$.
Figure 4: Combining axes. We can combine our axes to evaluate a model's input and output (or representation). As an example, we show a fictitious method that assumes no structure but makes an additive noise assumption, operates in the temporal domain, and provides a truly causal representation under the same additive noise assumption. This allows a method to be examined on many properties at once.
Figure 5: Building pipelines. Because the input and the representation in our annotated map are defined on the same domain (the structural scale $\times$ the parametric scale), we can use the representation of one model as the input of a subsequent model. Documenting these model combinations (i.e. pipelines) is easy with our scales: we simply cascade multiple diagrams one after another, as shown in the figure. Doing so shows that, to truly generate fair data, one must be willing either to make strong assumptions (such as having access to causal DAGs, or the assumptions necessary to discover them) or to use alternative models in the pipeline that make testable assumptions.
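The time-dependence of causal edges described for Figure 3 can be made concrete by unrolling a per-time-step template into a DAG over (variable, time) nodes. The template below is hypothetical and is not the exact DAG from Figure 3: it assumes a within-step edge X -> U and lagged edges from X and U at step t into X at step t + 1. The function name `unroll` is likewise made up for this sketch.

```python
def unroll(contemporaneous, lagged, steps):
    """Unroll a time-slice template into a DAG over (variable, t) nodes.

    contemporaneous: edges (a, b) meaning a_t -> b_t for every t.
    lagged: edges (a, b) meaning a_t -> b_{t+1}.
    """
    edges = set()
    for t in range(1, steps + 1):
        for a, b in contemporaneous:
            edges.add(((a, t), (b, t)))
        if t < steps:  # no lagged edges out of the final step
            for a, b in lagged:
                edges.add(((a, t), (b, t + 1)))
    return edges


# Hypothetical template, not the exact DAG of Figure 3.
edges = unroll(contemporaneous=[("X", "U")], lagged=[("X", "X"), ("U", "X")], steps=2)
print((("X", 1), ("U", 1)) in edges)  # True: direct edge within t=1
print((("X", 1), ("U", 2)) in edges)  # False: no direct edge across steps
```

In this template $X_{t=1}$ still influences $U_{t=2}$, but only indirectly through $X_{t=2}$; tracking edges per time step, rather than per variable, keeps that distinction visible.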

Theorems & Definitions (2)

Definition 1: Factor
Definition 2: Transition
