Modeling and Discovering Direct Causes for Predictive Models

Yizuo Chen, Amit Bhatia

TL;DR

The paper addresses the problem of identifying which input features directly drive the predictions of black-box models by embedding predictive models into causal graphs (ADMGs). It shows that the direct causes correspond to the parents of $Y$ in a predictive graph and that, under canonicity or weak faithfulness, these direct causes coincide with the Markov boundary $\mathrm{MB}(Y)$, which enables their discovery from observational data. The authors provide sound and complete algorithms for identifying direct causes via Markov-boundary discovery and introduce an independence rule (I-decomposability) that accelerates discovery, with theoretical guarantees. Empirical results show substantial reductions in computation and in the number of independence tests, particularly when the predictive graph has many direct causes, which benefits explainability and efficient data collection in complex predictive systems.
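To make the link between direct causes and conditional independence concrete, here is a minimal Python sketch. It is not the authors' algorithm: it applies a standard total-conditioning check (a feature is kept when $Y$ is dependent on it given all remaining features) to an exact probability table rather than to sampled data. The toy model, the feature distribution, and the names `cond_independent`, `direct_cause_candidates`, `toy_model`, and `toy_joint` are all assumptions introduced for illustration.

```python
from collections import defaultdict
from itertools import product


def cond_independent(joint, i, tol=1e-12):
    """Exact check of Y _||_ X_i | X_rest on a table {(x_tuple, y): prob}."""
    # Group probability mass by the values of all features except X_i.
    groups = defaultdict(lambda: defaultdict(float))
    for (x, y), p in joint.items():
        rest = x[:i] + x[i + 1:]
        groups[rest][(x[i], y)] += p
    for table in groups.values():
        total = sum(table.values())          # Pr(rest)
        p_xi = defaultdict(float)            # Pr(x_i, rest)
        p_y = defaultdict(float)             # Pr(y, rest)
        for (xi, y), p in table.items():
            p_xi[xi] += p
            p_y[y] += p
        # Independence within this group: Pr(x_i, y, rest) * Pr(rest)
        # must equal Pr(x_i, rest) * Pr(y, rest) for every pair.
        for xi in p_xi:
            for y in p_y:
                if abs(table.get((xi, y), 0.0) * total - p_xi[xi] * p_y[y]) > tol:
                    return False
    return True


def direct_cause_candidates(joint, n_features):
    """Indices of features on which Y depends given all other features."""
    return [i for i in range(n_features) if not cond_independent(joint, i)]


def toy_model(x):
    """Hypothetical black-box model: uses X1 and X2, ignores X3."""
    x1, x2, _x3 = x
    return int(x1 and x2)


def toy_joint():
    """Positive distribution over features; X3 is a noisy copy of X1."""
    joint = {}
    for x1, x2, x3 in product((0, 1), repeat=3):
        p = 0.5 * 0.5 * (0.9 if x3 == x1 else 0.1)
        x = (x1, x2, x3)
        joint[(x, toy_model(x))] = p
    return joint


candidates = direct_cause_candidates(toy_joint(), 3)
print(candidates)  # prints [0, 1]
```

In this toy setting the third feature is strongly correlated with the prediction through the first feature, yet it is dropped because the prediction does not change with it once the other features are fixed. With real data, the exact table would have to be replaced by a statistical conditional independence test, whose errors this sketch does not model.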

Abstract

We introduce a causal modeling framework that captures the input-output behavior of predictive models (e.g., machine learning models). The framework enables us to identify features that directly cause the predictions, which has broad implications for data collection and model evaluation. We then present sound and complete algorithms for discovering direct causes (from data) under some assumptions. Furthermore, we propose a novel independence rule that can be integrated with the algorithms to accelerate the discovery process, as we demonstrate both theoretically and empirically.

Paper Structure

This paper contains 13 sections, 7 theorems, 3 equations, 5 figures, 1 table, 1 algorithm.

Key Result

Proposition 6

Let $G({\mathbf X},Y)$ be a predictive graph that induces a distribution $\Pr$ where $\Pr({\mathbf X}) > 0$. (The positivity assumption ensures that $\Pr(Y | {\mathbf X})$ is well-defined.) Then $X \in {\mathbf X}$ is a direct cause of $Y$, in the sense of the paper's definition of direct cause, iff $\overline{{\mathcal{I}}_{\Pr}}(X \ldots$

Figures (5)

  • Figure 1: $G$ depicts the conventional causal graph over a patient's age ($A$), disease ($D$), symptom ($S$) and prescription ($P$), whereas $G'$ depicts the causal graph for the prediction of $S$ from $A$, $D$, and $P$.
  • Figure 2:
  • Figure 3: Causal graphs to illustrate different assumptions.
  • Figure 4: Examples of predictive graphs.
  • Figure 5: Accuracy of algorithms for identifying direct causes under various sample sizes.

Theorems & Definitions (21)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5
  • Proposition 6
  • Definition 7
  • Theorem 8
  • Definition 9
  • Theorem 10
  • ...and 11 more