Invariant Causal Set Covering Machines

Thibaud Godon, Baptiste Bauvin, Pascal Germain, Jacques Corbeil, Alexandre Drouin

TL;DR

Rule-based predictors are interpretable but prone to spurious associations. The paper introduces Invariant Causal Set Covering Machines (ICSCM), an extension of Set Covering Machines that exploits invariances across multiple environments to identify the causal parents of a target variable in polynomial time. It establishes theoretical model construction criteria and a pruning procedure that, under the stated assumptions, guarantee recovery of the causal parent set while excluding noncausal variables, and it reports advantages over the standard SCM and competitive baselines on both simulated and real-world data. The work advances robust, interpretable causal discovery in high-dimensional, multi-environment data, with practical implications for biomarker discovery and mechanistic understanding in complex systems.

Abstract

Rule-based models, such as decision trees, appeal to practitioners due to their interpretable nature. However, the learning algorithms that produce such models are often vulnerable to spurious associations and thus, they are not guaranteed to extract causally-relevant insights. In this work, we build on ideas from the invariant causal prediction literature to propose Invariant Causal Set Covering Machines, an extension of the classical Set Covering Machine algorithm for conjunctions/disjunctions of binary-valued rules that provably avoids spurious associations. We demonstrate both theoretically and empirically that our method can identify the causal parents of a variable of interest in polynomial time.
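
For readers unfamiliar with the baseline that the abstract builds on, the following is a minimal sketch of the classical Set Covering Machine greedy procedure for a conjunction of binary-valued rules. It is not the ICSCM method itself: it uses no environments and no invariance tests, so it remains exposed to the spurious associations discussed above. The function names, the penalty parameter `p`, and the `max_rules` cap are illustrative choices, not taken from the paper.

```python
import numpy as np

def fit_scm_conjunction(X, y, p=1.0, max_rules=3):
    """Greedy classical SCM for a conjunction of binary rules (illustrative).

    X: (n, m) array of 0/1 rule outputs, one column per candidate rule.
    y: (n,) array of 0/1 labels.
    p: penalty applied per positive example a rule would misclassify.
    Returns the list of selected column indices.
    """
    X = np.asarray(X, dtype=bool)
    y = np.asarray(y, dtype=bool)
    neg_left = ~y        # negative examples not yet covered
    pos_left = y.copy()  # positive examples not yet misclassified
    selected = []
    while neg_left.any() and len(selected) < max_rules:
        # A rule covers a negative example when it outputs 0 on it,
        # and errs on a positive example when it outputs 0 on it.
        covered = (~X & neg_left[:, None]).sum(axis=0)
        errors = (~X & pos_left[:, None]).sum(axis=0)
        utility = covered.astype(float) - p * errors
        if selected:
            utility[selected] = -np.inf  # never pick the same rule twice
        best = int(np.argmax(utility))
        if covered[best] == 0:
            break  # no remaining rule covers any remaining negative
        selected.append(best)
        # Keep only the examples that the chosen rule still lets through.
        neg_left &= X[:, best]
        pos_left &= X[:, best]
    return selected

def predict_conjunction(X, selected):
    """Predict 1 only when every selected rule outputs 1."""
    X = np.asarray(X, dtype=bool)
    return X[:, selected].all(axis=1).astype(int)
```

The greedy utility rewards rules that rule out many remaining negative examples and penalizes rules that reject positive ones. Nothing in this criterion distinguishes causal parents from merely correlated variables, which is the gap the invariance-based criteria in the paper are meant to close.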

Paper Structure

This paper contains 25 sections, 2 theorems, 16 equations, 7 figures, 5 tables, 2 algorithms.

Key Result

Theorem 3.1

(Model construction criteria) Assume that the data-generating process follows the causal graph depicted in Figure 1 and that Assumptions ass:markov, ass:faith, ass:envs, and ass:conj hold. Consider an arbitrary conjunction of $d$ binary-valued rules over a subset of variables $\mathbf{X}^\star \subseteq \mathbf{X}$. Without loss of generality, assume an arbitrary ordering of the rules $1, \ldots, d$ and consider

Figures (7)

  • Figure 1: Graphical assumptions: the edge between $\mathbf{X}_A$ and $\mathbf{X}_B$ can be oriented in either way, but the resulting $G$ must be a Directed Acyclic Graph (DAG). Dashed edges are optional.
  • Figure 2: Tree-based representation of a $d$-rule conjunction. Positive and negative leaves are emphasized with $+$ and $-$, respectively.
  • Figure 3: Implicit variables for binary-valued rules introduced in the proof of Theorem \ref{thm:leafs}: this figure is an expanded version of the causal graph illustrated in Figure 1, where the rules in the conjunction (Eq. \ref{eq:conj-assumption}) are represented as random variables that mediate all paths from $\mathbf{X}_A$ to $Y$.
  • Figure 4: Running time w.r.t. the size of $\mathbf{X}_B$ on simulated data.
  • Figure 5: Number of causal features discovered as a function of the model size (feature rank) for real-world datasets. The solid lines give the average causal feature count over $20$ repetitions, and the shaded areas report the standard errors. The gray dashed line indicates the expected number of causal variables selected by a random pick. The detailed behavior of each model is shown in supplementary Figure \ref{fig:supp-detailed-real-world-exp}.
  • ...and 2 more figures

Theorems & Definitions (4)

  • Theorem 3.1
  • proof
  • Proposition 3.2
  • proof