Invariant Causal Set Covering Machines
Thibaud Godon, Baptiste Bauvin, Pascal Germain, Jacques Corbeil, Alexandre Drouin
TL;DR
Rule-based predictors are interpretable but prone to spurious associations. The paper introduces Invariant Causal Set Covering Machines (ICSCM), an extension of Set Covering Machines that exploits invariances across multiple environments to identify the causal parents of a target variable in polynomial time. It establishes theoretical construction criteria and a pruning procedure that guarantee recovery of the causal parent set while excluding non-causal variables, and it reports advantages over the standard SCM and competitive baselines on both simulated and real-world data. The work advances robust, interpretable causal discovery in high-dimensional, multi-environment data, with practical implications for biomarker discovery and mechanistic understanding of complex systems.
Abstract
Rule-based models, such as decision trees, appeal to practitioners due to their interpretable nature. However, the learning algorithms that produce such models are often vulnerable to spurious associations and are thus not guaranteed to extract causally relevant insights. In this work, we build on ideas from the invariant causal prediction literature to propose Invariant Causal Set Covering Machines, an extension of the classical Set Covering Machine algorithm for conjunctions/disjunctions of binary-valued rules that provably avoids spurious associations. We demonstrate both theoretically and empirically that our method can identify the causal parents of a variable of interest in polynomial time.
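For readers unfamiliar with the classical Set Covering Machine that the abstract builds on, the following is a minimal sketch of its greedy construction for a conjunction of binary-valued rules. It illustrates the baseline only, not ICSCM: it contains no invariance test across environments and no pruning step. The function names, the penalty parameter `p`, the toy data, and the rule of stopping when no candidate has positive utility are assumptions made for this illustration.

```python
def fit_scm_conjunction(X, y, p=1.0, max_rules=None):
    """Greedy sketch of a classical SCM conjunction (illustrative only).

    X: list of rows, each a list of 0/1 rule outputs.
    y: list of 0/1 labels.
    Returns the indices of the selected rules; the model predicts 1
    only when every selected rule outputs 1.
    """
    n_rules = len(X[0])
    max_rules = n_rules if max_rules is None else max_rules
    negatives = {i for i, label in enumerate(y) if label == 0}
    positives = {i for i, label in enumerate(y) if label == 1}
    selected = []

    while negatives and len(selected) < max_rules:
        best_j, best_utility = None, 0.0
        for j in range(n_rules):
            if j in selected:
                continue
            # Negatives that rule j rejects (correctly "covered").
            covered = sum(1 for i in negatives if X[i][j] == 0)
            # Positives that rule j rejects (errors for a conjunction).
            errors = sum(1 for i in positives if X[i][j] == 0)
            utility = covered - p * errors
            if utility > best_utility:
                best_j, best_utility = j, utility
        if best_j is None:
            # Assumed stopping rule: no remaining rule has positive utility.
            break
        selected.append(best_j)
        # Keep only the examples that the chosen rule still lets through.
        negatives = {i for i in negatives if X[i][best_j] == 1}
        positives = {i for i in positives if X[i][best_j] == 1}
    return selected


def predict_conjunction(selected, row):
    """Predict 1 if all selected rules fire on the row, else 0."""
    return int(all(row[j] == 1 for j in selected))


# Toy data (hypothetical): the label is rule 0 AND rule 1; rule 2 is noise.
X = [[1, 1, 0], [1, 1, 1], [1, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
y = [1, 1, 0, 0, 0, 0]
selected = fit_scm_conjunction(X, y)
predictions = [predict_conjunction(selected, row) for row in X]
```

On data pooled from a single environment, a greedy criterion of this kind can select any rule that happens to correlate with the label, which is the vulnerability to spurious associations that the abstract describes.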
