SLEM: Machine Learning for Path Modeling and Causal Inference with Super Learner Equation Modeling
Matthew J. Vowels
TL;DR
The paper tackles the challenge of estimating causal effects from observational data when the underlying functional form is unknown. It introduces SLEM, a framework that combines a DAG-specified causal structure with Super Learner ensembles to nonparametrically estimate all path coefficients and the effects of arbitrary interventions. In simulations and on the IHDP benchmark, SLEM yields unbiased, consistent estimates in linear settings, outperforms linear SEM in nonlinear settings, and is competitive with state-of-the-art methods. The author provides open-source code and tutorials so that applied researchers can easily adopt nonparametric causal inference from DAGs.
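To make the Super Learner ingredient concrete, the sketch below shows stacking for a single structural equation in plain NumPy. It assumes a small, hypothetical library of base learners (linear regression, a cubic polynomial, and 10-nearest-neighbours), 5-fold cross-validation, and non-negative ensemble weights normalized to sum to one, here fitted by projected gradient descent. It is an illustration of the general technique only, not the SLEM implementation or its learner library; within a DAG, one such ensemble would be fitted per node on that node's parents.

```python
import numpy as np

# Hypothetical base-learner library. Each "fit" function returns a predict function.

def fit_ols(X, y):
    """Linear regression with an intercept (least squares)."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ beta


def fit_poly3(X, y):
    """Cubic polynomial in each input column (no interaction terms)."""
    def basis(Z):
        return np.column_stack([np.ones(len(Z)), Z, Z**2, Z**3])
    beta, *_ = np.linalg.lstsq(basis(X), y, rcond=None)
    return lambda Xn: basis(Xn) @ beta


def fit_knn(X, y, k=10):
    """k-nearest-neighbour regression with Euclidean distance and uniform weights."""
    X_train, y_train = X.copy(), y.copy()

    def predict(Xn):
        d2 = ((Xn[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
        nearest = np.argsort(d2, axis=1)[:, :k]
        return y_train[nearest].mean(axis=1)

    return predict


def nnls_weights(Z, y, n_iter=5000):
    """Minimise 0.5 * ||Z w - y||^2 subject to w >= 0 via projected gradient descent."""
    step = 1.0 / np.linalg.eigvalsh(Z.T @ Z).max()  # 1 / Lipschitz constant of the gradient
    w = np.full(Z.shape[1], 1.0 / Z.shape[1])
    for _ in range(n_iter):
        w = np.maximum(w - step * (Z.T @ (Z @ w - y)), 0.0)
    return w


def super_learner(X, y, learners, n_folds=5, seed=0):
    """Stack learners using out-of-fold predictions; returns (predict, weights)."""
    folds = np.random.default_rng(seed).permutation(len(y)) % n_folds
    oof = np.zeros((len(y), len(learners)))  # out-of-fold (cross-validated) predictions
    for f in range(n_folds):
        train, test = folds != f, folds == f
        for j, fit in enumerate(learners):
            oof[test, j] = fit(X[train], y[train])(X[test])
    w = nnls_weights(oof, y)
    # Normalise to a convex combination; fall back to equal weights if all are zero.
    w = w / w.sum() if w.sum() > 0 else np.full(len(learners), 1.0 / len(learners))
    fitted = [fit(X, y) for fit in learners]  # refit every learner on all the data

    def predict(Xn):
        return np.column_stack([m(Xn) for m in fitted]) @ w

    return predict, w


# Usage on a hypothetical nonlinear structural equation y = sin(2x) + noise.
rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(500, 1))
y = np.sin(2 * X[:, 0]) + 0.1 * rng.normal(size=500)
predict, weights = super_learner(X, y, [fit_ols, fit_poly3, fit_knn])
print("ensemble weights (ols, poly3, knn):", np.round(weights, 3))
```

Because the weights are chosen on cross-validated predictions, the ensemble tends to lean on whichever learners generalize best for this equation (here, the flexible ones) rather than committing to a single functional form in advance.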
Abstract
Causal inference is a crucial goal of science, enabling researchers to draw meaningful conclusions about the outcomes of hypothetical interventions using observational data. Path models, Structural Equation Models (SEMs), and, more generally, Directed Acyclic Graphs (DAGs) provide a means to unambiguously specify assumptions about the causal structure underlying a phenomenon. Unlike DAGs, which make very few assumptions about functional and parametric form, SEM assumes linearity. This can lead to functional misspecification, which prevents researchers from reliably estimating effect sizes. To address this, we propose Super Learner Equation Modeling (SLEM), a path modeling technique that integrates machine learning Super Learner ensembles. We empirically demonstrate that it provides consistent and unbiased estimates of causal effects, that it performs competitively with SEM for linear models, and that it outperforms SEM when relationships are nonlinear. We provide open-source code and a tutorial notebook with example usage, highlighting the method's ease of use.
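The misspecification problem can be illustrated with a small simulation under an assumed DAG Z -> X, Z -> Y, X -> Y in which X affects Y quadratically. A linear structural equation for Y implies an effect of roughly zero for do(X=1) versus do(X=0), whereas a flexible outcome model combined with g-computation (setting X for every unit, keeping the observed Z, and averaging the predictions) recovers the true contrast of 1. The polynomial basis below is only a stand-in for a Super Learner, chosen to keep the example short; the DAG, coefficients, and sample size are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical DAG: Z -> X, Z -> Y, X -> Y, with a nonlinear X -> Y edge.
Z = rng.normal(size=n)
X = 0.5 * Z + rng.normal(size=n)
Y = X**2 + Z + 0.5 * rng.normal(size=n)


def ols(design, y):
    """Least-squares coefficients for a given design matrix."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


# Linear SEM equation for Y: Y = b0 + b1*X + b2*Z. Its implied effect of
# do(X=1) versus do(X=0) is simply b1.
b_lin = ols(np.column_stack([np.ones(n), X, Z]), Y)
linear_effect = b_lin[1]


# Flexible stand-in for a Super Learner: cubic polynomial in X plus a linear Z term.
def flex_design(x, z):
    return np.column_stack([np.ones_like(x), x, x**2, x**3, z])


b_flex = ols(flex_design(X, Z), Y)


def mean_under_do(x_value):
    """g-computation: set X = x_value for all units, keep observed Z, average predictions."""
    x_set = np.full(n, x_value)
    return (flex_design(x_set, Z) @ b_flex).mean()


flex_effect = mean_under_do(1.0) - mean_under_do(0.0)
print(f"linear SEM effect: {linear_effect:.3f}")
print(f"flexible effect:   {flex_effect:.3f}  (true value 1.0)")
```

Both models condition on the same parents of Y, so the difference comes entirely from the assumed functional form: the linear equation cannot represent the quadratic dependence, and its single path coefficient averages the effect away.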
