PCM Selector: Penalized Covariate-Mediator Selection Operator for Evaluating Linear Causal Effects
Hisayoshi Nanmo, Manabu Kuroki
TL;DR
This work addresses the estimation of total causal effects in linear structural causal models when a back-door confounder set is unavailable or when high-dimensional data induce multicollinearity. It introduces PCM Selector, a two-stage penalized regression framework that jointly models the outcome and the intermediate variables while adaptively selecting covariates and mediators via $L_p$ penalties, yielding more accurate and less biased estimates of the total effect $\tau_{yx}$ than standard methods. The approach generalizes PAL$_p$MA, leverages front-door-like criteria when necessary, and provides a bias-reducing variant for $p=2$; theoretical results and numerical experiments support improved estimation and reliable sign recovery in challenging settings. The method is practically relevant for causal inference with high-dimensional data, enabling robust evaluation of intervention effects even when the confounding structure is partially unobserved or when intermediate variables play a crucial role in identification.
Abstract
We consider a data-generating process that can be described by a linear structural equation model, in a situation where (i) a set of covariates satisfying the back-door criterion cannot be observed, or (ii) such a set can be observed but standard statistical estimation methods cannot be applied to estimate causal effects because of multicollinearity or high dimensionality. We propose a novel two-stage penalized regression approach, the penalized covariate-mediator selection operator (PCM Selector), to estimate causal effects in such scenarios. Unlike existing penalized regression analyses, PCM Selector provides a consistent or less biased estimator of the causal effect when a set of intermediate variables is available. In addition, PCM Selector provides a variable selection procedure for intermediate variables that yields better estimation accuracy for the causal effects than estimation based on the back-door criterion.
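
To make scenario (i) concrete, the following minimal sketch simulates a linear structural equation model in which the confounder is unobserved but a single intermediate variable is available. It is not the PCM Selector estimator: it uses plain, unpenalized least squares and performs no variable selection. The model, its coefficients (0.8, 0.5, 1.5), and the names `simulate` and `ols` are all assumptions chosen for illustration. It only shows why intermediate variables can recover a total effect that a direct regression of the outcome on the treatment cannot.

```python
import numpy as np

def simulate(n=20000, seed=0):
    """Draw from a hypothetical linear SEM: U -> X, X -> M, M -> Y, U -> Y."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)                      # unobserved confounder
    x = 1.0 * u + rng.normal(size=n)            # treatment
    m = 0.8 * x + rng.normal(size=n)            # intermediate variable
    y = 0.5 * m + 1.5 * u + rng.normal(size=n)  # outcome
    return x, m, y

def ols(design, target):
    """Ordinary least-squares coefficients for target ~ design."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef

x, m, y = simulate()
ones = np.ones_like(x)

# Naive estimate: regress Y on X, ignoring the unobserved confounder.
naive = ols(np.column_stack([ones, x]), y)[1]

# Front-door-style estimate: (effect of X on M) * (effect of M on Y given X).
x_to_m = ols(np.column_stack([ones, x]), m)[1]
m_to_y = ols(np.column_stack([ones, m, x]), y)[1]
front_door = x_to_m * m_to_y

true_effect = 0.8 * 0.5  # total effect of X on Y in the assumed model
print(f"true={true_effect:.2f} naive={naive:.2f} front_door={front_door:.2f}")
```

Under these assumptions the naive regression is biased by the unobserved confounder, while the product of the two stage-wise coefficients is close to the true total effect. The paper's contribution, per the abstract, is to add penalization and covariate-mediator selection on top of this kind of two-stage structure.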
