PCM Selector: Penalized Covariate-Mediator Selection Operator for Evaluating Linear Causal Effects

Hisayoshi Nanmo, Manabu Kuroki

TL;DR

This work tackles the challenge of estimating total causal effects in linear structural causal models when the back-door confounder set is unavailable or when high-dimensional data induce multicollinearity. It introduces PCM Selector, a two-stage penalized regression framework that jointly models the outcome and intermediate variables while adaptively selecting covariates and mediators via $L_p$ penalties, yielding more accurate and less biased estimates of the total effect $\tau_{yx}$ than standard methods. The approach generalizes PAL$_p$MA, leverages front-door-like criteria when necessary, and provides a bias-reducing variant for $p=2$; theoretical results and numerical experiments support improved estimation and reliable sign recovery in challenging settings. The method offers practical impact for causal inference in high-dimensional data, enabling robust evaluation of intervention effects even when confounding structure is partially unobserved or when intermediate variables play a crucial role in identification.

Abstract

For a data-generating process for random variables that can be described with a linear structural equation model, we consider a situation in which (i) a set of covariates satisfying the back-door criterion cannot be observed or (ii) such a set can be observed, but standard statistical estimation methods cannot be applied to estimate causal effects because of multicollinearity/high-dimensional data problems. We propose a novel two-stage penalized regression approach, the penalized covariate-mediator selection operator (PCM Selector), to estimate the causal effects in such scenarios. Unlike existing penalized regression analyses, when a set of intermediate variables is available, PCM Selector provides a consistent or less biased estimator of the causal effect. In addition, PCM Selector provides a variable selection procedure for intermediate variables to obtain better estimation accuracy of the causal effects than does the back-door criterion.
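Scenario (i) can be made concrete with a small simulation. The sketch below is not PCM Selector itself. It is a minimal illustration, under assumed settings, of why an intermediate variable can recover the total effect in a linear structural equation model when a confounder is unobserved. The model, the coefficients `a_true` and `b_true`, the ridge penalty `lam`, and the helper `ridge` are all hypothetical choices made for this example. A plain $L_2$ (ridge) penalty stands in for the paper's $L_p$ penalties, and no covariate or mediator selection is performed.

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge (L2-penalized) least-squares coefficients."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(0)
n = 20000

# Hypothetical linear SEM (an assumption for illustration only):
# U is an unobserved confounder of X and Y, and S is an intermediate
# variable lying on the only directed path X -> S -> Y.
a_true, b_true = 0.8, 0.5            # path coefficients X -> S and S -> Y
u = rng.normal(size=n)
x = u + rng.normal(size=n)
s = a_true * x + rng.normal(size=n)
y = b_true * s + u + rng.normal(size=n)
tau_true = a_true * b_true           # total effect of X on Y

# Center the data so that no intercept term is needed.
x, s, y = x - x.mean(), s - s.mean(), y - y.mean()

lam = 1.0                            # assumed penalty; negligible at this n

# Naive estimate: regress Y on X alone. It is biased because U is omitted.
tau_naive = ridge(x[:, None], y, lam)[0]

# Two-stage, front-door-style estimate:
#   stage 1: coefficient of X in the regression of S on X
#   stage 2: coefficient of S in the regression of Y on (S, X)
a_hat = ridge(x[:, None], s, lam)[0]
b_hat = ridge(np.column_stack([s, x]), y, lam)[0]
tau_fd = a_hat * b_hat

print(f"true total effect : {tau_true:.3f}")
print(f"naive regression  : {tau_naive:.3f}")
print(f"two-stage product : {tau_fd:.3f}")
```

In this assumed model, the regression of $Y$ on $X$ alone absorbs the confounding path through $U$, whereas the product of the two stage-wise coefficients targets the total effect. Including $X$ in the second stage blocks the back-door path from $S$ to $Y$ through $U$.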

Paper Structure

This paper contains 10 sections, 2 theorems, 23 equations, 2 figures, 1 table.

Key Result

Theorem 1

For an active set $\boldsymbol{M}\cup \boldsymbol{C}$, when the OLS estimators are available, if $X$ is conditionally independent of $Y$ given $\boldsymbol{M}\cup \boldsymbol{C}$, then the following inequalities approximately hold under normality: for the optimal tuning and penalty parameters.

Figures (2)

  • Figure 1: Causal diagram. The thick red arrows show the total effect of interest. $X$: treatment variable; $Y$: response variable; $S$: intermediate variable that can be selected using prior causal knowledge; $\overline{\boldsymbol{S}}=\{\overline{S}_{1},\ldots,\overline{S}_{5}\}$: a set of intermediate variables for which it is uncertain which element should be added to evaluate the total effects; $Z$: covariate that can be selected using prior causal knowledge; $\overline{\boldsymbol{Z}}=\{\overline{Z}_{1},\ldots,\overline{Z}_{10}\}$: a set of covariates for which it is uncertain which element should be added to evaluate the total effects.
  • Figure 2: Violin plots of estimated total effects. The dashed lines show the true total effects. FDL: Front-door-like.

Theorems & Definitions (5)

  • Definition 1
  • Definition 2
  • Definition 3
  • Theorem 1
  • Theorem 2