
Mechanism Learning: reverse causal inference in the presence of multiple unknown confounding through causally weighted Gaussian mixture models

Jianqiao Mao, Max A. Little

TL;DR

The paper addresses the problem of ML models learning spurious associations under unknown confounding by exploiting the front-door causal structure. It introduces mechanism learning, which uses causally weighted Gaussian Mixture Models (CW-GMMs) to approximate the interventional distribution $p(x \mid \mathrm{do}(y))$ and generate deconfounded training samples without interventional data. By resampling with front-door weights and training standard predictors on these samples, the method achieves reverse causal inference that is robust to multiple unmeasured confounders. Empirical results on fully synthetic, semi-synthetic, and real-world intracranial hemorrhage (ICH) CT data show that mechanism learning consistently reduces causal bias and outperforms a causal bootstrapping (CB) baseline, underscoring its potential for reliable, high-stakes ML applications. The approach is practical and scalable. An encoder-decoder mechanism embedding handles high-dimensional mediators, which makes the method applicable to domains beyond healthcare.
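The resampling step can be illustrated with a small sketch. It makes the following assumptions:

- The cause $Y$ and the mechanism $Z$ are discrete.
- $p(z \mid y)$ is estimated by empirical frequencies.
- Each observed sample gets the front-door importance weight $w_n \propto p(z_n \mid y)/p(z_n \mid y_n)$ for the intervention $\mathrm{do}(y)$. This weight follows from rewriting the front-door adjustment as an expectation over the observed joint distribution.

This is plain causal-bootstrapping-style resampling; it does not fit the CW-GMMs. The helper names `front_door_weights` and `deconfound_resample`, and the simulated data, are hypothetical.

```python
import numpy as np


def front_door_weights(y, z, y_int):
    """Normalised front-door weights w_n proportional to p(z_n | y_int) / p(z_n | y_n).

    Hypothetical helper: y and z are discrete 1-D arrays, y_int must be an observed
    value of y, and p(z | y) is estimated by plain empirical frequencies.
    """
    y, z = np.asarray(y), np.asarray(z)
    y_vals, y_idx = np.unique(y, return_inverse=True)
    z_vals, z_idx = np.unique(z, return_inverse=True)
    # Contingency table: counts[i, j] = #{n : y_n = y_vals[i], z_n = z_vals[j]}
    counts = np.zeros((len(y_vals), len(z_vals)))
    np.add.at(counts, (y_idx, z_idx), 1.0)
    p_z_given_y = counts / counts.sum(axis=1, keepdims=True)
    k = np.searchsorted(y_vals, y_int)
    # The denominator is never zero: sample n itself contributes to cell (y_n, z_n)
    w = p_z_given_y[k, z_idx] / p_z_given_y[y_idx, z_idx]
    return w / w.sum()


def deconfound_resample(x, y, z, n_per_value, rng):
    """Draw n_per_value effect samples for each intervention do(y = v) by weighted resampling."""
    xs, ys = [], []
    for v in np.unique(y):
        w = front_door_weights(y, z, v)
        idx = rng.choice(len(x), size=n_per_value, replace=True, p=w)
        xs.append(x[idx])
        ys.append(np.full(n_per_value, v))
    return np.concatenate(xs), np.concatenate(ys)


# Simulated front-door data: U is a hidden confounder of Y and X, and Z mediates Y -> X.
rng = np.random.default_rng(0)
n = 5000
u = rng.random(n) < 0.5                            # hidden, never passed to the estimator
y = (rng.random(n) < 0.2 + 0.6 * u).astype(int)    # cause depends on U
z = (rng.random(n) < 0.2 + 0.6 * y).astype(int)    # mechanism depends only on Y
x = 2.0 * z + 3.0 * u + rng.normal(size=n)         # effect depends on Z and U

naive = x[y == 1].mean() - x[y == 0].mean()
x_dc, y_dc = deconfound_resample(x, y, z, 20000, rng)
front_door = x_dc[y_dc == 1].mean() - x_dc[y_dc == 0].mean()
print(f"naive difference: {naive:.2f}, front-door difference: {front_door:.2f}")
```

In this simulation, the population value of the naive difference in means is 3.0, and the interventional difference $E[X \mid \mathrm{do}(Y=1)] - E[X \mid \mathrm{do}(Y=0)]$ is 1.2. The printed front-door figure should therefore land near 1.2, up to sampling noise, while the naive one stays near 3.0.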

Abstract

A major limitation of machine learning (ML) prediction models is that they recover associational, rather than causal, predictive relationships between variables. In high-stakes automation applications of ML this is problematic, as the model often learns spurious, non-causal associations. This paper proposes mechanism learning, a simple method which uses causally weighted Gaussian Mixture Models (CW-GMMs) to deconfound observational data. Any appropriate ML model trained on the deconfounded data is then forced to learn predictive relationships between effects and their causes (reverse causal inference), despite the potential presence of multiple unknown and unmeasured confounders. Effect variables can be very high-dimensional, and the predictive relationship nonlinear, as is common in ML applications. This novel method is widely applicable. Its only requirement is a set of mechanism variables that mediates between the cause (prediction target) and the effect (feature data) and is independent of the (unmeasured) confounding variables. We test our method on fully synthetic, semi-synthetic and real-world datasets. It can discover reliable, unbiased, causal ML predictors, whereas the same ML predictor trained naively with classical supervised learning on the original observational data is heavily biased by spurious associations. We provide code online to implement the results in the paper.

Paper Structure

This paper contains 19 sections, 16 equations, 7 figures, 1 table, 2 algorithms.

Figures (7)

  • Figure 1: Mechanism learning (b) is a novel, simple and widely applicable solution to the problem of reverse causal inference in the presence of multiple unknown confounders. It uses arbitrary supervised ML algorithms to predict nonlinear effect-cause relationships from potentially high-dimensional effects. The causal scenario is represented by the ubiquitous front-door causal graph (a). There are multiple, unmeasured/unknown confounding paths between $Y$ and $X$ (bi-directed, dashed arrow). The classic causal inference direction is the causal path from $Y$ to $X$ via $Z$ (blue half arrow); reverse causal inference infers causes $Y$ from effects $X$ (red half arrow).
  • Figure 2: Example digits from the confounded (a) and non-confounded (b) background-MNIST datasets. In (a), background brightness is manipulated so that it is a confounding factor with digit class (e.g., "6" is brighter than "2"); in (b), the brightness-digit association is randomized to simulate a controlled setting.
  • Figure 3: Comparison of classification (a–d) and regression (e–g) models trained using mechanism learning (a, e), CB-based deconfounding (b, f), and classical supervised learning on the confounded dataset (c, g) and on the non-confounded dataset (d). In classification (a–d), the confounded model (c) yields a skewed decision boundary, which is a mixture of the confounder (orange line) and the true class boundary (black line). By contrast, the mechanism learning-based deconfounded SVM (a) and the CB-based deconfounded SVM (b) produce boundaries aligned with the true class separation, closer to the non-confounded SVM boundary shown in (d). The samples generated by CW-GMMs show greater diversity than the CB-resampled data. In regression (e–g), the classical model (g) shows a biased slope due to latent confounding, while mechanism learning (e) and CB-based deconfounding (f) recover the non-confounded regression line (black lines).
  • Figure 4: Comparison of CW-GMMs fitted with $K = 2, 4, 6, 8$ and $10$ Gaussian components for the synthetic classification task, together with the corresponding generated deconfounded data. A separate CW-GMM is fitted for each intervention. The marker size of each component mean is proportional to its mixture coefficient $\pi_k$, and darker shading indicates regions of higher probability density.
  • Figure 5: Front-door structural causal model for the real-world ICH dataset hssayeni2020computed, as used for mechanism learning. The cause variable $Y$ represents the diagnostic category, the mechanism variable $Z$ represents the hemorrhage region label, and the effect variable $X$ represents the digital CT scans.
  • ...and 2 more figures
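Figure 4 fits one CW-GMM per intervention. The sketch below shows what such a fit could look like, assuming a CW-GMM can be read as a Gaussian mixture whose EM updates scale each sample's responsibility by its causal weight $w_n$. Under that reading, EM maximises $\sum_n w_n \log \sum_k \pi_k \mathcal{N}(x_n \mid \mu_k, \Sigma_k)$.

This weighted-likelihood reading is an assumption, not the authors' implementation. The following are all hypothetical:

- `fit_weighted_gmm` and `sample_gmm`
- the farthest-point initialisation
- the toy data

Drawing fresh samples from the fitted mixture, rather than duplicating observed points, is consistent with the greater sample diversity noted in Figure 3.

```python
import numpy as np


def _log_gauss(x, mu, cov):
    """Row-wise log N(x | mu, cov), computed via a Cholesky factor."""
    d = x.shape[1]
    chol = np.linalg.cholesky(cov)
    sol = np.linalg.solve(chol, (x - mu).T)        # chol^{-1} (x - mu), shape (d, n)
    maha = np.sum(sol ** 2, axis=0)                # squared Mahalanobis distances
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def fit_weighted_gmm(x, w, k, n_iter=200, reg=1e-6, seed=0):
    """EM for a Gaussian mixture in which sample n counts with weight w_n (hypothetical CW-GMM reading)."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w = np.asarray(w, dtype=float) / np.sum(w)
    # Farthest-point initialisation of the means; the first pick is drawn with probability w
    idx = [int(rng.choice(n, p=w))]
    for _ in range(1, k):
        d2 = ((x[:, None, :] - x[idx][None, :, :]) ** 2).sum(axis=-1).min(axis=1)
        idx.append(int(np.argmax(d2)))
    mu = x[idx].astype(float)
    cov0 = np.atleast_2d(np.cov(x.T, aweights=w)) + reg * np.eye(d)
    cov = np.repeat(cov0[None], k, axis=0)
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities r[n, j] = p(component j | x_n)
        log_p = np.stack([np.log(pi[j]) + _log_gauss(x, mu[j], cov[j]) for j in range(k)], axis=1)
        r = np.exp(log_p - np.logaddexp.reduce(log_p, axis=1, keepdims=True))
        # M-step using causally weighted responsibilities w_n * r[n, j]
        wr = w[:, None] * r
        nk = wr.sum(axis=0) + 1e-12
        pi = nk / nk.sum()
        mu = (wr.T @ x) / nk[:, None]
        for j in range(k):
            diff = x - mu[j]
            c = (wr[:, j, None] * diff).T @ diff / nk[j]
            cov[j] = 0.5 * (c + c.T) + reg * np.eye(d)   # symmetrise and regularise
    return pi, mu, cov


def sample_gmm(pi, mu, cov, m, rng):
    """Generate m new samples from the fitted mixture."""
    comp = rng.choice(len(pi), size=m, p=pi)
    return np.array([rng.multivariate_normal(mu[c], cov[c]) for c in comp])


# Toy check: two well-separated clusters, the first carrying three times the weight.
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(size=(500, 2)), rng.normal(size=(500, 2)) + 6.0])
w = np.concatenate([np.full(500, 3.0), np.full(500, 1.0)])
pi, mu, cov = fit_weighted_gmm(x, w, k=2)
x_new = sample_gmm(pi, mu, cov, 1000, rng)
print(np.round(pi, 3), np.round(mu, 2))
```

In the mechanism-learning setting, `w` would be the front-door weights for one intervention value. A separate mixture would then be fitted per intervention, and `sample_gmm` would supply the deconfounded training samples labelled with that value. In the toy check, the weighted mixing coefficients should come out close to 0.75 and 0.25, matching the weighted mass of the two clusters.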

Theorems & Definitions (1)

  • Definition 2.1: Front-door criterion pearl2009causality
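For reference, the standard front-door criterion and adjustment formula (Pearl, 2009) are restated below in the variable roles of Figure 1: $Y$ is the cause, $Z$ the mechanism and $X$ the effect. This transcribes the textbook statement rather than quoting the paper's Definition 2.1. The final reweighted form is a standard algebraic rewriting added for illustration. For continuous variables the sums become integrals.

```latex
\begin{aligned}
&\text{$Z$ satisfies the front-door criterion relative to the ordered pair $(Y, X)$ if:}\\
&\quad\text{(i) $Z$ intercepts every directed path from $Y$ to $X$;}\\
&\quad\text{(ii) there is no unblocked back-door path from $Y$ to $Z$;}\\
&\quad\text{(iii) every back-door path from $Z$ to $X$ is blocked by $Y$.}\\[4pt]
p(x \mid \mathrm{do}(y)) &= \sum_{z} p(z \mid y) \sum_{y'} p(x \mid y', z)\, p(y') \\
&= \sum_{y', z} \frac{p(z \mid y)}{p(z \mid y')}\, p(y', z, x)
\end{aligned}
```

The second line follows from $p(y', z, x) = p(y')\, p(z \mid y')\, p(x \mid y', z)$. It expresses the interventional distribution as the observed joint distribution reweighted by $p(z \mid y)/p(z \mid y')$, which is the kind of front-door weight used for resampling.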