Multivariate Causal Effects: a Bayesian Causal Regression Factor Model

Dafne Zorzetto, Jenna Landy, Corwin Zigler, Giovanni Parmigiani, Roberta De Vito

Abstract

Wildfire smoke is a growing air-quality concern: it contributes to air pollution through a complex mixture of chemical species, with important implications for public health. While previous studies have primarily focused on its association with total particulate matter (PM2.5), the causal relationship between wildfire smoke and the chemical composition of PM2.5 remains largely unexplored. Exposure to these chemical mixtures plays a critical role in shaping public health, yet capturing their relationships requires statistical methods capable of modeling the complex dependencies among chemical species. To fill this gap, we propose a Bayesian causal regression factor model that estimates the multivariate causal effects of wildfire smoke on the concentrations of 27 chemical species in PM2.5 across the United States. Our approach introduces two key innovations: (i) a causal inference framework for multivariate potential outcomes, and (ii) a novel Bayesian factor model that employs a probit stick-breaking process as a prior for treatment-specific factor scores. By focusing on factor scores, our method addresses the missing data challenge common in causal inference and enables a flexible, data-driven characterization of the latent factor structure, which is crucial for capturing the complex correlation among multivariate outcomes. Through Monte Carlo simulations, we show the model's accuracy in estimating causal effects on multivariate outcomes and in characterizing the treatment-specific latent structure. Finally, we apply our method to US air quality data, estimating the causal effect of wildfire smoke on 27 chemical species in PM2.5 and providing a deeper understanding of their interdependencies.
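
As a rough illustration of the probit stick-breaking construction named in the abstract, the sketch below turns real-valued latent quantities into mixture weights by passing each through the standard normal CDF and breaking off that fraction of the remaining stick. The truncation to a finite number of components, the function names, and the fixed input values are assumptions made for this example; the abstract does not state how the weights are parameterized in the paper's model.

```python
import math
import numpy as np


def normal_cdf(x):
    """Standard normal CDF applied elementwise (built on math.erf)."""
    flat = [0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in np.ravel(x)]
    return np.array(flat).reshape(np.shape(x))


def probit_stick_breaking_weights(alpha):
    """Map K-1 real values to K mixture weights that sum to 1.

    Hypothetical helper for illustration only: a finite truncation is
    assumed, with the last component taking whatever stick remains.
    """
    alpha = np.asarray(alpha, dtype=float)
    # Break proportions in (0, 1), one per latent value.
    v = normal_cdf(alpha)
    # Length of stick remaining before each break: 1, (1-v1), (1-v1)(1-v2), ...
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)))
    # Weight k is the fraction v_k of the stick left before break k;
    # the final weight is the leftover piece.
    return np.concatenate((v * remaining[:-1], [remaining[-1]]))


weights = probit_stick_breaking_weights([0.0, 0.0])
print(weights)
```

With both latent values set to zero, each break takes half of the remaining stick, so the three weights are 0.5, 0.25, and 0.25. Letting the latent values vary with treatment or covariates is what would make such weights group-specific, but that parameterization is not described in this section.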

Paper Structure

This paper contains 27 sections, 26 equations, 8 figures, 1 table, 1 algorithm.

Figures (8)

  • Figure 1: Graphical representation of the causal pathway assumed in this paper, with treatment $T$, measured confounders $X$, unmeasured confounder $U$, latent factors $L$, and multivariate outcome $Y$. The three scenarios consider different relationships for unmeasured confounder $U$: (1) confounding only through an effect on latent factors $L$, (2) confounding as an effect of measured confounders $X$ on $U$, (3) confounding because $U$ is a cause of measured confounders $X$. The simulation study investigates each of these scenarios.
  • Figure 2: Bias (left) and mean square error (MSE, right) across simulated scenarios. Results are shown for our proposed model, standard factor model, causal BART, and BCF.
  • Figure 3: Estimated causal effects from four models: our proposed causal factor model, factor model with standard Gaussian prior for factor scores, BART, and BCF. Dots show the median of causal effects; lines show the corresponding $90\%$ credible interval.
  • Figure 4: Treatment-specific factor loadings: (left) without wildfire smoke and (right) with wildfire smoke. Colors of the chemical species names indicate chemical groupings.
  • Figure 5: Bias (left) and mean square error (MSE, right) across simulated scenarios. Results are shown for our proposed model, standard factor model, causal BART, and BCF.
  • ...and 3 more figures