Bayesian Kernel Machine Regression via Random Fourier Features for Estimating Joint Health Effects of Multiple Exposures

Danlu Zhang, Stephanie M. Eick, Howard H. Chang

TL;DR

This study tackles the computational bottleneck of Bayesian Kernel Machine Regression (BKMR) for jointly analyzing multiple environmental exposures by introducing Fast BKMR, which replaces Gaussian process random effects with supervised random Fourier features to yield a linear mixed-effects formulation suitable for Hamiltonian Monte Carlo. The approach substantially reduces computation time while preserving accuracy, performs especially well when the exposure–response surface exhibits strong dependency, and can handle datasets on the scale of large administrative health databases. In simulations, Fast BKMR matches BKMR in estimation quality and often outperforms BKMR with predictive process in speed, with computation scaling linearly in sample size and in the number of basis functions. Applied to over 270,000 Georgia birth records, Fast BKMR uncovers nonlinear, interacting effects of NO$_2$, CO, and PM$_{2.5}$ on birthweight, highlighting larger reductions when pollutants interact and confirming known adverse associations for NO$_2$ and PM$_{2.5}$ while suggesting CO’s effect may be less clear. Overall, Fast BKMR provides a scalable, flexible tool for assessing joint health effects of multiple ambient exposures in large epidemiological studies.

Abstract

Environmental epidemiology has traditionally examined a single exposure at a time. Advances in exposure assessment and statistical methods now enable studies of multiple exposures and their combined health impacts. Bayesian Kernel Machine Regression (BKMR) is a widely used approach to flexibly estimate joint, nonlinear effects of multiple exposures. However, BKMR is computationally intensive for large datasets, as repeated kernel inversion in Markov chain Monte Carlo (MCMC) can be time-consuming and is often infeasible in practice. To address this issue, we propose using supervised random Fourier basis functions to replace the Gaussian process random effects. This reframes kernel machine regression as a linear mixed-effects model that facilitates computationally efficient estimation and prediction. Bayesian inference is conducted using MCMC with Hamiltonian Monte Carlo algorithms. Simulation studies demonstrate that our method yields results comparable to BKMR while significantly reducing computation time. Our approach outperforms BKMR when the exposure-response surface has stronger dependency and when BKMR uses the predictive process as an alternative approximation method. Finally, we applied this approach to analyze over 270,000 birth records, examining associations between multiple ambient air pollutants and birthweight in Georgia.
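
To make the core idea in the abstract concrete, the following is a minimal sketch of how random Fourier features can stand in for a Gaussian kernel. It uses the standard, unsupervised random Fourier feature construction rather than the supervised variant the authors propose, whose details are not given here. The function names, the lengthscale value, the feature count, and the synthetic exposure matrix are all illustrative assumptions, not taken from the paper.

```python
import numpy as np


def rff_features(X, n_features, lengthscale, rng):
    """Map exposures X (n, m) to random Fourier features Z (n, n_features).

    Assumption: plain random Fourier features for the Gaussian kernel
    k(x, y) = exp(-||x - y||^2 / (2 * lengthscale^2)); this is not the
    supervised construction described in the paper.
    """
    m = X.shape[1]
    # Frequencies drawn from the kernel's spectral density, N(0, I / lengthscale^2).
    W = rng.normal(scale=1.0 / lengthscale, size=(m, n_features))
    # Random phases, uniform on [0, 2*pi).
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)


def gaussian_kernel(X, lengthscale):
    """Exact Gaussian kernel matrix, for comparison only."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * lengthscale**2))


rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))          # hypothetical standardized exposures
Z = rff_features(X, n_features=1000, lengthscale=1.5, rng=rng)

K_exact = gaussian_kernel(X, lengthscale=1.5)
K_approx = Z @ Z.T                      # low-rank approximation of the kernel
mean_abs_err = np.mean(np.abs(K_exact - K_approx))
```

Because `Z @ Z.T` approximates the kernel matrix, a Gaussian process effect with that kernel can be approximated by a linear term `Z @ beta` with independent normal coefficients, which is the kind of linear mixed-effects form the abstract refers to. Computation then involves the feature matrix rather than inversion of an N by N kernel matrix.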

Paper Structure

This paper contains 16 sections, 5 equations, 5 figures, 2 tables, 1 algorithm.

Figures (5)

  • Figure 1: The RMSE between true and estimated joint effects for different correlation types (CorType) and numbers of exposures ($M$) at a sample size of 1000.
  • Figure 2: The RMSE between true and estimated joint effects under the parametric scenario for different sample sizes ($N$) and numbers of exposures ($M$).
  • Figure 3: The overall effects of all exposures on birthweight, relative to the $25^{th}$ percentiles of the exposures, with 20 basis functions. Blue-shaded areas are the 95% posterior intervals.
  • Figure 4: The response plots for a single pollutant when fixing the other two pollutants at their $10^{th}, 50^{th}$ and $90^{th}$ percentiles.
  • Figure 5: A. Bi-pollutant exposure-response surface for standardized PM$_{2.5}$ and NO$_2$ when fixing CO at its $50^{th}$ percentile. Vertical dashed lines are the $25^{th}, 50^{th} ~\text{and}~ 75^{th}$ percentiles of standardized PM$_{2.5}$ (from left to right). Horizontal dashed lines are the $25^{th}, 50^{th} ~\text{and}~ 75^{th}$ percentiles of standardized NO$_2$ (from bottom to top). B. Bi-pollutant exposure-response functions for standardized PM$_{2.5}$ at the $25^{th}, 50^{th} ~\text{and}~ 75^{th}$ percentiles of NO$_2$, with CO fixed at its $50^{th}$ percentile. C. Bi-pollutant exposure-response functions for standardized NO$_2$ at the $25^{th}, 50^{th} ~\text{and}~ 75^{th}$ percentiles of PM$_{2.5}$, with CO fixed at its $50^{th}$ percentile.