Bayesian Kernel Machine Regression via Random Fourier Features for Estimating Joint Health Effects of Multiple Exposures
Danlu Zhang, Stephanie M. Eick, Howard H. Chang
TL;DR
This study tackles the computational bottleneck of Bayesian Kernel Machine Regression (BKMR) for jointly analyzing multiple environmental exposures by introducing Fast BKMR, which replaces Gaussian process random effects with supervised random Fourier features to yield a linear mixed-effects formulation suitable for Hamiltonian Monte Carlo. The approach substantially reduces computation time while preserving accuracy, performs especially well when the exposure–response surface exhibits strong dependency, and scales to datasets the size of large administrative health databases. In simulations, Fast BKMR matches BKMR in estimation quality and is often faster than BKMR with a predictive process, with computation time scaling linearly in the sample size and in the number of basis functions. Applied to over 270,000 Georgia birth records, Fast BKMR uncovers nonlinear, interacting effects of NO$_2$, CO, and PM$_{2.5}$ on birthweight: reductions in birthweight are larger when the pollutants interact, the known adverse associations for NO$_2$ and PM$_{2.5}$ are confirmed, and the effect of CO appears less clear. Overall, Fast BKMR provides a scalable, flexible tool for assessing the joint health effects of multiple ambient exposures in large epidemiological studies.
Abstract
Environmental epidemiology has traditionally examined exposures one at a time. Advances in exposure assessment and statistical methods now enable studies of multiple exposures and their combined health impacts. Bayesian Kernel Machine Regression (BKMR) is a widely used approach for flexibly estimating the joint, nonlinear effects of multiple exposures. However, BKMR is computationally intensive for large datasets, because repeated inversion of the kernel matrix within Markov chain Monte Carlo (MCMC) is time-consuming and often infeasible in practice. To address this issue, we propose using supervised random Fourier basis functions to replace the Gaussian process random effects. This reframes kernel machine regression as a linear mixed-effects model that facilitates computationally efficient estimation and prediction. Bayesian inference is conducted using MCMC with Hamiltonian Monte Carlo algorithms. Simulation studies demonstrate that our method yields results comparable to BKMR while significantly reducing computation time. Our approach outperforms BKMR when the exposure-response surface has stronger dependency, and it also outperforms BKMR with a predictive process, an alternative approximation method. Finally, we apply this approach to over 270,000 birth records to examine associations between multiple ambient air pollutants and birthweight in Georgia.
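The core idea in the abstract, replacing a Gaussian process term with a finite set of random Fourier basis functions so that the model becomes linear in its coefficients, can be illustrated with a minimal sketch. The code below is an assumption-laden illustration, not the authors' method: it uses plain (unsupervised) random Fourier features for a Gaussian kernel rather than the supervised variant, and a ridge solve in place of Hamiltonian Monte Carlo. The function names, the lengthscale, the penalty `lam`, and the simulated data are all hypothetical.

```python
import numpy as np

def rff_features(X, n_features, lengthscale, rng):
    """Map X (n x p) to random Fourier features whose inner products
    approximate a Gaussian kernel exp(-||x - x'||^2 / (2 * lengthscale^2))."""
    p = X.shape[1]
    # Frequencies drawn from the kernel's spectral density, N(0, 1/lengthscale^2).
    W = rng.normal(scale=1.0 / lengthscale, size=(p, n_features))
    # Random phases, uniform on [0, 2*pi).
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def gaussian_kernel(X, lengthscale):
    """Exact Gaussian kernel matrix, for comparison only."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * lengthscale**2))

rng = np.random.default_rng(0)
n, p, n_features, lengthscale = 200, 3, 2000, 1.5  # illustrative sizes only

# Hypothetical exposures and outcome with a nonlinear, interacting surface.
X = rng.normal(size=(n, p))
y = np.sin(X[:, 0]) * X[:, 1] + 0.1 * rng.normal(size=n)

Z = rff_features(X, n_features, lengthscale, rng)

# 1) Z @ Z.T approximates the exact kernel matrix.
K_exact = gaussian_kernel(X, lengthscale)
K_approx = Z @ Z.T
max_abs_err = np.max(np.abs(K_exact - K_approx))

# 2) The exposure-response surface h(x) is now linear in the basis
#    coefficients: h = Z @ beta. A ridge penalty stands in for a Gaussian
#    prior on beta; no n x n kernel matrix is inverted here.
lam = 1.0
beta = np.linalg.solve(Z.T @ Z + lam * np.eye(n_features), Z.T @ y)
h_linear = Z @ beta

# The same fit written as kernel ridge regression with the approximate kernel.
h_kernel = K_approx @ np.linalg.solve(K_approx + lam * np.eye(n), y)

print(f"max |K_exact - K_approx| = {max_abs_err:.3f}")
print("linear and kernel forms agree:", np.allclose(h_linear, h_kernel, atol=1e-6))
```

In this toy setup the number of features exceeds the sample size, which is convenient for checking the kernel approximation but is the opposite of the large-data regime the paper targets. The computational benefit of the linear form arises when the number of basis functions is much smaller than the number of observations, since the linear solve then involves a matrix sized by the basis count rather than by the sample size.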
