Scalable Bayesian Image-on-Scalar Regression for Population-Scale Neuroimaging Data Analysis
Yuliang Xu, Timothy D. Johnson, Thomas E. Nichols, Jian Kang
TL;DR
The paper tackles scalable, uncertainty‑aware analysis of population‑scale fMRI via Bayesian Image‑on‑Scalar Regression (ISR) that accommodates subject‑specific masks. It introduces SBIOS, combining Gaussian Process priors with a voxelwise inclusion indicator and a memory‑mapped, mini‑batch SGLD posterior sampler to achieve linear scaling in batch size and direct spatial inference. On the UK Biobank dataset ($n=38{,}639$, $p>10^5$ voxels, $R=110$ regions), SBIOS demonstrates $4$–$11$× speedups and $8$–$18\%$ power gains over Gibbs sampling with zero imputation, and identifies an amygdala subregion where emotion‑related activation declines by about $58\%$ between ages $50$ and $60$. These advances enable reliable, voxel‑level activation inferences in large‑scale neuroimaging, leveraging subject‑specific masks through imputation and providing principled uncertainty quantification via posterior inclusion probabilities.
Abstract
Bayesian Image-on-Scalar Regression (ISR) provides flexible, uncertainty-aware neuroimaging analysis. However, applying ISR to large-scale datasets such as the UK Biobank is challenging due to intensive computational demands and the need to handle subject-specific brain masks rather than a common mask. We propose a novel Bayesian ISR model that scales efficiently while accommodating these inconsistent masks. Our method leverages Gaussian process priors with salience area indicators and introduces a scalable posterior computation algorithm using stochastic gradient Langevin dynamics combined with memory mapping. This approach achieves linear scaling with subsample size and constrains memory usage to the batch size, facilitating direct spatial posterior inferences on brain activation regions. Simulation studies and analysis of UK Biobank task fMRI data (38,639 subjects; over 120,000 voxels per image) demonstrate a 4- to 11-fold speed increase and an 8-18% enhancement in statistical power compared to traditional Gibbs sampling with zero-imputation. Our analysis reveals a subregion of the amygdala where emotion-related brain activation decreases by approximately 58% between ages 50 and 60.
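To make the computational idea concrete, the following is a minimal sketch of mini-batch stochastic gradient Langevin dynamics (SGLD) reading image data from a memory-mapped file, so that only one batch of images is held in memory per iteration. It is not the authors' SBIOS implementation: it assumes a toy model with a single scalar covariate, an independent Gaussian prior on each voxel coefficient in place of the Gaussian process prior, a known noise variance, and no inclusion indicator or subject-specific masks. The function name `run_sgld_demo` and all parameter values are hypothetical choices for illustration.

```python
import os
import tempfile

import numpy as np


def run_sgld_demo(n=2000, p=50, batch=100, n_iter=3000, burn_in=1000,
                  step=1e-4, sigma2=1.0, tau2=10.0, seed=0):
    """Toy SGLD for Y_i(s) = x_i * beta(s) + noise, with images on disk.

    Assumptions (not from the paper): independent N(0, tau2) prior on each
    beta(s), known noise variance sigma2, constant step size.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                  # one scalar covariate per subject
    beta_true = np.zeros(p)
    beta_true[:10] = 1.0                    # first 10 "voxels" are active

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "images.dat")

        # Write the simulated n x p image matrix to disk. (The toy data are
        # generated in memory here; real data would already be on disk.)
        Y = np.memmap(path, dtype="float64", mode="w+", shape=(n, p))
        Y[:] = np.outer(x, beta_true) + rng.normal(
            scale=np.sqrt(sigma2), size=(n, p))
        Y.flush()
        del Y

        # Reopen read-only; rows are only loaded when indexed.
        Y = np.memmap(path, dtype="float64", mode="r", shape=(n, p))

        beta = np.zeros(p)
        running_sum = np.zeros(p)
        for t in range(n_iter):
            # Draw a mini-batch of subjects and load only those rows.
            idx = np.sort(rng.choice(n, size=batch, replace=False))
            y_b = np.asarray(Y[idx])
            x_b = x[idx]

            # Unbiased estimate of the full-data log-likelihood gradient.
            resid = y_b - np.outer(x_b, beta)
            grad_lik = (n / batch) * (x_b @ resid) / sigma2
            grad_prior = -beta / tau2

            # Langevin update: half-step along the gradient plus N(0, step) noise.
            noise = rng.normal(scale=np.sqrt(step), size=p)
            beta = beta + 0.5 * step * (grad_prior + grad_lik) + noise

            if t >= burn_in:
                running_sum += beta
        del Y                               # release the file before cleanup

    return beta_true, running_sum / (n_iter - burn_in)


beta_true, post_mean = run_sgld_demo()
print(np.round(post_mean[:12], 2))
```

In this sketch the per-iteration cost and memory footprint depend on `batch` rather than on `n`, which is the property the abstract attributes to combining SGLD with memory mapping. The averaged draws in `post_mean` should sit near 1 for the first ten coefficients and near 0 for the rest, up to posterior and Monte Carlo variability.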
