Scalable Bayesian Image-on-Scalar Regression for Population-Scale Neuroimaging Data Analysis

Yuliang Xu; Timothy D. Johnson; Thomas E. Nichols; Jian Kang

TL;DR

The paper tackles scalable, uncertainty-aware analysis of population-scale fMRI via Bayesian Image-on-Scalar Regression (ISR) that accommodates subject-specific masks. It introduces SBIOS, which combines Gaussian process priors with a voxelwise inclusion indicator and a memory-mapped, mini-batch SGLD posterior sampler, giving computation that scales linearly in batch size and direct spatial inference. On the UK Biobank dataset ($n=38{,}639$, $p>10^5$ voxels, $R=110$ regions), SBIOS demonstrates $4$–$11\times$ speedups and $8$–$18\%$ power gains over Gibbs sampling with zero imputation, and identifies an amygdala subregion where emotion-related activation declines by about $58\%$ between ages $50$ and $60$. These advances enable reliable voxel-level activation inference in large-scale neuroimaging, using subject-specific masks through imputation and providing principled uncertainty quantification via posterior inclusion probabilities.

Abstract

Bayesian Image-on-Scalar Regression (ISR) provides flexible, uncertainty-aware neuroimaging analysis. However, applying ISR to large-scale datasets such as the UK Biobank is challenging due to intensive computational demands and the need to handle subject-specific brain masks rather than a common mask. We propose a novel Bayesian ISR model that scales efficiently while accommodating these inconsistent masks. Our method leverages Gaussian process priors with salience area indicators and introduces a scalable posterior computation algorithm using stochastic gradient Langevin dynamics combined with memory mapping. This approach achieves linear scaling with subsample size and constrains memory usage to the batch size, facilitating direct spatial posterior inferences on brain activation regions. Simulation studies and analysis of UK Biobank task fMRI data (38,639 subjects; over 120,000 voxels per image) demonstrate a 4- to 11-fold speed increase and an 8-18% enhancement in statistical power compared to traditional Gibbs sampling with zero-imputation. Our analysis reveals a subregion of the amygdala where emotion-related brain activation decreases by approximately 58% between ages 50 and 60.
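To make the combination of mini-batch stochastic gradient Langevin dynamics and memory mapping concrete, here is a minimal Python sketch. It is not the authors' SBIOS implementation: it assumes a toy model $Y_i(s_j) = \beta(s_j)x_i + \epsilon_i(s_j)$ with independent Gaussian priors per voxel (in place of the Gaussian process prior and inclusion indicators), a known noise variance, and `NaN` entries marking voxels outside a subject's mask, which are simply skipped rather than imputed. The function name `sgld_voxelwise`, the file name `images.dat`, and all parameter values are hypothetical.

```python
import os
import tempfile

import numpy as np


def sgld_voxelwise(y_path, x, n, p, batch_size=50, n_iter=2000,
                   step=1e-3, sigma2=1.0, tau2=10.0, seed=0):
    """Toy SGLD for Y_i(s_j) = beta(s_j) * x_i + noise.

    Assumptions (not from the paper): independent N(0, tau2) priors per voxel,
    known noise variance sigma2, and NaN marking voxels outside a subject's mask.
    """
    rng = np.random.default_rng(seed)
    # Memory-map the n x p image matrix; only mini-batch rows are read into RAM.
    Y = np.memmap(y_path, dtype=np.float32, mode="r", shape=(n, p))
    beta = np.zeros(p)
    kept = []
    for t in range(n_iter):
        idx = np.sort(rng.choice(n, size=batch_size, replace=False))
        yb = np.asarray(Y[idx], dtype=np.float64)   # (m, p) batch of images
        xb = x[idx]                                 # (m,) scalar covariate
        observed = ~np.isnan(yb)                    # subject-specific masks
        resid = np.where(observed, yb - np.outer(xb, beta), 0.0)
        # Unbiased mini-batch estimate of the log-likelihood gradient.
        grad_lik = (n / batch_size) * (xb @ resid) / sigma2
        grad_prior = -beta / tau2
        # Langevin step: half a step along the gradient plus N(0, step) noise.
        beta = (beta + 0.5 * step * (grad_lik + grad_prior)
                + rng.normal(0.0, np.sqrt(step), size=p))
        if t >= n_iter // 2:                        # discard first half as burn-in
            kept.append(beta.copy())
    return np.mean(kept, axis=0)


def demo():
    rng = np.random.default_rng(1)
    n, p = 500, 20
    x = rng.normal(size=n)
    beta_true = np.linspace(-1.0, 1.0, p)
    y = np.outer(x, beta_true) + rng.normal(size=(n, p))
    y[rng.random((n, p)) < 0.2] = np.nan            # 20% of entries unobserved
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "images.dat")
        disk = np.memmap(path, dtype=np.float32, mode="w+", shape=(n, p))
        disk[:] = y
        disk.flush()
        del disk
        est = sgld_voxelwise(path, x, n, p)
    # Closed-form posterior mean of this conjugate toy model, for comparison.
    obs = ~np.isnan(y)
    exact = (x @ np.where(obs, y, 0.0)) / ((x ** 2) @ obs + 1.0 / 10.0)
    return est, exact, beta_true
```

Calling `demo()` returns the SGLD posterior-mean estimate alongside the closed-form posterior mean and the simulated truth. Each iteration touches only `batch_size` rows of the on-disk matrix, which is the sense in which memory use is bounded by the batch size.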

Paper Structure (36 sections, 18 equations, 20 figures, 6 tables, 2 algorithms)

Introduction
UK Biobank Data
Traditional and Recent Practices in ISR
Subject-specific Masks in Brain Imaging
Scalable Posterior Algorithms
Model
Posterior Computation
Posterior Sampling with Gaussian Process Priors
Scalable Algorithm for Large Dataset
Evaluation Criteria
Ablation Study Design
UK Biobank Application
Data Preprocessing and Estimation Procedure
Analysis Results
Age-Related Emotion Recognition Brain Activation Patterns
...and 21 more sections

Figures (20)

Figure 1: Incremental Differences of BIOS, SBIOS0, and SBIOSimp.
Figure 2: Analysis mask using an observed proportion threshold of 0.5 and an intersection mask (completely observed data). The purple area indicates 100% inclusion; the blue area indicates the mask with an observed proportion between 0.5 and 1.0.
Figure 3: Illustration of age-related activation patterns using a grayscale brain background image (ch2bet; Holmes et al., 1998). Images are created using MRIcron (Rorden and Brett, 2000).
Figure 4: Illustrations of the amygdala region.
Figure 5: Scatter plot of the posterior mean of $\beta(s_j)\,I(\mathrm{PIP}(s_j)\geq 0.95)$ based on SBIOS0 (x-axis) and SBIOSimp (y-axis) for six selected regions with high missingness. Blue dots indicate voxels with observed proportion $h(s_j)\in [0.5,0.7)$. Red dots indicate voxels with observed proportion $h(s_j)\in [0.7,0.9)$. Black dots indicate voxels with observed proportion $h(s_j)\in [0.9,1]$.
...and 15 more figures
