BrainRotViT: Transformer-ResNet Hybrid for Explainable Modeling of Brain Aging from 3D sMRI

Wasif Jalal; Md Nafiu Rahman; Atif Hasan Rahman; M. Sohel Rahman

TL;DR

BrainRotViT tackles the challenge of accurate, generalizable brain age estimation from heterogeneous multi-site sMRI by coupling a Vision Transformer encoder, pretrained on age–sex composite classes, with a lightweight residual CNN regression head that operates on a 2D pseudo-image formed from ViT embeddings. The model achieves strong validation performance ($\text{MAE}=3.34$ years, $r=0.98$, $\rho=0.97$, $R^2=0.95$) across 11 datasets and generalizes to four independent cohorts ($\text{MAE}$ between $3.77$ and $5.04$ years). Interpretability is integrated via guided backpropagation and ViT patch mapping to produce slice-level and 3D attention volumes, highlighting aging-relevant regions such as the cerebellar vermis, precentral/postcentral gyri, temporal lobes, and medial superior frontal gyrus. The findings link brain age gaps to neurological conditions (e.g., AD, MCI, ASD), offering a scalable, efficient, and explainable framework that bridges CNN- and transformer-based approaches for aging and neurodegeneration research.
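For readers less familiar with the reported metrics, the following is a minimal sketch of how MAE, Pearson $r$, Spearman $\rho$, $R^2$, and the brain age gap could be computed from paired chronological and predicted ages. The toy ages, the helper names, and the sign convention for the gap (predicted minus chronological) are assumptions made for illustration; none of this is the authors' evaluation code.

```python
from math import sqrt

def mean(xs):
    return sum(xs) / len(xs)

def mae(y_true, y_pred):
    """Mean absolute error, in the same unit as the ages (years)."""
    return mean([abs(p - t) for t, p in zip(y_true, y_pred)])

def pearson_r(x, y):
    """Pearson linear correlation coefficient."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(xs):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    out = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg_rank
        i = j + 1
    return out

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))

def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    m = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - m) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def brain_age_gap(y_true, y_pred):
    """Per-subject gap; assumed convention: predicted minus chronological."""
    return [p - t for t, p in zip(y_true, y_pred)]

# Toy data (hypothetical, not taken from the paper).
ages = [20, 35, 50, 65, 80]
predicted = [23, 33, 54, 62, 85]

toy_mae = mae(ages, predicted)
toy_r = pearson_r(ages, predicted)
toy_rho = spearman_rho(ages, predicted)
toy_r2 = r_squared(ages, predicted)
gaps = brain_age_gap(ages, predicted)
print(toy_mae, toy_r, toy_rho, toy_r2, gaps)
```

Under this convention a positive gap means the model reads the brain as older than the subject's chronological age, which is the direction usually examined when relating gaps to conditions such as AD or MCI.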

Abstract

Accurate brain age estimation from structural MRI is a valuable biomarker for studying aging and neurodegeneration. Traditional regression and CNN-based methods face limitations such as manual feature engineering, limited receptive fields, and overfitting on heterogeneous data. Pure transformer models, while effective, require large datasets and high computational cost. We propose Brain ResNet over trained Vision Transformer (BrainRotViT), a hybrid architecture that combines the global context modeling of vision transformers (ViT) with the local refinement of residual CNNs. A ViT encoder is first trained on an auxiliary age and sex classification task to learn slice-level features. The frozen encoder is then applied to all sagittal slices to generate a 2D matrix of embedding vectors, which is fed into a residual CNN regressor that incorporates subject sex at the final fully-connected layer to estimate continuous brain age. Our method achieves an MAE of 3.34 years (Pearson $r=0.98$, Spearman $\rho=0.97$, $R^2=0.95$) on validation across 11 MRI datasets encompassing more than 130 acquisition sites, outperforming baseline and state-of-the-art models. It also generalizes well across 4 independent cohorts with MAEs between 3.77 and 5.04 years. Analyses on the brain age gap (the difference between the predicted age and actual age) show that aging patterns are associated with Alzheimer's disease, cognitive impairment, and autism spectrum disorder. Model attention maps highlight aging-associated regions of the brain, notably the cerebellar vermis, precentral and postcentral gyri, temporal lobes, and medial superior frontal gyrus. Our results demonstrate that this method provides an efficient, interpretable, and generalizable framework for brain-age prediction, bridging the gap between CNN- and transformer-based approaches while opening new avenues for aging and neurodegeneration research.
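The two-stage data flow described above (frozen encoder applied per sagittal slice, embeddings stacked into a 2D matrix, regressor that receives sex at its final layer) can be illustrated at the level of shapes with a dependency-free sketch. Everything concrete here is a placeholder assumption: the slice count, slice size, embedding width, the averaging "encoder", the toy residual step, and the random linear weights stand in for the trained ViT and residual CNN and are not the authors' implementation.

```python
import random

NUM_SLICES = 6            # hypothetical number of sagittal slices
SLICE_H, SLICE_W = 4, 4   # hypothetical slice size in pixels
EMBED_DIM = 8             # hypothetical embedding width

def frozen_encoder(slice_2d):
    """Placeholder for the frozen ViT encoder: one slice -> one embedding vector."""
    flat = [v for row in slice_2d for v in row]
    avg = sum(flat) / len(flat)
    return [avg * (k + 1) / EMBED_DIM for k in range(EMBED_DIM)]

def build_pseudo_image(volume):
    """Stack per-slice embeddings into a NUM_SLICES x EMBED_DIM matrix."""
    return [frozen_encoder(s) for s in volume]

def residual_block(matrix, weight=0.5):
    """Toy residual step: output = input + ReLU(weight * input)."""
    return [[x + max(0.0, weight * x) for x in row] for row in matrix]

def regress_age(pseudo_image, sex, weights, bias):
    """Placeholder regressor: residual step, pool over slices,
    append sex to the pooled features, then apply one linear layer."""
    refined = residual_block(pseudo_image)
    pooled = [sum(col) / len(col) for col in zip(*refined)]
    features = pooled + [float(sex)]  # sex enters only at the final layer
    return sum(w * f for w, f in zip(weights, features)) + bias

rng = random.Random(0)  # fixed seed so the sketch is reproducible
volume = [[[rng.random() for _ in range(SLICE_W)] for _ in range(SLICE_H)]
          for _ in range(NUM_SLICES)]
pseudo_image = build_pseudo_image(volume)
weights = [rng.uniform(-1, 1) for _ in range(EMBED_DIM + 1)]
age = regress_age(pseudo_image, sex=1, weights=weights, bias=50.0)
print(len(pseudo_image), len(pseudo_image[0]), age)
```

The point of the sketch is the interface between the stages: the regressor never sees raw voxels, only the slice-by-embedding matrix, which is presumably what keeps the second stage lightweight compared with a full 3D network.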
