Triamese-ViT: A 3D-Aware Method for Robust Brain Age Estimation from MRIs
Zhaonian Zhang, Richard Jiang
TL;DR
The paper tackles brain age estimation from MRI by addressing the limitations of existing 2D ViT and 3D CNN approaches in capturing 3D context and providing interpretable outputs. It introduces Triamese-ViT, a three-view ViT framework that processes MRI data from three orthogonal orientations and fuses the per-view predictions via a Triamese MLP, yielding state-of-the-art accuracy on 1351 healthy scans (MAE ≈ 3.87, r ≈ 0.93) with reduced age bias (BAG correlation ≈ -0.29). The method also produces 3D-like attention maps and validates their interpretability through occlusion analysis, with results that agree with anatomical knowledge of key regions such as the basal ganglia, thalamus, and midbrain. Overall, Triamese-ViT advances brain age estimation by combining multi-view Transformer analysis with interpretable outputs, showing potential for clinical deployment and broader medical AI research.
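To make the three-view idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes the MRI volume is a 3D NumPy array and that each view is obtained by moving a different axis to the front so that a 2D ViT can treat it as the slice (channel) axis. The function names `three_views` and `fuse_predictions`, the one-hidden-layer MLP, and all weights are hypothetical stand-ins for the per-view ViTs and the Triamese MLP, whose actual layer sizes are not given here.

```python
import numpy as np

def three_views(volume):
    """Return the same 3D volume arranged along three orthogonal orientations.

    In each returned array a different original axis comes first, so it can
    act as the slice (channel) axis for a 2D model. Assumption: this is only
    one plausible way to build the three views.
    """
    volume = np.asarray(volume)
    if volume.ndim != 3:
        raise ValueError("expected a 3D volume")
    view_0 = volume                            # axis 0 first (unchanged)
    view_1 = np.transpose(volume, (1, 0, 2))   # axis 1 first
    view_2 = np.transpose(volume, (2, 0, 1))   # axis 2 first
    return view_0, view_1, view_2

def fuse_predictions(per_view_ages, w1, b1, w2, b2):
    """Fuse three per-view age predictions with a tiny MLP (hypothetical).

    hidden = ReLU(w1 @ ages + b1); output = w2 @ hidden + b2.
    """
    ages = np.asarray(per_view_ages, dtype=float)
    hidden = np.maximum(w1 @ ages + b1, 0.0)   # ReLU activation
    return float(w2 @ hidden + b2)

# Illustrative usage with made-up numbers (not results from the paper).
volume = np.arange(2 * 3 * 4).reshape(2, 3, 4)
views = three_views(volume)
fused = fuse_predictions(
    [30.0, 33.0, 36.0],
    w1=np.eye(3), b1=np.zeros(3),
    w2=np.full(3, 1.0 / 3.0), b2=0.0,
)
```

With the identity first layer and equal output weights used above, the fusion reduces to a plain average of the three per-view predictions; a trained MLP would learn its own weighting.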
Abstract
The integration of machine learning in medicine has significantly improved diagnostic precision, particularly in the interpretation of complex structures like the human brain. The difficulty of diagnosing conditions such as Alzheimer's disease has prompted the development of brain age estimation techniques. These methods often leverage three-dimensional Magnetic Resonance Imaging (MRI) scans, with recent studies emphasizing the efficacy of 3D convolutional neural networks (CNNs) like 3D ResNet. However, the potential of Vision Transformers (ViTs), known for their accuracy and interpretability, remains untapped in this domain due to limitations in their 3D versions. This paper introduces Triamese-ViT, an adaptation of the ViT model for brain age estimation. Our model combines ViTs from three different orientations to capture 3D information, improving both accuracy and interpretability. Tested on a dataset of 1351 MRI scans, Triamese-ViT achieves a Mean Absolute Error (MAE) of 3.84, a Spearman correlation coefficient of 0.9 between predicted and chronological age, and a Spearman correlation coefficient of -0.29 between the brain age gap (BAG) and chronological age, significantly better than previous methods for brain age estimation. A key innovation of Triamese-ViT is its capacity to generate a comprehensive 3D-like attention map, synthesized from the 2D attention maps of each orientation-specific ViT. This feature is particularly beneficial for in-depth brain age analysis and disease diagnosis, offering deeper insights into brain health and the mechanisms of age-related neural changes.
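The three evaluation metrics named above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' evaluation code: the BAG is taken to be predicted age minus chronological age (a common convention that the abstract itself does not spell out), the Spearman coefficient is computed as the Pearson correlation of average ranks, and the helper names `average_ranks`, `spearman`, and `brain_age_metrics` are hypothetical.

```python
import numpy as np

def average_ranks(x):
    """Rank values from 1..n, giving tied values their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

def brain_age_metrics(predicted_age, chronological_age):
    """Compute MAE, Spearman r with age, and Spearman r of BAG with age."""
    pred = np.asarray(predicted_age, dtype=float)
    chron = np.asarray(chronological_age, dtype=float)
    bag = pred - chron  # assumed sign convention: predicted minus chronological
    return {
        "mae": float(np.mean(np.abs(bag))),
        "r_age": spearman(pred, chron),
        "r_bag": spearman(bag, chron),
    }

# Toy values for illustration only (not data from the paper).
metrics = brain_age_metrics([25, 32, 41, 49, 56], [20, 30, 40, 50, 60])
```

A BAG correlation close to zero indicates little age bias, whereas a strongly negative value means the model tends to overestimate young subjects and underestimate older ones, as in the toy data above.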
