Triamese-ViT: A 3D-Aware Method for Robust Brain Age Estimation from MRIs

Zhaonian Zhang, Richard Jiang

TL;DR

The paper tackles brain age estimation from MRI by addressing the limitations of existing 2D ViT and 3D CNN approaches in capturing 3D context and providing interpretable outputs. It introduces Triamese-ViT, a three-view ViT framework that processes MRI data from three orthogonal orientations and fuses the per-view predictions with a Triamese MLP, yielding state-of-the-art accuracy on 1351 healthy scans (MAE ≈ 3.87, r ≈ 0.93) with reduced age bias (BAG correlation ≈ -0.29). The method also produces 3D-like attention maps and validates their interpretability through occlusion analysis, with results that agree with anatomical knowledge of key regions such as the basal ganglia, thalamus, and midbrain. Overall, Triamese-ViT advances brain age estimation by combining multi-view Transformer analysis with interpretable outputs, showing potential for clinical deployment and broader medical AI research.

Abstract

The integration of machine learning in medicine has significantly improved diagnostic precision, particularly in the interpretation of complex structures like the human brain. Diagnosing challenging conditions such as Alzheimer's disease has prompted the development of brain age estimation techniques. These methods often leverage three-dimensional Magnetic Resonance Imaging (MRI) scans, with recent studies emphasizing the efficacy of 3D convolutional neural networks (CNNs) like 3D ResNet. However, the potential of Vision Transformers (ViTs), known for their accuracy and interpretability, remains untapped in this domain due to limitations in their 3D versions. This paper introduces Triamese-ViT, an innovative adaptation of the ViT model for brain age estimation. Our model uniquely combines ViTs from three different orientations to capture 3D information, significantly enhancing accuracy and interpretability. Tested on a dataset of 1351 MRI scans, Triamese-ViT achieves a Mean Absolute Error (MAE) of 3.84, a Spearman correlation coefficient of 0.9 with chronological age, and a Spearman correlation coefficient of -0.29 between the brain age gap (BAG) and chronological age, significantly better than previous methods for brain age estimation. A key innovation of Triamese-ViT is its capacity to generate a comprehensive 3D-like attention map, synthesized from the 2D attention maps of each orientation-specific ViT. This feature is particularly beneficial for in-depth brain age analysis and disease diagnosis, offering deeper insights into brain health and the mechanisms of age-related neural changes.
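
The three metrics quoted in the abstract can be illustrated with a minimal sketch. It assumes the usual definition of the brain age gap as predicted age minus chronological age, which the abstract does not spell out. The function names and the toy ages below are hypothetical and are not taken from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1  # mean of the 1-based positions in the run
        for pos in range(start, end + 1):
            ranks[order[pos]] = tied_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mae(predicted, chronological):
    """Mean absolute error between predicted and chronological ages."""
    return mean(abs(p - c) for p, c in zip(predicted, chronological))


def brain_age_gap(predicted, chronological):
    """Assumed definition: BAG = predicted age - chronological age."""
    return [p - c for p, c in zip(predicted, chronological)]


# Toy values for illustration only (not data from the paper).
chronological = [20.0, 35.0, 50.0, 65.0, 80.0]
predicted = [24.0, 36.0, 49.0, 62.0, 74.0]
bag = brain_age_gap(predicted, chronological)

print("MAE:", mae(predicted, chronological))
print("Spearman(predicted, age):", spearman(predicted, chronological))
print("Spearman(BAG, age):", spearman(bag, chronological))
```

In this toy case the model overestimates young subjects and underestimates old ones, so the BAG correlates negatively with age. A BAG correlation closer to zero, as the abstract reports for Triamese-ViT relative to earlier methods, indicates less of this age-dependent bias.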

Paper Structure

This paper contains 14 sections, 12 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: The structure of Triamese-ViT. We reshape MRI scans into three distinct viewpoints, dividing each into fixed-size patches. These patches are then linearly embedded, enhanced with position embeddings, and subsequently inputted into a standard Transformer encoder. The encoder's output is directed through MLP Heads to generate three separate predictions. These predictions are then integrated using the Triamese MLP, culminating in the final result.
  • Figure 2: Illustration of the framework for occlusion analysis.
  • Figure 3: The impact of the number of MLP layers in Triamese-Encoder.
  • Figure 4: The impact of the backbone architectures.
  • Figure 5: Comparison between the attention map and occlusion analysis from Triamese-ViT. The upper half showcases the results from the occlusion analysis, while the lower half displays the results from the attention map. Both halves collectively highlight the specific regions of the brain that the Triamese-ViT model prioritizes and considers most informative for determining age.
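
The data flow described in the Figure 1 caption can be sketched roughly as follows, using NumPy. This is an illustration under stated assumptions, not the authors' implementation: it assumes each view is formed by moving a different axis of the volume into the channel position, it omits the Transformer encoders entirely, and the helper names (`three_views`, `patchify`, `fuse_predictions`), the patch size, and the fusion weights are all hypothetical.

```python
import numpy as np


def three_views(volume):
    """Return three views of a 3D volume, each with a different axis moved
    last so that it plays the role of the channel axis (an assumption)."""
    return [np.moveaxis(volume, axis, -1) for axis in range(3)]


def patchify(view, patch):
    """Split a (rows, cols, channels) view into flattened, non-overlapping
    patch x patch tiles, one row per tile."""
    rows, cols, channels = view.shape
    assert rows % patch == 0 and cols % patch == 0, "view must tile evenly"
    grid = view.reshape(rows // patch, patch, cols // patch, patch, channels)
    grid = grid.transpose(0, 2, 1, 3, 4)  # bring the two tile indices together
    return grid.reshape(-1, patch * patch * channels)


def fuse_predictions(preds, w1, b1, w2, b2):
    """Toy two-layer MLP (ReLU hidden layer) mapping the three per-view age
    predictions to one fused estimate. Stand-in for the Triamese MLP."""
    hidden = np.maximum(0.0, preds @ w1 + b1)
    return float(hidden @ w2 + b2)


# Hypothetical small volume; real MRI volumes are much larger.
volume = np.arange(8 * 12 * 16, dtype=np.float32).reshape(8, 12, 16)

for view in three_views(volume):
    tokens = patchify(view, patch=4)
    # Each row of `tokens` would be linearly embedded, given a position
    # embedding, and passed to that view's Transformer encoder.
    print(view.shape, "->", tokens.shape)

# Placeholder per-view predictions and weights that simply average them.
per_view_ages = np.array([30.0, 33.0, 36.0])
fused = fuse_predictions(
    per_view_ages,
    w1=np.full((3, 1), 1.0 / 3.0),
    b1=np.zeros(1),
    w2=np.ones(1),
    b2=0.0,
)
print("fused age:", fused)
```

The averaging weights are only a placeholder; in a trained model the fusion weights would be learned jointly with, or after, the three per-view ViTs.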