3DViT-GAT: A Unified Atlas-Based 3D Vision Transformer and Graph Learning Framework for Major Depressive Disorder Detection Using Structural MRI Data

Nojod M. Alotaibi, Areej M. Alhothali, Manar S. Ali

TL;DR

The paper tackles automated MDD detection from structural MRI by introducing 3DViT-GAT, a unified framework that extracts region embeddings with a 3D Vision Transformer and models inter-regional relations with a Graph Attention Network. It compares atlas-based and cube-based region extraction and shows that anatomically informed ROI partitions yield more robust and accurate results on the REST-meta-MDD dataset, with the Dosenbach (Dose) atlas achieving the best performance among the proposed models. The approach combines region-level ViT representations with cosine-similarity graphs to encode intra- and inter-regional dependencies; under stratified 10-fold cross-validation, the best model reaches 81.51% accuracy and 85.94% sensitivity, the latter being valuable for clinical screening. The work advances MDD neuroimaging analysis by integrating anatomical priors with transformer-based embeddings and graph learning, and it supports interpretability by using GNNExplainer to map salient ROIs to known brain networks.

Abstract

Major depressive disorder (MDD) is a prevalent mental health condition that negatively impacts both individual well-being and global public health. Automated detection of MDD using structural magnetic resonance imaging (sMRI) and deep learning (DL) methods holds increasing promise for improving diagnostic accuracy and enabling early intervention. Most existing methods employ either voxel-level features or handcrafted regional representations built from predefined brain atlases, limiting their ability to capture complex brain patterns. This paper develops a unified pipeline that uses Vision Transformers (ViTs) to extract 3D region embeddings from sMRI data and a Graph Neural Network (GNN) for classification. We explore two strategies for defining regions: (1) an atlas-based approach using predefined structural and functional brain atlases, and (2) a cube-based method in which ViTs are trained directly to identify regions from uniformly extracted 3D patches. Further, cosine similarity graphs are generated to model interregional relationships and guide GNN-based classification. Extensive experiments were conducted using the REST-meta-MDD dataset to demonstrate the effectiveness of our model. With stratified 10-fold cross-validation, the best model obtained 81.51% accuracy, 85.94% sensitivity, 76.36% specificity, 80.88% precision, and 83.33% F1-score. Further, atlas-based models consistently outperformed the cube-based approach, highlighting the importance of using domain-specific anatomical priors for MDD detection.
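
To make the graph-construction step concrete, the following is a minimal sketch of how a cosine-similarity graph over ROI embeddings could be built. It is not the authors' implementation. The function names, the toy 3-dimensional embeddings, and the `top_k` nearest-neighbour sparsification rule are all assumptions introduced for illustration; the abstract does not state how, or whether, the similarity graph is sparsified.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero embedding as dissimilar to everything
    return dot / (norm_u * norm_v)


def cosine_similarity_graph(embeddings, top_k=2):
    """Return a symmetric weighted adjacency matrix over ROIs.

    Each ROI is linked to its `top_k` most similar ROIs (an assumed
    sparsification rule), and the edge weight is the cosine similarity.
    """
    n = len(embeddings)
    sim = [
        [cosine_similarity(embeddings[i], embeddings[j]) for j in range(n)]
        for i in range(n)
    ]
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Rank every other ROI by similarity to ROI i; no self-loops.
        neighbours = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: sim[i][j],
            reverse=True,
        )[:top_k]
        for j in neighbours:
            adj[i][j] = sim[i][j]
            adj[j][i] = sim[i][j]  # keep the graph undirected
    return adj


# Toy embeddings for four hypothetical ROIs (one row per ROI).
roi_embeddings = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.1],
]
adjacency = cosine_similarity_graph(roi_embeddings, top_k=1)
```

In this toy example, ROIs 0 and 1 point in nearly the same direction, as do ROIs 2 and 3, so with `top_k=1` the graph keeps only those two edges. In a full pipeline, the rows would be the ViT region embeddings and the resulting adjacency matrix would be passed, together with the embeddings as node features, to the graph attention layers.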

Paper Structure

This paper contains 21 sections, 17 equations, 3 figures, 10 tables, 1 algorithm.

Figures (3)

  • Figure 1: An overview of the proposed 3DViT-GAT model. sMRI, structural MRI; ViT, vision transformer; MLP, multilayer perceptron; GAT, graph attention network; LeakyRelu, leaky rectified linear unit; GAP, global average pooling; FCN, fully connected network; MDD, major depressive disorder; HC, healthy control; ROI, brain regions of interest; N, the number of ROIs.
  • Figure 2: Explainability maps of the Dose-based model generated by GNNExplainer. a) Salient ROIs associated with MDD; b) Salient ROIs associated with HC; and c) Salient ROIs associated with MDD vs. HC. Each bubble represents an ROI, whose size and color intensity reflect its relative contribution score. Abbreviations: Dose, Dosenbach atlas; ROIs, brain regions of interest.
  • Figure 3: Explainability maps of the AAL-based model generated by GNNExplainer. a) Salient ROIs associated with MDD; b) Salient ROIs associated with HC; and c) Salient ROIs associated with MDD vs. HC. Each bubble represents an ROI, whose size and color intensity reflect its relative contribution score. Abbreviations: AAL, Automated Anatomical Labeling atlas; ROIs, brain regions of interest.