Local Temporal Feature Enhanced Transformer with ROI-rank Based Masking for Diagnosis of ADHD

Byunggun Kim, Younghun Kwon

TL;DR

This study tackles ADHD diagnosis from resting-state fMRI by learning full spatiotemporal biomarkers with an encoder–decoder transformer. It introduces three targeted innovations: a CNN-based embedding block for spatial features, local temporal attention to capture short-range BOLD dynamics, and ROI-rank masking to focus on the most ADHD-relevant regions, all within a spatiotemporal co-attention framework. Evaluated on ADHD-200 data across sites, the approach achieves about 77.8% ACC and 79.3% AUC, outperforming several CNN- and transformer-based baselines and demonstrating robustness to ROI templates. The work provides both improved diagnostic performance and interpretable biomarker patterns, advancing cross-site ADHD diagnosis and biomarker discovery from rs-fMRI.

Abstract

Attention-Deficit/Hyperactivity Disorder (ADHD) is one of the most common mental disorders in modern society, found not only in children but also in adults. In this context, we propose an ADHD diagnosis transformer model that simultaneously finds important spatial and temporal brain biomarkers from resting-state functional magnetic resonance imaging (rs-fMRI). The model learns individual spatial and temporal features and also learns the correlation between them, using full attention structures specialized for ADHD diagnosis. In particular, it focuses on learning local blood oxygenation level dependent (BOLD) signals and on distinguishing important regions of interest (ROIs) in the brain. We propose three methods for the ADHD diagnosis transformer. First, we design a CNN-based embedding block to obtain more expressive embedding features for brain region attention. It adapts earlier CNN-based ADHD diagnosis models for use in the transformer. Next, for the attention over individual spatial and temporal features, we change the attention method to local temporal attention and ROI-rank based masking. For the temporal features of fMRI, local temporal attention makes it possible to learn local BOLD signal features using only simple window masking. For the spatial features of fMRI, ROI-rank based masking distinguishes highly correlated ROIs based on the attention scores of ROI relationships, thereby providing a more specific biomarker for ADHD diagnosis. Experiments were conducted with various types of transformer models. To evaluate these models, we collected data from 939 individuals across all sites provided by the ADHD-200 competition. On this data, the spatiotemporal enhanced transformer for ADHD diagnosis outperforms the other transformer variants (77.78% ACC, 76.60% SPE, 79.22% SEN, 79.30% AUC).

Paper Structure

This paper contains 21 sections, 9 equations, 10 figures, 8 tables.

Figures (10)

  • Figure 1: The architecture of the enhanced ADHD diagnosis transformer model. Three modules of the baseline transformer are modified: the CNN-based embedding block, local temporal attention, and ROI-rank based masking.
  • Figure 2: The structure of feature-independent self-attention (left) and spatiotemporal co-attention (right). Feature-independent self-attention, which is the same as the original self-attention, learns features within each perspective separately, whereas spatiotemporal co-attention learns how the spatial and temporal features of rs-fMRI relate to each other.
  • Figure 3: The three types of transformer block: the original transformer block (left), the local temporal attention-based transformer block (middle), and the ROI-rank masking attention-based transformer block (right).
  • Figure 4: Details of the CNN-based embedding block. We replace the last CNN block with 2 convolutional layers with GELU activation.
  • Figure 5: The masking method that local temporal attention applies to the score matrix. Only the query-key scores in the blue cells, which neighbor the diagonal, are attended to; the query-key scores in the gray cells are ignored.
  • ...and 5 more figures
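
The window masking described in the Figure 5 caption can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the `radius` parameter (how many neighbors on each side of the diagonal are kept), and the toy sizes are assumptions made for illustration, and the paper's actual window size is not given in this summary.

```python
import numpy as np


def local_window_mask(seq_len: int, radius: int) -> np.ndarray:
    """Boolean (seq_len, seq_len) mask that is True where |i - j| <= radius.

    `radius` is a hypothetical parameter; True cells play the role of the
    blue cells near the diagonal, False cells the ignored gray cells.
    """
    idx = np.arange(seq_len)
    return np.abs(idx[:, None] - idx[None, :]) <= radius


def local_temporal_attention(q, k, v, radius):
    """Scaled dot-product attention restricted to a band around the diagonal."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    mask = local_window_mask(q.shape[0], radius)
    # Scores outside the window are set to -inf so softmax gives them weight 0.
    scores = np.where(mask, scores, -np.inf)
    # Numerically stable softmax; each row keeps its (finite) diagonal score.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


# Toy example: 6 time points with 4-dimensional features (arbitrary sizes).
rng = np.random.default_rng(0)
T, d = 6, 4
q, k, v = (rng.standard_normal((T, d)) for _ in range(3))
out, w = local_temporal_attention(q, k, v, radius=1)
```

With `radius=1`, each time point attends only to itself and its immediate neighbors, so every row of `w` has at most three non-zero entries and still sums to 1. Masking before the softmax, rather than zeroing weights afterwards, keeps the attention weights normalized within the window.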