DiaMond: Dementia Diagnosis with Multi-Modal Vision Transformers Using MRI and PET

Yitong Li, Morteza Ghahremani, Youssef Wally, Christian Wachinger

TL;DR

DiaMond uses self-attention and a novel bi-attention mechanism to synergistically combine MRI and PET, alongside multi-modal normalization to reduce redundant dependency between the two modalities, thereby improving performance.

Abstract

Diagnosing dementia, particularly for Alzheimer's Disease (AD) and frontotemporal dementia (FTD), is complex due to overlapping symptoms. While magnetic resonance imaging (MRI) and positron emission tomography (PET) data are critical for the diagnosis, integrating these modalities in deep learning faces challenges, often resulting in suboptimal performance compared to using single modalities. Moreover, the potential of multi-modal approaches in differential diagnosis, which holds significant clinical importance, remains largely unexplored. We propose a novel framework, DiaMond, to address these issues with vision Transformers to effectively integrate MRI and PET. DiaMond is equipped with self-attention and a novel bi-attention mechanism that synergistically combine MRI and PET, alongside a multi-modal normalization to reduce redundant dependency, thereby boosting the performance. DiaMond significantly outperforms existing multi-modal methods across various datasets, achieving a balanced accuracy of 92.4% in AD diagnosis, 65.2% for AD-MCI-CN classification, and 76.5% in differential diagnosis of AD and FTD. We also validated the robustness of DiaMond in a comprehensive ablation study. The code is available at https://github.com/ai-med/DiaMond.

Paper Structure

This paper contains 19 sections, 5 equations, 4 figures, 7 tables.

Figures (4)

  • Figure 1: MRI and PET are two modalities with disease-specific dependency. We introduce a novel framework that includes a self-attention mechanism with multi-modal normalization to capture distinct features from the single modalities, and a novel bi-attention mechanism to exclusively extract their similarities.
  • Figure 2: DiaMond encodes PET and MRI individually with blocks $\mathcal{F}_{P}$ and $\mathcal{F}_{M}$, then applies RegBN to ensure their independence. It leverages a novel bi-attention mechanism on the multi-modal features in block $\mathcal{F}_{M,P}$ to focus on correlations among them. Finally, the latent features from all three blocks are aggregated into an MLP head to obtain classification labels $\mathcal{C}$.
  • Figure 3: The bi-attention block in branch $\mathcal{F}_{M,P}$ calculates the interweaved attention between the input modalities, with a constant threshold $\tau$ to filter out very small values in the correlation matrices.
  • Figure 4: Ablation on the attention threshold $\tau$. Results are on the classification of CN vs. AD using the ADNI dataset.
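
The thresholding idea described in the Figure 3 caption can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that); it only shows one plausible reading of "filter out very small values in the correlation matrices". The function names `thresholded_cross_attention` and `bi_attention` are hypothetical, and the sketch assumes identity query/key/value projections, scaled dot-product scores used directly as weights (no softmax), and a rule that zeroes every score not exceeding `tau`.

```python
import math


def thresholded_cross_attention(queries, keys, values, tau):
    """Attend from one modality to another, dropping weak correlations.

    queries, keys, values: lists of token vectors (lists of floats).
    tau: constant threshold; scaled dot-product scores <= tau are zeroed.
    Assumption: scores are used directly as weights, without a softmax.
    """
    scale = 1.0 / math.sqrt(len(queries[0]))
    width = len(values[0])
    attended = []
    for q in queries:
        # Correlation of this query token with every key token.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        # Filter out very small values with the constant threshold tau.
        kept = [s if s > tau else 0.0 for s in scores]
        # Weighted sum of value tokens using the surviving correlations.
        attended.append(
            [sum(w * v[j] for w, v in zip(kept, values)) for j in range(width)]
        )
    return attended


def bi_attention(mri_tokens, pet_tokens, tau):
    """Run the thresholded attention in both directions (hypothetical helper)."""
    # MRI tokens query the PET tokens ...
    mri_to_pet = thresholded_cross_attention(mri_tokens, pet_tokens, pet_tokens, tau)
    # ... and PET tokens query the MRI tokens.
    pet_to_mri = thresholded_cross_attention(pet_tokens, mri_tokens, mri_tokens, tau)
    return mri_to_pet, pet_to_mri


# Toy tokens (made-up numbers, two tokens of dimension two per modality).
mri = [[1.0, 0.0], [0.0, 1.0]]
pet = [[1.0, 0.0], [0.1, 0.0]]
mri_to_pet, pet_to_mri = bi_attention(mri, pet, tau=0.1)
print(mri_to_pet)
print(pet_to_mri)
```

In this toy input, the second PET token correlates only weakly with the first MRI token, so with `tau=0.1` it contributes nothing to the attended output, whereas with `tau=0.0` it would. This mirrors the role the caption gives to $\tau$: a larger threshold restricts the branch to strongly correlated MRI-PET token pairs.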