DS-ViT: Dual-Stream Vision Transformer for Cross-Task Distillation in Alzheimer's Early Diagnosis

Ke Chen, Yifeng Wang, Yufei Zhou, Haohan Wang

TL;DR

This work proposes a dual-stream pipeline for cross-task and cross-architecture knowledge sharing. The pipeline unifies feature representations from segmentation and classification models and integrates them dimension by dimension to guide the classification model.

Abstract

In the field of Alzheimer's disease diagnosis, segmentation and classification tasks are inherently interconnected. Sharing knowledge between models for these tasks can significantly improve training efficiency, particularly when training data is scarce. However, traditional knowledge distillation techniques often struggle to bridge the gap between segmentation and classification due to the distinct nature of tasks and different model architectures. To address this challenge, we propose a dual-stream pipeline that facilitates cross-task and cross-architecture knowledge sharing. Our approach introduces a dual-stream embedding module that unifies feature representations from segmentation and classification models, enabling dimensional integration of these features to guide the classification model. We validated our method on multiple 3D datasets for Alzheimer's disease diagnosis, demonstrating significant improvements in classification performance, especially on small datasets. Furthermore, we extended our pipeline with a residual temporal attention mechanism for early diagnosis, utilizing images taken before the atrophy of patients' brain mass. This advancement shows promise in enabling diagnosis approximately six months earlier in mild and asymptomatic stages, offering critical time for intervention.

Paper Structure (19 sections, 6 equations, 3 figures, 3 tables)

Figures (3)

  • Figure 1: MRI scans illustrating the volumetric changes in the cerebral ventricle (highlighted in red) during the transition from Mild Cognitive Impairment (MCI) to Alzheimer’s Disease (AD) and from MCI to Normal Case (NC). Progression from MCI to AD is characterized by a noticeable increase in ventricle size, while the reverse trend is observed in the transition from MCI to NC [b3]. These structural alterations provide evidence of detectable early-stage biomarkers, supporting the feasibility of AI-driven early diagnosis of Alzheimer’s disease.
  • Figure 2: The process starts with generating segmentation results from 3D MRI images using FastSurfer. The images and their segmentation maps are then sliced along three orthogonal planes. Stream 1 processes pixel-based image data, while Stream 2 handles token-like embeddings from the segmentation results. The embedded features from both streams are integrated using separate trainable MLP modules across three dimensions, then concatenated into a comprehensive feature matrix. This matrix serves as the input to the AD diagnosis model (ADAPT in our setup), which performs feature extraction and diagnosis.
  • Figure 3: The overview of the pipeline for DS-ViT + RTAB for Alzheimer's Early Diagnosis. DS-ViT extracts high-dimensional feature maps from MRI scans at different time points. Residuals are computed between consecutive feature maps and fused with the current map to create residual fusion features. These features are aggregated by the Residual Temporal Attention Block (RTAB) to assess the risk of Alzheimer’s disease progression, outputting “At Risk” or “Safe” classifications for early intervention.
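
The slicing step described in the Figure 2 caption can be illustrated with a minimal sketch. It assumes the MRI image and its segmentation map are same-shaped 3D arrays stored as nested lists indexed `volume[x][y][z]`. The function names `slice_along_axis` and `dual_stream_slices` are hypothetical, and the learned embeddings and MLP fusion of the actual pipeline are not modelled here. The sketch only shows how both volumes could be cut along the three orthogonal planes and paired slice by slice for the two streams.

```python
from typing import Dict, List, Tuple

Volume = List[List[List[float]]]  # assumed layout: volume[x][y][z]
Slice2D = List[List[float]]


def slice_along_axis(volume: Volume, axis: int) -> List[Slice2D]:
    """Return every 2D slice of a 3D volume taken along one axis (0, 1, or 2)."""
    nx, ny, nz = len(volume), len(volume[0]), len(volume[0][0])
    if axis == 0:  # fix x, keep the (y, z) plane
        return [[[volume[x][y][z] for z in range(nz)] for y in range(ny)]
                for x in range(nx)]
    if axis == 1:  # fix y, keep the (x, z) plane
        return [[[volume[x][y][z] for z in range(nz)] for x in range(nx)]
                for y in range(ny)]
    if axis == 2:  # fix z, keep the (x, y) plane
        return [[[volume[x][y][z] for y in range(ny)] for x in range(nx)]
                for z in range(nz)]
    raise ValueError("axis must be 0, 1, or 2")


def dual_stream_slices(
    image: Volume, seg: Volume
) -> Dict[int, List[Tuple[Slice2D, Slice2D]]]:
    """Pair each image slice (Stream 1 input) with the matching
    segmentation-label slice (Stream 2 input) for all three planes."""
    return {
        axis: list(zip(slice_along_axis(image, axis),
                       slice_along_axis(seg, axis)))
        for axis in (0, 1, 2)
    }


# Toy 2 x 3 x 4 volume whose values encode their own coordinates,
# plus a toy two-label segmentation map of the same shape.
image = [[[100 * x + 10 * y + z for z in range(4)] for y in range(3)]
         for x in range(2)]
seg = [[[(x + y + z) % 2 for z in range(4)] for y in range(3)]
       for x in range(2)]
pairs = dual_stream_slices(image, seg)
```

In this toy setup, `pairs[1][2]` holds the image slice and label slice at `y = 2`. A real implementation would pass the first element of each pair to a pixel-based embedding and the second to a label-token embedding.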
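
The residual step described in the Figure 3 caption can be sketched in a similarly hedged way. The caption does not specify the fusion operator, so this sketch assumes concatenation of the current feature vector with its residual. It also flattens each feature map to a 1D list for brevity. The name `residual_fusion` is hypothetical, and the attention-based aggregation performed by RTAB is not modelled.

```python
from typing import List

Feature = List[float]


def residual_fusion(features: List[Feature]) -> List[Feature]:
    """For each visit t >= 1, compute the residual r_t = f_t - f_(t-1)
    and fuse it with the current feature f_t.

    Fusion by concatenation is an assumption made for illustration.
    """
    if len(features) < 2:
        raise ValueError("need feature maps from at least two time points")
    fused = []
    for prev, curr in zip(features, features[1:]):
        residual = [c - p for c, p in zip(curr, prev)]
        fused.append(curr + residual)  # list concatenation: [f_t, r_t]
    return fused


# Three visits, each summarised by a toy 2-dimensional feature vector.
visits = [[1.0, 2.0], [1.5, 1.0], [2.0, 0.5]]
fused = residual_fusion(visits)
# fused == [[1.5, 1.0, 0.5, -1.0], [2.0, 0.5, 0.5, -0.5]]
```

With T time points this yields T - 1 fused vectors, one per pair of consecutive scans. A temporal aggregator would then consume these vectors.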