LEAD: An EEG Foundation Model for Alzheimer's Disease Detection
Yihe Wang, Nan Huang, Nadia Mammone, Marco Cecchi, Xiang Zhang
TL;DR
This paper addresses the scarcity and heterogeneity of EEG data for Alzheimer's disease (AD) detection by building LEAD, presented as the first large-scale foundation model for EEG-based AD detection. The authors curate what they describe as the world's largest EEG-AD corpus (2,238 subjects) and pre-train on 13 heterogeneous datasets: 4 AD datasets and 9 non-AD neurological disorder datasets. LEAD uses a gated temporal-spatial Transformer with univariate patch embeddings, 3D channel embeddings, and sampling-rate embeddings, paired with subject-regularized training and domain-inspired self-supervised (contrastive) pre-training. Fine-tuned and tested on 5 held-out AD datasets, LEAD achieves the best average ranking across 20 evaluations and outperforms state-of-the-art EEG foundation models, supporting the value of large-scale EEG pre-training and subject-focused learning for clinical detection. The work emphasizes practical deployment potential and provides pre-trained checkpoints to support research on AD and related brain disorders.
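To illustrate why univariate patching lets a model accept recordings with any number of channels and any length, here is a minimal sketch. It is not the authors' implementation: the function name `univariate_patches`, the non-overlapping patch layout, and the choice to drop a trailing partial patch are all assumptions made for illustration.

```python
from typing import List, Tuple

Patch = Tuple[int, int, List[float]]  # (channel index, patch index, samples)


def univariate_patches(recording: List[List[float]], patch_len: int) -> List[Patch]:
    """Split each channel independently into non-overlapping patches.

    Hypothetical helper: each channel is tokenized on its own, so the
    number of tokens simply grows with channel count and signal length.
    A trailing partial patch is dropped (an assumption, not from the paper).
    """
    if patch_len <= 0:
        raise ValueError("patch_len must be positive")
    tokens: List[Patch] = []
    for ch, signal in enumerate(recording):
        n_full = len(signal) // patch_len  # number of complete patches
        for p in range(n_full):
            start = p * patch_len
            tokens.append((ch, p, signal[start:start + patch_len]))
    return tokens


# Two recordings with different channel counts and lengths
rec_a = [[float(i) for i in range(10)] for _ in range(2)]  # 2 channels, 10 samples
rec_b = [[float(i) for i in range(8)] for _ in range(3)]   # 3 channels, 8 samples
tokens_a = univariate_patches(rec_a, patch_len=4)
tokens_b = univariate_patches(rec_b, patch_len=4)
```

Because every token carries its channel index, a downstream model could attach a channel embedding per token rather than assuming a fixed montage.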
Abstract
Electroencephalography (EEG) provides a non-invasive, highly accessible, and cost-effective approach for detecting Alzheimer's disease (AD). However, existing methods, whether based on handcrafted feature engineering or standard deep learning, face three major challenges: 1) the lack of large-scale EEG-based AD datasets for robust representation learning; 2) limited generalizability across subjects; and 3) difficulty in adapting to highly heterogeneous data. To address these challenges, we curate the world's largest EEG-AD corpus to date, comprising 2,238 subjects. Leveraging this resource, we propose LEAD, the first large-scale foundation model for EEG-based AD detection. Specifically, we design a gated temporal-spatial Transformer that can adapt to EEG recordings with arbitrary lengths, channel configurations, and sampling rates. In addition, we introduce a subject-regularized training strategy to enhance subject-level feature learning. We further employ medical contrastive learning for pre-training on 13 datasets, including 4 AD datasets and 9 non-AD neurological disorder datasets, and fine-tune and test the model on the other 5 AD datasets. LEAD achieves the best average ranking across all 20 evaluations on 5 downstream datasets, substantially outperforming existing approaches, including state-of-the-art (SOTA) EEG foundation models. These results demonstrate the effectiveness and practical potential of the proposed method for real-world EEG-based AD detection. Source code: https://github.com/DL4mHealth/LEAD
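The "best average ranking" criterion can be made concrete with a short sketch. This is an illustration only: the method names and scores below are made up and are not results from the paper, and the tie rule (a method's rank is 1 plus the number of methods scoring strictly higher) is an assumption, since the abstract does not specify one.

```python
from typing import Dict, List


def average_ranks(scores: Dict[str, List[float]]) -> Dict[str, float]:
    """Return each method's mean rank over evaluations (1 = best).

    scores maps a method name to one score per evaluation, where higher
    is better. Ties share the better rank (assumed convention).
    """
    methods = list(scores)
    n_evals = len(scores[methods[0]])
    if any(len(scores[m]) != n_evals for m in methods):
        raise ValueError("every method needs one score per evaluation")
    totals = {m: 0.0 for m in methods}
    for i in range(n_evals):
        for m in methods:
            # Rank = 1 + count of methods that strictly beat m here
            better = sum(1 for o in methods if scores[o][i] > scores[m][i])
            totals[m] += 1 + better
    return {m: totals[m] / n_evals for m in methods}


# Hypothetical scores for three placeholder methods on three evaluations
example = {
    "model_a": [0.9, 0.8, 0.7],
    "model_b": [0.8, 0.9, 0.6],
    "model_c": [0.7, 0.7, 0.8],
}
ranks = average_ranks(example)
best = min(ranks, key=ranks.get)
```

A method can have the best average rank without winning every individual evaluation, which is why this summary is often reported alongside per-dataset scores.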
