Validating Deep Models for Alzheimer's 18F-FDG PET Diagnosis Across Populations: A Study with Latin American Data

Hugo Massaroli, Hernan Chaves, Pilar Anania, Mauricio Farez, Emmanuel Iarussi, Viviana Siless

TL;DR

The study interrogates whether deep learning models trained on ADNI FDG-PET data generalize to a Latin American cohort (FLENI) for Alzheimer's diagnosis. It compares CNN, Transformer, and lightweight ResNet architectures under matched training settings, revealing a substantial drop in AUC when applied to FLENI despite strong in-domain performance. Ablation and interpretability analyses identify full-slice input and per-image normalization as key for cross-domain robustness, while occlusion maps show shifted spatial attention across cohorts. The findings underscore the necessity of population-aware validation and domain adaptation to ensure clinically reliable diagnostic AI across diverse populations.

Abstract

Deep learning models have shown strong performance in diagnosing Alzheimer's disease (AD) using neuroimaging data, particularly 18F-FDG PET scans, with training datasets largely composed of North American cohorts such as those in the Alzheimer's Disease Neuroimaging Initiative (ADNI). However, their generalization to underrepresented populations remains underexplored. In this study, we benchmark convolutional and Transformer-based models on the ADNI dataset and assess their generalization performance on a novel Latin American clinical cohort from the FLENI Institute in Buenos Aires, Argentina. We show that while all models achieve high AUCs on ADNI (up to 0.96 and 0.97), their performance drops substantially on FLENI (down to 0.82 and 0.80, respectively), revealing a significant domain shift. The tested architectures performed similarly, calling into question the supposed advantages of Transformers for this specific task. Through ablation studies, we identify per-image normalization and an appropriate sampling selection as key factors for generalization. Occlusion sensitivity analysis further reveals that models trained on ADNI generally attend to canonical hypometabolic regions for the AD class, but their focus becomes unclear for the other classes and for FLENI scans. These findings highlight the need for population-aware validation of diagnostic AI models and motivate future work on domain adaptation and cohort diversification.
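To make the per-image normalization idea concrete, here is a minimal sketch in Python. It assumes min-max scaling computed from each scan's own intensities; the abstract does not say which normalization the authors used, so the function name `normalize_per_image`, the `eps` parameter, and the choice of min-max scaling are illustrative assumptions rather than the paper's method.

```python
import numpy as np

def normalize_per_image(volume, eps=1e-8):
    """Rescale a single scan to roughly [0, 1] using only its own min and max.

    Assumption: min-max scaling. The source only says "per-image
    normalization" and does not give a formula.
    """
    volume = np.asarray(volume, dtype=np.float64)
    lo = volume.min()
    hi = volume.max()
    # eps guards against division by zero for a constant-valued scan
    return (volume - lo) / (hi - lo + eps)

# Two synthetic "scans" with the same spatial pattern but different
# intensity scale and offset, standing in for scanner or site differences.
rng = np.random.default_rng(0)
scan_a = rng.random((4, 8, 8))
scan_b = 50.0 * scan_a + 10.0

norm_a = normalize_per_image(scan_a)
norm_b = normalize_per_image(scan_b)
```

Because each scan is scaled by its own statistics, a global gain or offset that differs between cohorts is removed before the model sees the data, which may be one reason this step helps cross-cohort generalization.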

Paper Structure

This paper contains 17 sections, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Schematic of the Transformer architecture. Multi-view ResNet encoders process axial, sagittal, and coronal slices. Features are flattened into patches with positional encoding and passed to a transformer encoder.
  • Figure 2: Occlusion sensitivity maps from the Inception model. Top: ADNI samples (CN, MCI, AD). Bottom: FLENI 600 samples (non-AD, AD). The highlighted regions indicate areas with the greatest impact on model predictions.
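
The occlusion sensitivity maps mentioned in Figure 2 can be sketched as follows. This is a generic 2D illustration, not the authors' implementation: the function `occlusion_map`, the `predict` callable, and the patch size, stride, and fill value are all hypothetical choices. The idea is to mask one patch at a time and record how much the model's score drops.

```python
import numpy as np

def occlusion_map(image, predict, patch=8, stride=8, fill=0.0):
    """Return a heat map of score drops caused by occluding each patch.

    image   : 2D array (one slice).
    predict : hypothetical callable mapping a 2D array to a scalar score
              for the class of interest.
    """
    image = np.asarray(image, dtype=np.float64)
    baseline = predict(image)
    height, width = image.shape
    heat = np.zeros((height, width))
    counts = np.zeros((height, width))
    for top in range(0, height - patch + 1, stride):
        for left in range(0, width - patch + 1, stride):
            occluded = image.copy()
            occluded[top:top + patch, left:left + patch] = fill
            # A large drop means the model relied on this region
            drop = baseline - predict(occluded)
            heat[top:top + patch, left:left + patch] += drop
            counts[top:top + patch, left:left + patch] += 1
    # Average overlapping contributions; untouched pixels stay at zero
    return heat / np.maximum(counts, 1)

# Toy "model" whose score depends only on the top-left 8x8 corner.
def toy_predict(img):
    return float(img[:8, :8].mean())

heat = occlusion_map(np.ones((16, 16)), toy_predict)
```

In the toy usage, only the top-left patch produces a non-zero drop, so the map highlights exactly the region the toy model depends on. With a real classifier, `predict` would wrap a forward pass and return the probability of the class being explained.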