A Quantitative Approach for Evaluating Disease Focus and Interpretability of Deep Learning Models for Alzheimer's Disease Classification

Thomas Yu Chow Tam; Litian Liang; Ke Chen; Haohan Wang; Wei Wu

TL;DR

This study addresses the interpretability gap in deep learning (DL) models for MRI-based Alzheimer's Disease (AD) classification by introducing a quantitative disease-focusing framework that links model attention to known AD pathology. It combines gradient-based saliency maps with FastSurfer ROI segmentations to compute region-level importance, and defines a disease-focus (DF) score that measures how well that importance aligns with pathologically relevant brain regions. In experiments on ADNI and an external AIBL dataset, the authors compare a baseline 3D ResNet, a pretrained MedicalNet, and MedicalNet with data augmentation, alongside conventional volumetric-feature machine learning (VF-ML) baselines, assessing both classification performance and disease-focused interpretability. The results show that fine-tuning pretrained models and applying augmentation improve both focus on disease-relevant regions and overall accuracy, while traditional VF-ML approaches can match or exceed DL performance on some metrics. The proposed DF framework improves interpretability and supports clinical translation by revealing how models relate to known AD biomarkers.

Abstract

Deep learning (DL) models have shown significant potential in Alzheimer's Disease (AD) classification. However, understanding and interpreting these models remains challenging, which hinders their adoption in clinical practice. Techniques such as saliency maps have proven effective in providing visual and empirical clues about how these models work, but a gap remains in understanding which specific brain regions DL models focus on and whether these regions are pathologically associated with AD. To bridge this gap, we developed a quantitative disease-focusing strategy that first enhances the interpretability of DL models using saliency maps and brain segmentations. We then propose a disease-focus (DF) score that quantifies how much a DL model focuses on brain areas relevant to AD pathology, based on clinically known MRI-based pathological regions of AD. Using this strategy, we compared several state-of-the-art DL models, including a baseline 3D ResNet model, a pretrained MedicalNet model, and a MedicalNet model with data augmentation, on the task of classifying patients with AD vs. cognitively normal (CN) subjects from MRI data. We then evaluated these models in terms of their ability to focus on disease-relevant regions. Our results show distinct disease-focusing patterns across models, with particularly characteristic patterns for the pretrained models and data augmentation, and also provide insight into their classification performance. These results suggest that our approach for quantitatively assessing the ability of DL models to focus on disease-relevant regions may help improve the interpretability of these models for AD classification and facilitate their adoption for AD diagnosis in clinical practice. The code is publicly available at https://github.com/Liang-lt/ADNI.
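
To make the idea of combining a saliency map with an ROI segmentation concrete, the following is a minimal sketch, not the authors' implementation. It assumes that region-level importance is the share of total absolute saliency falling inside each segmentation label, and that a focus measure can be taken as the share falling inside a chosen set of disease-relevant labels. The paper's actual DF score definition is not reproduced here and may differ. The function names and the label IDs (0, 2, 17, 53) are hypothetical placeholders.

```python
from collections import defaultdict


def region_importance(saliency, labels):
    """Share of total absolute saliency per ROI label.

    `saliency` and `labels` are flat, equally long sequences of voxel
    values (for a NumPy volume, pass `volume.ravel()`).
    Assumption: importance is aggregated by summing |saliency| per label.
    """
    totals = defaultdict(float)
    for value, label in zip(saliency, labels):
        totals[label] += abs(value)
    grand_total = sum(totals.values())
    if grand_total == 0:
        # No saliency anywhere: every region gets zero importance.
        return {label: 0.0 for label in totals}
    return {label: total / grand_total for label, total in totals.items()}


def disease_focus_fraction(importance, relevant_labels):
    """Share of saliency inside the (assumed) disease-relevant regions."""
    return sum(v for label, v in importance.items() if label in relevant_labels)


# Toy data: six voxels, four hypothetical ROI labels.
saliency = [0.5, -0.5, 1.0, 2.0, 0.0, 4.0]
labels = [0, 0, 17, 17, 2, 53]

importance = region_importance(saliency, labels)
score = disease_focus_fraction(importance, relevant_labels={17, 53})
print(importance)
print(score)
```

In this toy case the two labels treated as disease-relevant hold 7 of the 8 units of absolute saliency, so the fraction is 0.875. A model whose saliency is spread over irrelevant regions would score closer to 0 under this simplified measure.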

Paper Structure (19 sections, 3 equations, 1 figure, 5 tables)

Introduction
Methods
Convolutional Neural Networks with MRI Data
Region-of-Interest (ROI) Segmentation and Volumetric Measurements from FastSurfer
Our Quantitative Disease-Focusing Approach
Experiments
Experimental Setup
CNNs with MRI Data
Volumetric feature-based ML Approach with ROI Segmentation Statistics
Independent Test with an External Dataset
Comparisons of the CNN models using Our Quantitative Disease-Focusing Approach
Saliency Map Generation
Analysis of Saliency Maps
Evaluation of the disease focusing ability of the VF-ML methods
Results
...and 4 more sections

Figures (1)

Figure 1: Saliency maps illustrating the areas of an input image with the largest effect on the output prediction of DL models. Saliency maps for 2 AD subjects and 2 CN subjects are shown in the rows. The columns show the MRI scans, ROI-segmented images, and the saliency maps for the 3D ResNet, MedicalNet, and MedicalNet + DA, respectively.
