
Applications of interpretable deep learning in neuroimaging: a comprehensive review

Lindsay Munroe, Mariana da Silva, Faezeh Heidari, Irina Grigorescu, Simon Dahan, Emma C. Robinson, Maria Deprez, Po-Wah So

TL;DR

The most popular iDL approaches used in the literature may be sub-optimal for neuroimaging data, and possible future directions for the field are discussed.

Abstract

Clinical adoption of deep learning models has been hindered, in part, because the black-box nature of neural networks leads to concerns regarding their trustworthiness and reliability. These concerns are particularly relevant in the field of neuroimaging due to the complex brain phenotypes and inter-subject heterogeneity often encountered. The challenge can be addressed by interpretable deep learning (iDL) methods that enable the visualisation and interpretation of the inner workings of deep learning models. This study systematically reviewed the literature on neuroimaging applications of iDL methods and critically analysed how iDL explanation properties were evaluated. Seventy-five studies were included, and ten categories of iDL methods were identified. We also reviewed five properties of iDL explanations that were analysed in the included studies: biological validity, robustness, continuity, selectivity, and downstream task performance. We found that the most popular iDL approaches used in the literature may be sub-optimal for neuroimaging data, and we discussed possible future directions for the field.

Paper Structure

This paper contains 75 sections, 16 equations, 12 figures, and 15 tables.

Figures (12)

  • Figure 1: Comparison of post-hoc interpretability maps and generative interpretability methods applied to the classification of Alzheimer's disease (AD) vs mild cognitive impairment (MCI) in brain MRI volumes. The real disease map is the "ground-truth" shown for comparison. Figure adapted from Bass et al. [bass2022icam].
  • Figure 2: The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flowchart
  • Figure 3: Example of Occlusion applied to an MRI image. In a patch-wise manner, a tile of the image is occluded, and the occluded image is fed to a neural network (NN) for prediction. The difference in predicted probability between the original and occluded image is assigned to the patch location in the occlusion map. Patches that result in the greatest change in prediction when occluded are interpreted as the most important for the model task [zeiler2014visualizing] (a minimal code sketch follows the figure list).
  • Figure 4: Example of Vanilla Gradients applied to an MRI image. The partial derivative of the network output score $S_c$ for class $c$ with respect to each input voxel is computed. Voxels with the largest gradients are interpreted to have the greatest influence on the model prediction [simonyan2013deep] (a saliency-map sketch follows the figure list).
  • Figure 5: Example of Layer-wise Relevance Propagation (LRP) applied to an MRI image. The network output score $S_c$ for class $c$ is redistributed backwards through the network according to the equation shown until the input image is reached. The pixels with the highest proportion of $S_c$ are interpreted as having the greatest contribution to the model prediction [bach2015pixel] (an LRP epsilon-rule sketch follows the figure list).
  • ...and 7 more figures
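
To make the Occlusion procedure of Figure 3 concrete, here is a minimal sketch of patch-wise occlusion on a 2D slice. The function name `occlusion_map` and the callable `predict_proba` are illustrative assumptions, not part of the reviewed paper; any model wrapper that returns the probability of the class of interest would do.

```python
# Minimal sketch of patch-wise occlusion (Zeiler & Fergus, 2014).
# `predict_proba` is any callable mapping an image array to the class-c probability
# (a hypothetical stand-in, not an API from the reviewed paper).
import numpy as np

def occlusion_map(image, predict_proba, patch=8, fill=0.0):
    """Slide an occluding tile over `image` and record, at each location,
    the drop in predicted probability caused by masking that tile."""
    base = predict_proba(image)                       # probability on the intact image
    heat = np.zeros_like(image, dtype=float)
    h, w = image.shape
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = fill  # mask one tile
            # larger drop in probability => patch is more important
            heat[i:i + patch, j:j + patch] = base - predict_proba(occluded)
    return heat

# Toy usage with a stand-in "model": mean intensity acts as the class probability.
if __name__ == "__main__":
    img = np.random.rand(64, 64)
    heat = occlusion_map(img, predict_proba=lambda x: float(x.mean()))
    print(heat.shape, heat.min(), heat.max())
```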
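The Vanilla Gradients map of Figure 4 can be sketched in a few lines of PyTorch. The helper `vanilla_gradients`, the stand-in classifier, and the volume shape are hypothetical; the method itself is simply the gradient of the class score $S_c$ with respect to the input voxels.

```python
# Sketch of a vanilla-gradient saliency map (Simonyan et al., 2013).
# The network and tensor shapes below are illustrative placeholders.
import torch
import torch.nn as nn

def vanilla_gradients(net, volume, c):
    """Return |dS_c / dx| for every voxel of `volume`."""
    x = volume.clone().detach().requires_grad_(True)  # track gradients w.r.t. the input
    score = net(x.unsqueeze(0))[0, c]                 # pre-softmax score S_c for class c
    score.backward()                                  # back-propagate to the input
    return x.grad.abs()                               # voxel-wise saliency

# Toy usage with a random 3D volume and a minimal stand-in classifier.
if __name__ == "__main__":
    net = nn.Sequential(nn.Flatten(), nn.Linear(16 * 16 * 16, 2))
    vol = torch.rand(1, 16, 16, 16)                   # (channels, D, H, W)
    saliency = vanilla_gradients(net, vol, c=1)
    print(saliency.shape)
```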
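Finally, a hedged sketch of the LRP epsilon-rule behind Figure 5, written in NumPy for a tiny fully connected ReLU network. The layer sizes, weights, and the plain `+ eps` stabiliser are simplifying assumptions; practical LRP implementations additionally handle convolutions, pooling, and sign-aware stabilisation.

```python
# Sketch of the LRP epsilon-rule (Bach et al., 2015) for a small ReLU MLP.
# Weights, biases, and layer sizes are illustrative, not taken from the review.
import numpy as np

def lrp_epsilon(weights, biases, x, c, eps=1e-6):
    """Redistribute the output score S_c back to the input features using
    R_j = sum_k (a_j * w_jk) / (sum_j a_j * w_jk + b_k + eps) * R_k."""
    # Forward pass, storing the activations of every layer (last layer stays linear).
    activations = [x]
    for idx, (W, b) in enumerate(zip(weights, biases)):
        z = x @ W + b
        x = z if idx == len(weights) - 1 else np.maximum(0.0, z)
        activations.append(x)
    # Relevance of the output layer: only the target class keeps its score S_c.
    R = np.zeros_like(activations[-1])
    R[c] = activations[-1][c]
    # Backward pass, layer by layer.
    for W, b, a in zip(reversed(weights), reversed(biases), reversed(activations[:-1])):
        z = a @ W + b + eps          # denominators z_k (epsilon-stabilised)
        s = R / z                    # relevance per unit of pre-activation
        R = a * (W @ s)              # R_j = a_j * sum_k w_jk * s_k
    return R                         # feature-/voxel-wise relevance

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Ws = [rng.normal(size=(8, 4)), rng.normal(size=(4, 2))]
    bs = [np.zeros(4), np.zeros(2)]
    print(lrp_epsilon(Ws, bs, rng.random(8), c=0))
```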