Explaining Uncertainty in Multiple Sclerosis Lesion Segmentation Beyond Prediction Errors

Nataliia Molchanova, Pedro M. Gordaliza, Alessandro Cagol, Mario Ocampo-Pineda, Po-Jui Lu, Matthias Weigel, Xinjie Chen, Erin S. Beck, Haris Tsagkas, Daniel Reich, Anna Stölting, Pietro Maggi, Delphine Ribes, Adrien Depeursinge, Cristina Granziera, Henning Müller, Meritxell Bach Cuadra

TL;DR

This work tackles the interpretability of predictive uncertainty in deep-learning segmentation of cortical multiple sclerosis (MS) lesions by introducing LSU, an instance-wise, lesion-scale uncertainty measure derived from a deep ensemble. The authors build a comprehensive explainability framework that regresses LSU on a broad set of clinically meaningful lesion- and patient-level features, shifting the focus from error-centric to clinically informative uncertainty analysis. They validate the approach on in-domain and distribution-shift datasets (Basel and Lausanne), demonstrating that uncertainty tracks lesion size, shape, and cortical involvement, and that domain shifts diminish the informativeness and transferability of the explanations. Expert feedback confirms the clinical relevance of the identified factors. The framework shows potential to generalize to other tasks, modalities, and clinical settings, while also revealing a share of unexplained uncertainty that warrants further investigation.
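The exact definition of LSU is not given in this summary. As a rough illustration of how a lesion-scale score can be derived from a deep ensemble, the sketch below averages the members' voxel-wise lesion probabilities, computes the binary predictive entropy of that mean, and averages the entropy over each lesion's voxels. This entropy-based aggregation is a swapped-in stand-in, not the authors' LSU. The function name `lesion_scale_uncertainty`, the input shapes, and the assumption that lesion instances are already labeled (for example by connected-component labeling of the ensemble's binary segmentation) are all hypothetical.

```python
import numpy as np


def lesion_scale_uncertainty(member_probs, lesion_labels, eps=1e-12):
    """Hypothetical lesion-level uncertainty (NOT the paper's LSU definition).

    member_probs:  array of shape (M, *volume_shape), lesion probability per
                   ensemble member and voxel.
    lesion_labels: int array of shape volume_shape, 0 = background,
                   k > 0 = voxels belonging to lesion instance k.
    Returns {lesion_id: mean binary entropy over the lesion, in [0, ln 2]}.
    """
    member_probs = np.asarray(member_probs, dtype=float)
    lesion_labels = np.asarray(lesion_labels)
    if member_probs.shape[1:] != lesion_labels.shape:
        raise ValueError("probability maps and label map must share a spatial shape")

    # Ensemble-averaged probability per voxel, clipped to keep log() finite.
    p = np.clip(member_probs.mean(axis=0), eps, 1.0 - eps)
    # Voxel-wise binary predictive entropy of the averaged prediction.
    entropy = -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))

    scores = {}
    for lesion_id in np.unique(lesion_labels):
        if lesion_id == 0:  # skip background
            continue
        scores[int(lesion_id)] = float(entropy[lesion_labels == lesion_id].mean())
    return scores
```

With two members that agree on lesion 1 (both 0.9) and disagree on lesion 2 (0.9 vs. 0.1), lesion 2 receives the maximal binary entropy, ln 2. Entropy of the averaged prediction mixes aleatoric and epistemic components; a disagreement-only score such as the mutual information between prediction and ensemble member would be an alternative design choice.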

Abstract

Trustworthy artificial intelligence (AI) is essential in healthcare, particularly for high-stakes tasks like medical image segmentation. Explainable AI and uncertainty quantification significantly enhance AI reliability by addressing key attributes such as robustness, usability, and explainability. Despite extensive technical advances in uncertainty quantification for medical imaging, understanding the clinical informativeness and interpretability of uncertainty remains limited. This study introduces a novel framework to explain the potential sources of predictive uncertainty, specifically in cortical lesion segmentation in multiple sclerosis using deep ensembles. The proposed analysis shifts the focus from the uncertainty-error relationship towards relevant medical and engineering factors. Our findings reveal that instance-wise uncertainty is strongly related to lesion size, shape, and cortical involvement. Expert rater feedback confirms that similar factors impede annotator confidence. Evaluations conducted on two datasets (206 patients, almost 2000 lesions) under both in-domain and distribution-shift conditions highlight the utility of the framework in different scenarios.
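Relating uncertainty to lesion size, shape, and cortical involvement requires numeric descriptors for each lesion. The sketch below computes three illustrative features per labeled lesion: volume, a simple shape descriptor (the fraction of the lesion's bounding box that it fills, commonly called extent), and cortical involvement as the fraction of lesion voxels inside a cortical gray matter mask. These particular features, the `lesion_features` function, and the `cortex_mask` input are assumptions for illustration only; the paper's actual feature set is broader and is not reproduced here.

```python
import numpy as np


def lesion_features(lesion_labels, cortex_mask, voxel_volume_mm3=1.0):
    """Hypothetical per-lesion descriptors of size, shape and cortical involvement.

    lesion_labels: int array, 0 = background, k > 0 = lesion instance k.
    cortex_mask:   bool array of the same shape marking cortical gray matter.
    """
    lesion_labels = np.asarray(lesion_labels)
    cortex_mask = np.asarray(cortex_mask, dtype=bool)
    features = {}
    for lesion_id in np.unique(lesion_labels):
        if lesion_id == 0:  # skip background
            continue
        mask = lesion_labels == lesion_id
        coords = np.argwhere(mask)  # (n_voxels, ndim) integer coordinates
        n_vox = coords.shape[0]
        # Shape: share of the axis-aligned bounding box occupied by the lesion
        # (1.0 for a box-shaped lesion, lower for elongated or irregular ones).
        bbox_voxels = int(np.prod(coords.max(axis=0) - coords.min(axis=0) + 1))
        features[int(lesion_id)] = {
            "volume_mm3": float(n_vox * voxel_volume_mm3),
            "extent": n_vox / bbox_voxels,
            # Cortical involvement: fraction of lesion voxels inside the cortex.
            "cortical_fraction": float(cortex_mask[mask].mean()),
        }
    return features
```

Stacking these dictionaries into a lesions-by-features matrix gives the design matrix that a lesion-level uncertainty score can be regressed on.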
Paper Structure

This paper contains 35 sections, 7 equations, 8 figures, 5 tables.

Figures (8)

  • Figure 1: Examples of cortical and white matter lesions and their subtypes on an MP2RAGE MRI scan: intracortical, leukocortical, juxtacortical, and deep white matter lesions. Lesions appear as hypointense regions within gray matter (GM) and white matter (WM). The examples illustrate a possible similarity between leukocortical and juxtacortical lesions, which contributes to confusion between cortical and white matter lesions.
  • Figure 2: Illustration of the proposed framework for explaining instance-wise uncertainty of a DL segmentation model.
  • Figure 3: The clinical questionnaire with guidelines provided to the doctors on the left and questions on the right.
  • Figure 4: The quality of fit of the linear models measured by the coefficient of determination $R^2$ ($\uparrow$, red) and MAE ($\downarrow$, blue) computed on Train, Test-in, and Test-out sets. Rows correspond to different pairs of sets used for fitting a linear model and different feature sets; columns correspond to the sets used for the evaluation of the linear model. The mean and standard deviation are computed across different random seeds.
  • Figure 5: Bar plots of the linear explainer coefficients, treated as feature importance, for the Train, Test-in, and Test-out sets. Positive values indicate a direct relationship with uncertainty; negative values indicate a direct relationship with certainty. The twenty features with the highest absolute coefficients are shown. MAD: mean absolute deviation.
  • ...and 3 more figures
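Figures 4 and 5 describe fitting linear models that predict uncertainty from features, scoring them with $R^2$ and MAE, and ranking coefficients by absolute value. A minimal sketch of that workflow using NumPy least squares follows. Standardizing features before fitting, so that coefficient magnitudes are comparable as importances, is an assumption made here, as are all function names (`fit_linear_explainer`, `predict`, `r2_mae`, `top_features`); the paper's exact model, regularization, and preprocessing may differ.

```python
import numpy as np


def fit_linear_explainer(X, y):
    """Ordinary least squares of uncertainty y on z-scored features X (hypothetical)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mean, std = X.mean(axis=0), X.std(axis=0)
    std[std == 0] = 1.0  # leave constant features unscaled
    Z = (X - mean) / std
    A = np.column_stack([np.ones(len(Z)), Z])  # prepend an intercept column
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return {"mean": mean, "std": std, "intercept": w[0], "coef": w[1:]}


def predict(model, X):
    """Apply the training-set standardization, then the linear model."""
    Z = (np.asarray(X, dtype=float) - model["mean"]) / model["std"]
    return Z @ model["coef"] + model["intercept"]


def r2_mae(y_true, y_pred):
    """Coefficient of determination and mean absolute error."""
    y_true = np.asarray(y_true, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot), float(np.mean(np.abs(y_true - y_pred)))


def top_features(coef, names, k=20):
    """Features sorted by absolute coefficient, largest first (sign kept)."""
    order = np.argsort(-np.abs(coef))[:k]
    return [(names[i], float(coef[i])) for i in order]
```

In a Train / Test-in / Test-out setup one would fit on the Train set and call `r2_mae` separately on each evaluation set; a drop in $R^2$ on Test-out would mirror the reduced transferability of explanations under domain shift described above.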