
Evaluating the Explainable AI Method Grad-CAM for Breath Classification on Newborn Time Series Data

Camelia Oprea, Mike Grüne, Mateusz Buglowski, Lena Olivier, Thorsten Orlikowsky, Stefan Kowalewski, Mark Schoberer, André Stollenwerk

TL;DR

This paper addresses the challenge of explainability in neonatal breath classification from time series data by evaluating Grad-CAM explanations applied to a CNN-based XCM model. A user study involving developers and domain experts assesses the usefulness of Grad-CAM heatmaps and of a two-fold explanation showing both the combined and the separate contributions of flow and pressure, using data from 18 neonates and a 5-fold cross-validated training regime. The results indicate that, despite their potential, the explanations are not yet clinically useful in their current form: misclassifications and padding artefacts reduce their informativeness and participants' trust, especially among medical professionals. The study highlights the importance of human factors in xAI evaluations and points to the need for clearer explanations and larger, more diverse datasets before reliable clinical deployment can be achieved.

Abstract

With the digitalization of health care systems, artificial intelligence is becoming increasingly present in medicine. Machine learning in particular shows great potential for complex tasks such as time series classification, usually at the cost of transparency and comprehensibility. This leads to a lack of human trust and thus hinders its active use. Explainable artificial intelligence tries to close this gap by providing insight into the decision-making process; however, the actual usefulness of its different methods remains unclear. This paper proposes a user-study-based evaluation of the explanation method Grad-CAM, applied to a neural network that classifies breaths in neonatal ventilation time series data. We present the usefulness of the explainability method as perceived by different stakeholders, exposing the difficulty of achieving actual transparency and many participants' wish for more in-depth explanations.

Paper Structure

This paper contains 6 sections, 7 figures, and 3 tables.

Figures (7)

  • Figure 1: Plot of exemplary flow and pressure data.
  • Figure 2: Three of the identified breath types: spontaneous (spont.), mechanical and triggered. The unclassifiable category, which does not exhibit a specific representative form, is excluded here.
  • Figure 3: Data pipeline from flow and pressure input to the resulting classification with a two-fold explanation. The CNN architecture is adopted from XCM (Fauvel et al., 2021). Abbreviations: $D$ - number of observed variables, $T$ - time series length. Grad-CAM is applied to the 2D convolution (orange) to deliver a variable-wise explanation and to the last 1D convolution (blue) to deliver a combined explanation.
  • Figure 4: Explanations of the classification as displayed in the data viewer used. The background color expresses the importance of the combined inputs, with a color gradient from low (blue) to high (red). The separate significance of each input is depicted by the color intensity of the corresponding flow (red) and pressure (blue) curves. For better visibility, both curves are visually extended with their first (05:13:12.2) and last (05:13:13.8) available values, respectively.
  • Figure 5: Numeric ratings of the perceived performance metrics, grouped by the participants' profession. Ratings range from 1 (worst) to 6 (best); the mean rating per group is shown as a cross.
  • ...and 2 more figures
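Figure 3 applies Grad-CAM at two points of the XCM network. The pure-Python sketch below illustrates roughly what that computation involves. It is not the authors' implementation. It computes a Grad-CAM map from a layer's activations and from the gradients of the class score with respect to those activations in three steps. First, each channel is weighted by its gradient averaged over the whole map. Second, the weighted channels are summed. Third, a ReLU keeps only the positive contributions. The sketch rests on several assumptions:

- The activations and gradients have already been extracted with some autodiff framework.
- The feature maps keep the input length $T$ (i.e. "same" padding), so no upsampling is needed.
- The min-max normalization and the linear blue-to-red color ramp, which mimic Figure 4, are illustrative choices.
- The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def grad_cam_1d(activations: Sequence[Sequence[float]],
                gradients: Sequence[Sequence[float]]) -> List[float]:
    """Grad-CAM for a 1D convolution with K channels over T time steps.

    activations[k][t] holds feature map A^k; gradients[k][t] holds the
    derivative of the class score with respect to A^k at time step t.
    """
    length = len(activations[0])
    # Channel weight alpha_k: the gradient averaged over the time axis.
    alphas = [sum(g) / len(g) for g in gradients]
    cam = []
    for t in range(length):
        # Weighted sum of all channels at time step t.
        score = sum(a * act[t] for a, act in zip(alphas, activations))
        cam.append(max(0.0, score))  # ReLU keeps only positive evidence
    return cam


def grad_cam_2d(activations: Sequence[Sequence[Sequence[float]]],
                gradients: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """Grad-CAM for a 2D convolution with K channels over a D x T grid.

    Returns one row per observed variable (e.g. flow and pressure),
    i.e. a variable-wise explanation.
    """
    n_vars = len(activations[0])
    length = len(activations[0][0])
    # Channel weight alpha_k: the gradient averaged over variables and time.
    alphas = [sum(sum(row) for row in g) / (n_vars * length) for g in gradients]
    cam = []
    for d in range(n_vars):
        row = []
        for t in range(length):
            score = sum(a * act[d][t] for a, act in zip(alphas, activations))
            row.append(max(0.0, score))
        cam.append(row)
    return cam


def normalize(values: Sequence[float]) -> List[float]:
    """Min-max scale a map to [0, 1]; a constant map becomes all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def blue_to_red(weight: float) -> Tuple[int, int, int]:
    """Linear RGB ramp from blue (weight 0.0) to red (weight 1.0)."""
    return (round(255 * weight), 0, round(255 * (1.0 - weight)))


if __name__ == "__main__":
    # Hypothetical toy values: 2 channels, 3 time steps.
    acts = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0]]
    grads = [[0.5, 0.5, 0.5], [-0.25, -0.25, -0.25]]
    combined = normalize(grad_cam_1d(acts, grads))
    print([blue_to_red(w) for w in combined])
```

In the setup of Figure 3, `grad_cam_2d` would receive the 2D convolution's maps and yield one row each for flow and pressure. `grad_cam_1d` would receive the last 1D convolution's maps and yield the combined explanation shown as the background color.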