Evaluating the Explainable AI Method Grad-CAM for Breath Classification on Newborn Time Series Data
Camelia Oprea, Mike Grüne, Mateusz Buglowski, Lena Olivier, Thorsten Orlikowsky, Stefan Kowalewski, Mark Schoberer, André Stollenwerk
TL;DR
This paper addresses the challenge of explainability in neonatal breath classification from time series data by evaluating Grad-CAM explanations applied to a CNN-based XCM model. A user study involving developers and domain experts assesses the usefulness of Grad-CAM heatmaps and a dual-view narration of flow and pressure contributions, using data from 18 neonates and a 5-fold cross-validated training regime. Results indicate that, despite their potential, the explanations are not yet clinically useful in their current form: misclassifications and padding artefacts reduce informativeness and trust, especially among medical professionals. The study highlights the importance of human factors in explainable AI (xAI) evaluations and points to the need for clearer explanations and larger, more diverse datasets to achieve reliable clinical deployment.
Abstract
With the digitalization of health care systems, artificial intelligence is becoming more present in medicine. Machine learning in particular shows great potential for complex tasks such as time series classification, but usually at the cost of transparency and comprehensibility. This leads to a lack of trust among humans and thus hinders its active use. Explainable artificial intelligence tries to close this gap by providing insight into the decision-making process; however, the actual usefulness of its different methods is unclear. This paper proposes a user-study-based evaluation of the explanation method Grad-CAM, applied to a neural network that classifies breaths in neonatal ventilation time series data. We present the usefulness of the explainability method as perceived by different stakeholders, exposing the difficulty of achieving actual transparency and the wish of many participants for more in-depth explanations.
