Interpretable Vital Sign Forecasting with Model Agnostic Attention Maps
Yuwei Liu, Chen Dan, Anubhav Bhatti, Bingjie Shen, Divij Gupta, Suraj Parmar, San Lee
TL;DR
This work addresses the challenge of producing interpretable forecasts of sepsis-relevant vital signs in ICUs. It introduces a model-agnostic attention mechanism that sits on top of black-box time-series models such as N-HiTS and N-BEATS and produces attention heatmaps revealing which historical inputs drive the predictions. Evaluated on the eICU-CRD dataset with MSE and DTW, the approach preserves predictive accuracy while substantially enhancing interpretability, and N-HiTS + Attention often delivers robust performance. The resulting attention maps show clinicians which time windows and vital signs most influence a forecast, and because the framework is model-agnostic it can be applied to other forecasting models, supporting broader adoption in critical-care decision support.
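To make the idea of an attention layer sitting on top of a black-box forecaster concrete, here is a minimal sketch under stated assumptions, not the authors' implementation. The scoring rule (a per-time-step query weight multiplied by the input value), the helper names `attention_weights`, `attend_then_forecast`, and `naive_forecaster`, and the last-value forecaster standing in for N-HiTS or N-BEATS are all hypothetical. What it illustrates is that the attention weights form a vital-signs × time-steps matrix that can be plotted directly as a heatmap, independent of which forecaster consumes the re-weighted input.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Rows are vital signs (channels); columns are lookback time steps.
Matrix = List[List[float]]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a 1-D sequence."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(window: Matrix, query: Sequence[float]) -> Matrix:
    """Hypothetical scoring rule: score(c, t) = query[t] * window[c][t].

    Scores are normalised across time with a softmax, so each row of the
    returned channels x time-steps matrix sums to 1 and can be drawn as a heatmap.
    """
    return [softmax([q * x for q, x in zip(query, row)]) for row in window]


def naive_forecaster(window: Matrix, horizon: int = 3) -> Matrix:
    """Hypothetical stand-in for a black-box model: repeat each channel's last value."""
    return [[row[-1]] * horizon for row in window]


def attend_then_forecast(
    window: Matrix,
    query: Sequence[float],
    forecaster: Callable[[Matrix], Matrix],
) -> Tuple[Matrix, Matrix]:
    """Re-weight the lookback window with attention, then call the forecaster.

    Weights are scaled by the window length so that uniform attention
    (every weight equal to 1/L) leaves the input unchanged.
    """
    weights = attention_weights(window, query)
    length = len(window[0])
    weighted = [
        [w * x * length for w, x in zip(w_row, x_row)]
        for w_row, x_row in zip(weights, window)
    ]
    return forecaster(weighted), weights


if __name__ == "__main__":
    # Hypothetical 4-step window: heart rate (bpm) and SpO2 (%).
    window = [[80.0, 82.0, 85.0, 90.0], [97.0, 96.0, 95.0, 93.0]]
    query = [0.0, 0.0, 0.0, 0.01]  # places extra weight on the most recent step
    _, heatmap = attend_then_forecast(window, query, naive_forecaster)
    for name, row in zip(["HR", "SpO2"], heatmap):
        print(name, [round(w, 3) for w in row])
```

In this sketch the attention layer only re-weights the input window and reads nothing from inside the forecaster, so `naive_forecaster` could be swapped for any model that accepts the same input shape; that is the sense in which such a layer is model-agnostic.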
Abstract
Sepsis is a leading cause of mortality in intensive care units (ICUs) and a substantial medical challenge. The need to analyze many diverse vital signs to predict sepsis further compounds this difficulty. Although deep learning techniques have been developed for early sepsis prediction, their 'black-box' nature obscures their internal logic, limiting interpretability in critical settings such as ICUs. This paper introduces a framework that combines a deep learning model with an attention mechanism that highlights the critical time steps in the forecasting process, thereby improving model interpretability and supporting clinical decision-making. We show that the attention mechanism can be adapted to various black-box time-series forecasting models, such as N-HiTS and N-BEATS. Our method preserves the accuracy of conventional deep learning models while enhancing interpretability through heatmaps generated from the attention weights. We evaluated our model on the eICU-CRD dataset, focusing on forecasting vital signs for sepsis patients, and assessed its performance using mean squared error (MSE) and dynamic time warping (DTW). Finally, we explored the attention maps of N-HiTS and N-BEATS, examining the differences in their performance and identifying crucial factors that influence vital sign forecasting.
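The two evaluation metrics behave differently on time-shifted forecasts, which matters for vital signs whose trends may be predicted slightly early or late. Below is a minimal, dependency-free sketch of MSE and a textbook DTW distance. The absolute-difference local cost, the absence of a warping-window constraint, and the function names `mse` and `dtw_distance` are assumptions, since the abstract does not specify which DTW variant the evaluation uses.

```python
import math
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between two equal-length series."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("series must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(n*m) dynamic time warping with absolute-difference local cost.

    cost[i][j] is the minimal accumulated cost of aligning a[:i] with b[:j];
    each step may advance a, b, or both, so time-shifted series align cheaply.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("series must be non-empty")
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # advance a only
                cost[i][j - 1],      # advance b only
                cost[i - 1][j - 1],  # advance both
            )
    return cost[n][m]


if __name__ == "__main__":
    truth = [80.0, 82.0, 85.0, 90.0]   # hypothetical heart-rate values
    lagged = [80.0, 80.0, 82.0, 85.0]  # forecast that trails the truth by one step
    print("MSE:", mse(truth, lagged))           # 9.5
    print("DTW:", dtw_distance(truth, lagged))  # 5.0
```

On these toy series, a forecast that lags the truth by one step is penalised at every step by MSE (9.5), whereas DTW aligns the shifted values and charges only for the final, unmatched step (5.0). This contrast is one reason it can be useful to report both metrics side by side.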
