Early Detection of Multidrug Resistance Using Multivariate Time Series Analysis and Interpretable Patient-Similarity Representations

Óscar Escudero-Arnanz, Antonio G. Marques, Inmaculada Mora-Jiménez, Joaquín Álvarez-Rodríguez, Cristina Soguero-Ruiz

TL;DR

This paper tackles the problem of early multidrug resistance (MDR) detection in ICU patients by combining multivariate time series (MTS) representations with interpretable, graph-based analyses. It introduces a framework that computes patient-to-patient similarity via feature engineering (FE) based on descriptive statistics, Dynamic Time Warping (DTW), and Time Cluster Kernel (TCK), followed by dimensionality reduction and simple classifiers (Logistic Regression, Random Forest, and Support Vector Machines) to predict MDR, while also constructing similarity graphs and clustering structures for interpretability. Key contributions include (i) a robust, interpretable MTS-based pipeline achieving ROC-AUC up to ~81% on ICU EHR data, (ii) a graph- and cluster-based knowledge extraction approach that reveals clinically meaningful MDR patterns and risk factors, and (iii) open-source code and a validation protocol enabling replication and extension. The framework supports early detection and risk factor identification, offering practical value for critical care and a foundation for applying explainable ML to similar clinical time-series problems across institutions and conditions.

Abstract

Background and Objectives: Multidrug Resistance (MDR) is a critical global health issue, causing increased hospital stays, healthcare costs, and mortality. This study proposes an interpretable Machine Learning (ML) framework for MDR prediction, aiming for both accurate inference and enhanced explainability. Methods: Patients are modeled as Multivariate Time Series (MTS), capturing clinical progression and patient-to-patient interactions. Similarity among patients is quantified using MTS-based methods: descriptive statistics, Dynamic Time Warping, and Time Cluster Kernel. These similarity measures serve as inputs for MDR classification via Logistic Regression, Random Forest, and Support Vector Machines, with dimensionality reduction and kernel transformations improving model performance. For explainability, patient similarity networks are constructed from these metrics. Spectral clustering and t-SNE are applied to identify MDR-related subgroups and visualize high-risk clusters, enabling insight into clinically relevant patterns. Results: The framework was validated on ICU Electronic Health Records from the University Hospital of Fuenlabrada, achieving an AUC of 81%. It outperforms baseline ML and deep learning models by leveraging graph-based patient similarity. The approach identifies key risk factors -- prolonged antibiotic use, invasive procedures, co-infections, and extended ICU stays -- and reveals clinically meaningful clusters. Code and results are available at https://github.com/oscarescuderoarnanz/DM4MTS. Conclusions: Patient similarity representations combined with graph-based analysis provide accurate MDR prediction and interpretable insights. This method supports early detection, risk factor identification, and patient stratification, highlighting the potential of explainable ML in critical care.
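
To make the similarity step more concrete, the following is a minimal sketch of how a DTW-based patient similarity matrix could be computed. It is not the authors' implementation (see the repository above for that). The function names `dtw_distance` and `dtw_kernel_matrix`, the squared-Euclidean local cost, the kernel form `exp(-gamma * d)`, and the value of `gamma` are all assumptions made for illustration; the toy "patients" are random arrays, not clinical data. The sketch uses the dependent variant of DTW, in which all variables of a time step are warped together.

```python
import numpy as np

def dtw_distance(a, b):
    """Dependent DTW distance between two MTS of shape (T_a, D) and (T_b, D).

    Assumption: squared Euclidean distance across all D variables is the
    local cost, and the square root of the accumulated cost is returned.
    """
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = np.sum((a[i - 1] - b[j - 1]) ** 2)
            # Extend the cheapest of the three admissible warping steps.
            cost[i, j] = local + min(cost[i - 1, j],
                                     cost[i, j - 1],
                                     cost[i - 1, j - 1])
    return float(np.sqrt(cost[n, m]))

def dtw_kernel_matrix(series, gamma=1.0):
    """Pairwise DTW distances mapped to similarities with exp(-gamma * d).

    Assumption: this kernel form is illustrative; the paper's exact
    kernel transformation is not specified in the abstract.
    """
    n = len(series)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = dtw_distance(series[i], series[j])
    return np.exp(-gamma * dist)

# Three hypothetical patients, each with 2 variables and a different
# number of time steps (synthetic data for illustration only).
rng = np.random.default_rng(0)
patients = [rng.normal(size=(t, 2)) for t in (5, 7, 6)]
K = dtw_kernel_matrix(patients, gamma=0.5)
```

The resulting matrix `K` is symmetric with ones on the diagonal, so it can be passed to any classifier that accepts a precomputed similarity, or used as the weighted adjacency of a patient similarity network.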

Paper Structure

This paper contains 25 sections, 1 equation, 11 figures, 2 tables.

Figures (11)

  • Figure 1: Architectural workflow that integrates time series analysis techniques with DR methods and kernel transformations, aimed at the classification of patients with MDR and the extraction of valuable insights, via graph representation, clustering, and visualization.
  • Figure 2: Histogram and boxplots of the time elapsed (in days) from ICU admission to ICU discharge (delimited by 50 days). The first vertical line (blue) corresponds to MDR patients and indicates the average time from ICU admission to the first MDR acquisition. The second vertical line (green) represents the average stay length for non-MDR patients.
  • Figure 3: ROC-AUC values for the classification models (LR, RF, and $\nu$-SVM) when considering non-DR (original space and kernel transformations) and DR methods (PCA, KPCA, AE, and DAE) for: (a) FE; (b) TCK; (c) DTW$_D$; (d) DTW$_I$. Box plots with the best results in terms of median ROC-AUC with and without DR have been highlighted in dark blue.
  • Figure 4: Box plots of the ROC-AUC scores for each evaluated model (x-axis), with the median values highlighted in blue. The boxes represent the interquartile range (y-axis). The models are grouped as follows: LSTM, GRU, and Transformers correspond to Raw MTS to DL models; PCA + Transformer represents Feature PCA-DR + Transformer; MLP + Transformer denotes Feature MLP-DR + Transformer; Flatten DR + MLP refers to Hybrid row-column DR + Flattening and MLP; and Flatten Raw Data + MLP corresponds to Raw data + Flattening and MLP.
  • Figure 5: Representation of the similarity matrix for a $\mathcal{D}_{train}$ dataset acquired through DTW$_D$ combined with an exponential kernel: (a) original similarity matrix; (b) similarity matrix after applying a threshold that removes values below $1.9$; and (c) graphical representation derived from (b), where blue circles represent MDR patients and green circles represent non-MDR patients.
  • ...and 6 more figures
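
The thresholding step described in the caption of Figure 5 can be sketched as follows. This is an illustrative example under stated assumptions, not the authors' code: the helper `threshold_graph`, the 4x4 similarity matrix, and the MDR labels are all made up, and only the threshold value $1.9$ is taken from the caption.

```python
import numpy as np

def threshold_graph(sim, threshold):
    """Keep similarities >= threshold and return (adjacency, edge list).

    Hypothetical helper: entries below `threshold` are set to zero and
    self-loops are removed, leaving a sparse weighted graph.
    """
    sim = np.asarray(sim, dtype=float)
    adj = np.where(sim >= threshold, sim, 0.0)
    np.fill_diagonal(adj, 0.0)  # drop self-similarity
    rows, cols = np.nonzero(np.triu(adj, k=1))  # each undirected edge once
    edges = [(int(i), int(j), float(adj[i, j])) for i, j in zip(rows, cols)]
    return adj, edges

# Toy similarity matrix for four hypothetical patients (made-up values).
sim = np.array([
    [2.00, 1.95, 1.20, 1.00],
    [1.95, 2.00, 1.10, 1.30],
    [1.20, 1.10, 2.00, 1.92],
    [1.00, 1.30, 1.92, 2.00],
])
labels = np.array([1, 1, 0, 0])  # 1 = MDR, 0 = non-MDR (made-up labels)

adj, edges = threshold_graph(sim, threshold=1.9)
# Count edges that connect patients sharing the same label.
same_label_edges = sum(labels[i] == labels[j] for i, j, _ in edges)
```

In this toy case only two edges survive the threshold, one between the two MDR patients and one between the two non-MDR patients. A graph built this way can then be drawn with nodes colored by label, as in panel (c) of the figure, or passed to a clustering method.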