Data-Driven Discovery of Feature Groups in Clinical Time Series

Fedor Sergeev, Manuel Burger, Polina Leshetkina, Vincent Fortuin, Gunnar Rätsch, Rita Kuznetsova

TL;DR

The paper learns global feature groups in clinical time series by clustering feature-wise embedding weights within a step-wise embedding framework, so that groups are optimized end-to-end for prediction while remaining interpretable. It demonstrates that dynamic, data-driven grouping can recover ground-truth structures on synthetic data and match the performance of expert-defined groups on real ICU data, with learned groups offering novel, task-relevant insights. The approach combines a feature embedding module, a group embedding module, and a sequence model, supports hard or soft groupings, and uses a regularized clustering objective integrated into standard training. Across synthetic and real-world tasks, the method shows competitive downstream performance, interpretable clusters, and potential to adapt feature groups to new datasets and tasks, at the cost of added computational and hyperparameter complexity. Overall, the work advances interpretable, data-driven feature grouping for clinical time series and lays groundwork for cross-task and cross-institution exploration of variable relationships.

Abstract

Clinical time series data are critical for patient monitoring and predictive modeling. These time series are typically multivariate and often comprise hundreds of heterogeneous features from different data sources. The grouping of features based on similarity and relevance to the prediction task has been shown to enhance the performance of deep learning architectures. However, defining these groups a priori using only semantic knowledge is challenging, even for domain experts. To address this, we propose a novel method that learns feature groups by clustering weights of feature-wise embedding layers. This approach seamlessly integrates into standard supervised training and discovers the groups that directly improve downstream performance on clinically relevant tasks. We demonstrate that our method outperforms static clustering approaches on synthetic data and achieves performance comparable to expert-defined groups on real-world medical data. Moreover, the learned feature groups are clinically interpretable, enabling data-driven discovery of task-relevant relationships between variables.
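
The idea of grouping features by clustering their embedding weights can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the nearest-centroid hard assignment, and the squared-distance penalty are all assumptions made for illustration, since this summary does not state the exact objective. Each feature is represented by the flattened weights of its own embedding layer, each group by a centroid in the same space, and the penalty is the kind of term that could be added to a supervised loss during training.

```python
import numpy as np


def assign_groups(weights, centroids):
    """Hard assignment (assumed): index of the nearest centroid per feature.

    weights:   (n_features, d) flattened per-feature embedding weights
    centroids: (n_groups, d) one centroid per feature group
    """
    dists = np.linalg.norm(weights[:, None, :] - centroids[None, :, :], axis=-1)
    return dists.argmin(axis=1)


def clustering_penalty(weights, centroids, groups):
    """Mean squared distance between each feature's weights and its centroid.

    Hypothetical regularizer; in training it would be scaled by a
    hyperparameter and added to the prediction loss.
    """
    return float(np.mean(np.sum((weights - centroids[groups]) ** 2, axis=1)))


# Toy data: six features whose weights form two well-separated clusters.
rng = np.random.default_rng(0)
true_centers = np.array([[0.0, 0.0], [5.0, 5.0]])
weights = np.concatenate([
    true_centers[0] + 0.1 * rng.standard_normal((3, 2)),
    true_centers[1] + 0.1 * rng.standard_normal((3, 2)),
])

# Deliberately imperfect initial centroids.
centroids = np.array([[1.0, 1.0], [4.0, 4.0]])

groups = assign_groups(weights, centroids)
penalty = clustering_penalty(weights, centroids, groups)
print(groups, penalty)
```

In this toy setup the first three features fall into group 0 and the last three into group 1. In an end-to-end setting, both the weights and the centroids would be updated by gradient descent, so the grouping can change as the model trains rather than being fixed in advance.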

Paper Structure

This paper contains 54 sections, 19 equations, 11 figures, 21 tables.

Figures (11)

  • Figure 1: The proposed step-wise embedding model with learned feature groups. Filled and non-filled boxes represent vectors and learnable functions, respectively.
  • Figure 2: Learned feature clustering scheme.
  • Figure 3: Training learned feature groups.
  • Figure 4: Synthetic data labeling procedure. Feature values are sampled from Gaussian Processes with various parameters.
  • Figure 5: Embedding space of learned features colored according to learned and prior groupings (K-means, HiRID mortality). The embedding space is visualized using t-SNE of Euclidean distances between centroids and features. The model produces novel feature clusters that differ from prior groupings.
  • ...and 6 more figures