Dynamic feature selection in medical predictive monitoring by reinforcement learning

Yutong Chen, Jiandong Gao, Ji Wu

TL;DR

This work tackles dynamic, cost-aware feature selection for multivariate time-series in clinical monitoring by formulating it as an offline reinforcement learning problem with a POMDP-like structure. It introduces a predictor P_φ and an actor π_θ that choose time-varying feature updates under a budget, balancing prediction accuracy (via a normalized prediction reward) and acquisition costs (via a dynamic cost reward), and it trains the policy with PPO while iteratively updating a non-differentiable predictor using synthesized states. Evaluations on the MIMIC-IV dataset across P/F ratio prediction (regression) and ventilation termination (classification) tasks show the method matches baseline performance without cost constraints and outperforms strong baselines under strict cost limits, with interpretable, time-varying feature importance. The approach supports non-differentiable predictors, reveals actionable feature-importance dynamics over time, and highlights potential for reducing unnecessary testing while maintaining predictive performance in real-world ICU monitoring, albeit with lower sample efficiency and offline-training limitations to address in future work.

Abstract

In this paper, we investigate dynamic feature selection in the multivariate time-series setting, which is common in clinical predictive monitoring, where each feature corresponds to a bio-test result. Many existing feature selection methods fail to exploit time-series information because they are designed for static data. Our approach addresses this limitation by selecting time-varying feature subsets for each patient. Specifically, we employ reinforcement learning to optimize a policy under a maximum cost constraint. The prediction model is subsequently updated using synthetic data generated by the trained policy. Our method integrates with non-differentiable prediction models. We conducted experiments on a large clinical dataset covering regression and classification tasks. The results demonstrate that our approach outperforms strong feature selection baselines, particularly under stringent cost limits. Code will be released once the paper is accepted.
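To make the accuracy-versus-cost trade-off concrete, here is a toy per-tick reward in Python. It is only a sketch: the function name `toy_reward`, the normalization by a reference error, the linear cost penalty, the `cost_weight` parameter, and the over-budget penalty are all illustrative assumptions, not the reward definitions used in the paper.

```python
from typing import Sequence


def toy_reward(
    abs_error: float,
    baseline_error: float,
    selected: Sequence[int],
    costs: Sequence[float],
    max_cost: float,
    cost_weight: float = 0.1,
) -> float:
    """Hypothetical per-tick reward: prediction quality minus acquisition cost."""
    # Cost of the features acquired at this tick (selected[i] is 0 or 1).
    tick_cost = sum(c for s, c in zip(selected, costs) if s)
    # Assumed normalization: 1.0 for a perfect prediction,
    # 0.0 when the error equals a reference (baseline) error.
    pred_reward = 1.0 - abs_error / baseline_error
    # Assumed cost term: linear penalty plus an extra penalty
    # for any spending above the per-tick budget.
    over_budget = max(0.0, tick_cost - max_cost)
    return pred_reward - cost_weight * tick_cost - over_budget


# Two of three features acquired, well within a budget of 10.
print(toy_reward(0.5, 1.0, [1, 0, 1], [1.0, 2.0, 3.0], max_cost=10.0))
```

In this toy form, a tighter `max_cost` or a larger `cost_weight` pushes a policy toward acquiring fewer or cheaper features at each tick.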

Paper Structure (29 sections, 9 equations, 6 figures, 2 tables, 1 algorithm)

Figures (6)

  • Figure 1: Different feature selection scenarios. Orange blocks indicate selected features. Left: Existing methods select a subset of features for each sample. In univariate time-series feature selection, the horizontal axis is the time ticks of a single feature. Right: Our method focuses on finding time-varying feature subsets for each sample sequence.
  • Figure 2: POMDP state transition. Orange arrows denote fixed conditional state transitions. Blue arrows represent a simple copy operation. Green arrows represent trainable functions. The initial state $s_0$ is a non-trainable constant vector. Sequence $\mathbf{x}^i$ is randomly sampled from the dataset.
  • Figure 3: How the actor cooperates with the predictor in the synthetic environment. Orange circles: Features updated in the next tick. Blue circles: Features that are not updated. Grey blocks: The environment and predictor are fixed during policy training.
  • Figure 4: Performance comparison on the test dataset. We run four baseline methods (GBDT, LSTM, LASSO, SVM-L1) for comparison. For our method, we use GBDT as the predictor in P/F ratio prediction and LSTM in ventilation prediction. The cost is computed as the average per-tick cost. The X-axis uses a base-10 logarithmic scale. For the regression task, we use MAE as the loss function. For the binary classification task, we use 1-AUC as the loss function. Some curves of LASSO and SVM-L1 are not drawn entirely. The detailed results are provided in the evaluation results subsection.
  • Figure 5: Policy visualization in two tasks. The X-axis represents ordered features, and the Y-axis represents time ticks (tick=0 at the top). We use $C_{\text{max}}=10$ and the simple cost setting in this figure. Features are ordered by their mean activation along the time axis. Left: In the P/F prediction task, feature selection is more concentrated on specific features. Some features are sampled multiple times while others are only sampled in the initial state. Right: In the ventilation task, feature selection is more dispersed.
  • ...and 1 more figure
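The captions for Figures 2 to 4 describe two mechanics that a short Python sketch may help illustrate: selected features are refreshed at the next tick while unselected ones are simply copied forward, and cost is reported as an average per tick. The function names `update_state` and `average_per_tick_cost`, the list-based state, and the 0/1 selection mask are assumptions made for illustration; the paper's actual state representation may differ.

```python
from typing import List, Sequence


def update_state(
    prev_state: Sequence[float],
    new_values: Sequence[float],
    mask: Sequence[int],
) -> List[float]:
    """Refresh selected features (mask=1); copy the rest forward (mask=0)."""
    return [new if m else old for old, new, m in zip(prev_state, new_values, mask)]


def average_per_tick_cost(
    masks: Sequence[Sequence[int]],
    costs: Sequence[float],
) -> float:
    """Total acquisition cost over a sequence, divided by the number of ticks."""
    total = sum(c for mask in masks for m, c in zip(mask, costs) if m)
    return total / len(masks)


# Features 0 and 2 are re-measured; feature 1 keeps its previous value.
state = update_state([0.0, 0.0, 0.0], [1.0, 2.0, 3.0], [1, 0, 1])
print(state)

# Two ticks with per-feature costs 1, 2, and 3.
print(average_per_tick_cost([[1, 0, 1], [0, 1, 0]], [1.0, 2.0, 3.0]))
```

Under this reading, a feature that is sampled only in the initial state, as noted for Figure 5, keeps its first measured value for the rest of the sequence.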