DeepEN: A Deep Reinforcement Learning Framework for Personalized Enteral Nutrition in Critical Care
Daniel Jason Tan, Jiayang Chen, Dilruk Perera, Kay Choong See, Mengling Feng
TL;DR
DeepEN is a conservative offline reinforcement learning (RL) framework for personalizing enteral nutrition (EN) in critical care, where metabolic needs are dynamic and patient-specific. It uses a dueling double DQN with Conservative Q-Learning, trained on over 11,000 ICU patients from MIMIC-IV, to generate 4-hourly targets for calories, protein, and water from a 102-feature state space and a 51-action space. The reward function blends a terminal outcome (mortality) with intermediate physiological and biomarker signals. Off-policy evaluation with consistent weighted per-decision importance sampling (CWPDIS) estimates lower mortality for the learned policy than for the clinician policy (18.8% vs 22.5%) and higher expected returns, with a strong negative correlation between return and mortality. The findings support the feasibility of conservative offline RL for individualized EN therapy and suggest that data-driven personalization may outperform guideline- or clinician-based strategies, while highlighting the need for external validation and exploration of richer rewards.
Abstract
ICU enteral feeding remains sub-optimal due to limited personalization and uncertainty about appropriate calorie, protein, and fluid targets, particularly under rapidly changing metabolic demands and heterogeneous patient responses. This study introduces DeepEN, a reinforcement learning (RL)-based framework that personalizes enteral nutrition (EN) dosing for critically ill patients using electronic health record data. DeepEN was trained on over 11,000 ICU patients from the MIMIC-IV database to generate 4-hourly, patient-specific targets for caloric, protein, and fluid intake. The model's state space integrates demographics, comorbidities, vital signs, laboratory results, and prior interventions relevant to nutritional management, while its reward function balances short-term physiological and nutrition-related goals with long-term survival. A dueling double deep Q-network with Conservative Q-Learning regularization is used to ensure safe and reliable policy learning from retrospective data. DeepEN achieved a 3.7 $\pm$ 0.17 percentage-point absolute reduction in estimated mortality compared with the clinician policy (18.8% vs 22.5%) and higher expected returns compared with guideline-based dosing (11.89 vs 8.11), with improvements in key nutritional biomarkers. U-shaped associations between deviations from clinician dosing and mortality suggest that the learned policy aligns with high-value clinician actions while diverging from suboptimal ones. These findings demonstrate the feasibility of conservative offline RL for individualized EN therapy and suggest that data-driven personalization may improve outcomes beyond guideline- or heuristic-based approaches.
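To make the learning objective named above more concrete, the following is a minimal sketch of a per-transition loss that combines a dueling Q-value head, a double DQN target, and a discrete-action Conservative Q-Learning penalty. It is an illustration under stated assumptions, not the authors' implementation: the function names, the discount factor `gamma`, the penalty weight `alpha`, and the three-action toy example are all hypothetical (the paper's action space has 51 actions), and a real model would compute these quantities with neural networks over batches.

```python
import math

def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]

def logsumexp(xs):
    """Numerically stable log(sum(exp(x))) over a list of floats."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def double_dqn_target(reward, done, q_next_online, q_next_target, gamma=0.99):
    """Double DQN target: the online network picks the next action,
    the target network evaluates it. gamma is an assumed discount factor."""
    if done:
        return reward
    best = max(range(len(q_next_online)), key=q_next_online.__getitem__)
    return reward + gamma * q_next_target[best]

def cql_loss(q_values, action, target, alpha=1.0):
    """Squared TD error plus the discrete CQL penalty
    alpha * (logsumexp_a Q(s, a) - Q(s, a_logged)).
    The penalty pushes down Q-values of actions absent from the data
    relative to the logged action. alpha is an assumed weight."""
    td_error = q_values[action] - target
    conservative_gap = logsumexp(q_values) - q_values[action]
    return 0.5 * td_error ** 2 + alpha * conservative_gap

# Toy transition with three hypothetical dosing actions.
q = dueling_q(1.0, [0.0, 1.0, 2.0])
target = double_dqn_target(
    reward=0.5, done=False,
    q_next_online=[0.2, 0.9, 0.1],
    q_next_target=[1.0, 2.0, 3.0],
    gamma=0.9,
)
loss = cql_loss(q, action=2, target=target, alpha=1.0)
print(q, target, loss)
```

The conservative gap is never negative, because the log-sum-exp of the Q-values is at least as large as any single Q-value. Raising `alpha` therefore keeps the learned policy closer to actions that clinicians actually took in the retrospective data, which is the sense in which the approach is conservative.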
