
GUIDE: Reinforcement Learning for Behavioral Action Support in Type 1 Diabetes

Saman Khamesian, Sri Harini Balaji, Di Yang Shi, Stephanie M. Carpenter, Daniel E. Rivera, W. Bradley Knox, Peter Stone, Hassan Ghasemzadeh

Abstract

Type 1 Diabetes (T1D) management requires continuous adjustment of insulin and lifestyle behaviors to maintain blood glucose within a safe target range. Although automated insulin delivery (AID) systems have improved glycemic outcomes, many patients still fail to achieve recommended clinical targets, warranting new approaches to improve glucose control in patients with T1D. While reinforcement learning (RL) has emerged as a promising approach, current RL-based methods focus primarily on insulin-only treatment and do not provide behavioral recommendations for glucose control. To address this gap, we propose GUIDE, an RL-based decision-support framework designed to complement AID technologies by providing behavioral recommendations to prevent abnormal glucose events. GUIDE generates structured actions defined by intervention type, magnitude, and timing, including bolus insulin administration and carbohydrate intake events. GUIDE integrates a patient-specific glucose level predictor trained on real-world continuous glucose monitoring data and supports both offline and online RL algorithms within a unified environment. We evaluate both off-policy and on-policy methods across 25 individuals with T1D using standardized glycemic metrics. Among the evaluated approaches, the CQL-BC algorithm demonstrates the highest average time-in-range, reaching 85.49% while maintaining low hypoglycemia exposure. Behavioral similarity analysis further indicates that the learned CQL-BC policy preserves key structural characteristics of patient action patterns, achieving a mean cosine similarity of 0.87 $\pm$ 0.09 across subjects. These findings suggest that conservative offline RL with a structured behavioral action space can provide clinically meaningful and behaviorally plausible decision support for personalized diabetes management.
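The behavioral similarity score reported in the abstract is a cosine similarity between patient and policy action patterns. As a minimal illustration (the exact feature construction is not specified here; the action-frequency vectors below are an assumed, hypothetical encoding over the no-action/eat/inject action types):

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity between two action-pattern vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical per-subject frequencies of (no-action, eat, inject) events;
# values are illustrative, not taken from the paper.
patient_pattern = [0.70, 0.18, 0.12]
policy_pattern = [0.65, 0.20, 0.15]
similarity = cosine_similarity(patient_pattern, policy_pattern)
```

A similarity near 1.0 indicates that the learned policy's distribution over action types closely matches the patient's own behavior.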


Paper Structure

This paper contains 27 sections, 17 equations, 7 figures, 8 tables.

Figures (7)

  • Figure 1: Illustration of an artificial pancreas system with closed-loop blood glucose control in patients with T1D. Glucose measurements from a continuous glucose monitor are processed by a control algorithm to compute insulin dosing delivered through an insulin pump.
  • Figure 2: Schematic overview illustrating the complementary roles of the AID system and the GUIDE framework, with both components utilizing glucose measurements to inform insulin delivery and behavioral action recommendations for patients with T1D.
  • Figure 3: Overview of the GUIDE framework. Real-world data from the AZT1D dataset is partitioned chronologically within each subject, with 80% used for training the personalized glucose level predictor (GLIMMER) and 20% reserved for defining initial states. After training, the learned predictor is deployed as the glucose prediction model within the environment simulator, which also includes a human-inspired meal generator. At each decision step, the RL agent selects a behavioral action based on the current state, and the environment returns the next state and reward, forming a closed-loop learning process.
  • Figure 4: Reward function $\tilde{r}_{\text{g}}(g)$ as a function of glucose level, illustrating its piecewise structure. Linear penalties are applied in hypoglycemic and hyperglycemic regions, while the reward is maximized within the target range and decreases linearly toward both thresholds.
  • Figure 5: Representative full-day simulation under full adherence using the personalized glucose prediction model. The x-axis represents time of day (hour), and the y-axis shows glucose level (mg/dL). The solid green curve denotes the predicted glucose trajectory over 24 decision steps (one simulated day). The dashed horizontal lines at 70 mg/dL and 180 mg/dL indicate hypoglycemia and hyperglycemia thresholds, respectively, defining the target in-range zone. Vertical markers denote action events: green lines indicate no action, blue lines (Eat) represent snack carbohydrate recommendations generated by the RL agent, red lines (Inject) correspond to bolus insulin recommendations, and magenta lines (Meal) denote structured meals generated by the human-inspired meal controller (Section III.C). Numerical annotations above action markers indicate the recommended carbohydrate amount (g) or insulin dose (U).
  • ...and 2 more figures
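Figure 4 describes a piecewise reward $\tilde{r}_{\text{g}}(g)$ over glucose level: linear penalties in the hypoglycemic and hyperglycemic regions, and a reward that peaks inside the target range and decreases linearly toward both thresholds. A minimal sketch of such a shape, assuming the 70/180 mg/dL thresholds from Figure 5; the peak value and penalty slopes (`r_max`, `k_hypo`, `k_hyper`) are illustrative assumptions, not the paper's parameters:

```python
def glucose_reward(g, lo=70.0, hi=180.0, r_max=1.0,
                   k_hypo=0.05, k_hyper=0.02):
    """Piecewise reward over glucose level g (mg/dL), in the spirit of
    Figure 4: linear penalties outside the [lo, hi] target range, and a
    peak at the range midpoint that decays linearly to 0 at the thresholds.
    Slope and peak values here are illustrative assumptions."""
    if g < lo:
        return -k_hypo * (lo - g)    # linear hypoglycemia penalty
    if g > hi:
        return -k_hyper * (g - hi)   # linear hyperglycemia penalty
    center = (lo + hi) / 2.0
    # inside the target range: maximum r_max at the midpoint,
    # decreasing linearly toward both thresholds
    return r_max * (1.0 - abs(g - center) / (center - lo))
```

A steeper slope on the hypoglycemic side (`k_hypo > k_hyper`) reflects the asymmetry common in glycemic reward design, where low glucose is the more acute clinical risk.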