An Improved Strategy for Blood Glucose Control Using Multi-Step Deep Reinforcement Learning

Weiwei Gu, Senquan Wang

TL;DR

This work tackles automated blood glucose (BG) control for type 1 diabetes by casting insulin dosing as a reinforcement learning problem, solved with a multi-step Deep Q-Network (DQN) enhanced by Prioritized Experience Replay (PER). It introduces an exponential-decay drug concentration model that converts the problem formulation, with its delayed and prolonged action effects, from a PAE-POMDP into a standard MDP, which lets DRL operate on a compact encoding of past doses. The proposed approach converges faster and achieves a substantially higher time-in-range (TIR) in the BG target window (85.62% versus 56.92% for the baseline), demonstrating the benefit of combining multi-step learning with PER in a BG-control setting. The method improves learning efficiency and dosing strategy quality in a simulated closed-loop BG control environment, with implications for personalized, automated insulin therapy. The study also identifies avenues for future work, such as incorporating meals, exercise, and other physiological factors to further improve robustness.
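The exponential-decay idea can be illustrated with a minimal sketch. It assumes a one-parameter recurrence in which the remaining drug effect shrinks by a fixed factor each step and each new dose is added on top; the function name, the `decay` value, and the dose sequence are hypothetical and are not taken from the paper, whose exact model may differ.

```python
def step_concentration(concentration: float, dose: float, decay: float) -> float:
    """Advance the tracked drug concentration by one time step.

    Assumed recurrence (illustrative only): c_next = decay * c + dose.
    """
    if not 0.0 <= decay < 1.0:
        raise ValueError("decay must lie in [0, 1)")
    return decay * concentration + dose


# Hypothetical dose sequence; 0.5 is an arbitrary decay factor for the demo.
doses = [1.0, 0.0, 0.0, 2.0]
concentration = 0.0
trace = []
for dose in doses:
    concentration = step_concentration(concentration, dose, decay=0.5)
    trace.append(concentration)

# The single number in `trace[-1]` summarizes every past dose, so a state
# such as (current BG, concentration) needs no explicit dosing history.
print(trace)
```

Under this assumption, the next concentration depends only on the current concentration and the current dose, which is the property that allows the delayed-effect problem to be treated as an MDP.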

Abstract

Blood glucose (BG) control, which keeps an individual's BG within a healthy range through exogenous insulin injections, is an important task for people with type 1 diabetes. However, traditional patient self-management is cumbersome and risky. Recent research has explored individualized and automated BG control approaches, among which Deep Reinforcement Learning (DRL) shows potential as an emerging approach. In this paper, we use an exponential decay model of drug concentration to convert the formalization of the BG control problem, which accounts for the delayed and prolonged effects of the drug, from a PAE-POMDP (Prolonged Action Effect-Partially Observable Markov Decision Process) to an MDP, and we propose a novel multi-step DRL-based algorithm to solve it. The algorithm also uses Prioritized Experience Replay (PER) sampling. Compared to single-step bootstrapped updates, multi-step learning is more efficient and reduces the influence of biased targets. Our proposed method converges faster and achieves higher cumulative rewards than the benchmark in the same training environment, and it improves the time-in-range (TIR), the percentage of time the patient's BG is within the target range, in the evaluation phase. Our work validates the effectiveness of multi-step reinforcement learning in BG control, which may help to explore the optimal glycemic control measure and improve the survival of diabetic patients.
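To make the contrast with single-step updates concrete, the sketch below computes a generic n-step bootstrapped target together with the proportional sampling probabilities commonly used in PER. It is a textbook-style illustration rather than the authors' implementation: the function names, the `alpha` exponent, and all numeric values are assumptions chosen for the example.

```python
from typing import List, Sequence


def n_step_target(rewards: Sequence[float], bootstrap_value: float, gamma: float) -> float:
    """Return r_0 + gamma*r_1 + ... + gamma^(n-1)*r_(n-1) + gamma^n * bootstrap_value.

    In a DQN, bootstrap_value would be max_a Q_target(s_(t+n), a).
    With a single reward this reduces to the usual one-step target.
    """
    target = bootstrap_value
    # Fold from the last reward backwards so each step applies one discount.
    for reward in reversed(rewards):
        target = reward + gamma * target
    return target


def per_probabilities(priorities: Sequence[float], alpha: float) -> List[float]:
    """Proportional prioritization: P(i) = p_i^alpha / sum_k p_k^alpha."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


# Hypothetical three-step rollout with a bootstrapped value of 4.0.
target = n_step_target([1.0, 0.0, 2.0], bootstrap_value=4.0, gamma=0.5)

# Transitions with larger priority (e.g. larger TD error) are sampled more often.
probs = per_probabilities([1.0, 3.0], alpha=1.0)
print(target, probs)
```

Because the target uses n observed rewards before bootstrapping, the estimated value is discounted by `gamma ** n`, so errors in the value estimate weigh less on the target than in the one-step case.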

Paper Structure

This paper contains 24 sections, 9 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Schematic diagram of the algorithm for the BG control problem.
  • Figure 2: Cumulative discounted sum of rewards during the training process.
  • Figure 3: Policy visualization.
  • Figure 4: Time distribution of BG in different ranges.
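The time-in-range metric behind a distribution like the one in Figure 4 can be sketched as follows. The sketch assumes evenly spaced BG samples, so that the fraction of samples equals the fraction of time; the range bounds and readings are hypothetical placeholders, since the target window used in the paper is not stated in this summary.

```python
from typing import Sequence


def time_in_range(bg_trace: Sequence[float], low: float, high: float) -> float:
    """Percentage of samples with low <= BG <= high (evenly spaced samples assumed)."""
    if not bg_trace:
        raise ValueError("bg_trace must contain at least one sample")
    inside = sum(1 for bg in bg_trace if low <= bg <= high)
    return 100.0 * inside / len(bg_trace)


# Hypothetical readings in mg/dL; the bounds are placeholders, not the paper's.
readings = [60.0, 100.0, 150.0, 200.0]
tir = time_in_range(readings, low=70.0, high=180.0)
print(f"TIR: {tir:.2f}%")
```

Counting the samples below `low` and above `high` in the same way would give the remaining shares of a time distribution across ranges.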