Diabetes Lifestyle Medicine Treatment Assistance Using Reinforcement Learning
Yuhan Tang
TL;DR
The paper tackles the challenge of delivering personalized lifestyle prescriptions for type 2 diabetes amid a shortage of specialized clinicians. It presents an offline contextual bandit framework that trains a mixed-action Soft Actor-Critic (SAC) on cross-sectional NHANES data. A two-stage pipeline first clusters individuals with PAM (Partitioning Around Medoids) and then treats each cluster segment as an aggregated 'individual' in a single-step reinforcement-learning setting. The reward is the negative Magni risk, $r_i = -\text{risk}_i$, where $\text{risk}_i = 10 \times \left(c_0 \times \left((\ln BG)^{c_1} - c_2\right)\right)^2$ and $BG$ is blood glucose, so the policy is steered toward prescriptions that minimize glucose-related risk. Validation against prescriptions from Xiangya Hospital physicians indicates that the offline model can generate plausible, risk-aware lifestyle recommendations, warranting prospective clinical validation to assess real-world impact.
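To make the reward concrete, the following is a minimal Python sketch of the risk and reward computation. The default constants ($c_0 = 1.509$, $c_1 = 1.084$, $c_2 = 5.381$, with BG in mg/dL) are the values commonly quoted for this risk function and are an assumption here, since the summary does not state them; the function names are hypothetical.

```python
import math


def magni_risk(bg_mg_dl, c0=1.509, c1=1.084, c2=5.381):
    """Glucose risk 10 * (c0 * ((ln BG)^c1 - c2))^2.

    The default constants are assumed (commonly quoted values for BG in
    mg/dL); they are not given in the summary above.
    """
    if bg_mg_dl <= 1.0:
        # ln(BG) must be positive for the fractional power to be real-valued.
        raise ValueError("blood glucose must be greater than 1 mg/dL")
    f = c0 * (math.log(bg_mg_dl) ** c1 - c2)
    return 10.0 * f * f


def reward(bg_mg_dl):
    """Single-step reward r = -risk, as defined in the summary."""
    return -magni_risk(bg_mg_dl)


if __name__ == "__main__":
    for bg in (50.0, 112.5, 300.0):
        print(f"BG={bg:6.1f} mg/dL  risk={magni_risk(bg):8.3f}  reward={reward(bg):9.3f}")
```

Under these assumed constants the risk is close to zero near 112.5 mg/dL and grows toward both hypoglycemic and hyperglycemic values, so maximizing the reward pushes predicted glucose toward the low-risk range.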
Abstract
Type 2 diabetes prevention and treatment can benefit from personalized lifestyle prescriptions. However, the delivery of personalized lifestyle medicine prescriptions is limited by the shortage of trained professionals and the variability in physicians' expertise. We propose an offline contextual bandit approach that learns individualized lifestyle prescriptions from the aggregated NHANES profiles of 119,555 participants by maximizing a reward defined as the negative Magni glucose risk. The model encodes patient status and generates lifestyle medicine prescriptions; it is trained with a mixed-action Soft Actor-Critic (SAC) algorithm, and the task is treated as a single-step contextual bandit. The model is validated against lifestyle medicine prescriptions issued by three certified physicians from Xiangya Hospital. The results indicate that offline mixed-action SAC can generate risk-aware lifestyle medicine prescriptions from cross-sectional NHANES data, warranting prospective clinical validation.
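One consequence of the single-step framing can be illustrated with the standard SAC critic target. In the sketch below, every transition is terminal, so the bootstrapped term vanishes and the target reduces to the immediate reward. This is a generic illustration of that property, not the authors' implementation; the function name and the discount value are assumptions.

```python
def sac_critic_target(reward, next_soft_value, done, gamma=0.99):
    """Standard SAC target: y = r + gamma * (1 - done) * V_soft(s').

    next_soft_value stands for min_i Q_i(s', a') - alpha * log pi(a' | s'),
    assumed to be computed elsewhere. In a single-step contextual bandit
    every transition has done = 1, so the target is just the reward.
    """
    return reward + gamma * (1.0 - done) * next_soft_value


if __name__ == "__main__":
    # Bandit case: the next-state value is ignored because done = 1.
    print(sac_critic_target(reward=-2.5, next_soft_value=7.0, done=1.0))
    # Multi-step case, shown only for contrast: the bootstrap term contributes.
    print(sac_critic_target(reward=-2.5, next_soft_value=7.0, done=0.0))
```

In this setting the critic effectively regresses the expected reward for each state-action pair, which is what allows training on cross-sectional data that contains no state transitions.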
