Kernel-Based Distributed Q-Learning: A Scalable Reinforcement Learning Approach for Dynamic Treatment Regimes

Di Wang; Yao Wang; Shao-Bo Lin

TL;DR

This work tackles scalable dynamic treatment regimes (DTRs) by combining kernel ridge regression with distributed learning to form DKRR-DTR, enabling offline reinforcement learning on large electronic health record (EHR) datasets. The authors present a stage-wise Q-learning formulation in a reproducing kernel Hilbert space (RKHS), develop a divide-and-conquer scheme that reduces the computational cost from cubic in the total sample size to cubic in the size of each local subset, and introduce an integral-operator framework that yields generalization bounds independent of the input dimension. Theoretical results show a convergence rate of $O(|D|^{-r/(2r+s)})$ under mild regularity and capacity assumptions, with the horizon $T$ entering only through constants, not the rate itself. Empirically, KRR-DTR and its distributed variant DKRR-DTR achieve competitive or superior rewards compared with linear Q-learning and deep RL baselines, while significantly reducing training time, especially on large-scale datasets. The combination of theoretical guarantees and practical efficiency makes DKRR-DTR a promising approach for safe, scalable medical decision support on large EHR data.
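To get a feel for the stated rate, the short sketch below evaluates the exponent $r/(2r+s)$ and the regularization choice $\lambda=|D|^{-1/(2r+s)}$ from Theorem 1 for given $r$ and $s$. The function names are hypothetical, and the result is only the order of the bound: the constants, including the horizon-dependent factor, are ignored.

```python
def rate_exponent(r: float, s: float) -> float:
    """Exponent a such that the stated rate is O(|D|^(-a)), with a = r / (2r + s)."""
    if not (0.5 <= r <= 1.0 and 0.0 < s <= 1.0):
        raise ValueError("Theorem 1 is stated for 1/2 <= r <= 1 and 0 < s <= 1")
    return r / (2.0 * r + s)


def regularization(n_samples: int, r: float, s: float) -> float:
    """Regularization parameter lambda = |D|^(-1 / (2r + s)) used in Theorem 1."""
    if not (0.5 <= r <= 1.0 and 0.0 < s <= 1.0):
        raise ValueError("Theorem 1 is stated for 1/2 <= r <= 1 and 0 < s <= 1")
    return n_samples ** (-1.0 / (2.0 * r + s))


def rate_order(n_samples: int, r: float, s: float) -> float:
    """Order of the bound |D|^(-r / (2r + s)); constants are deliberately omitted."""
    return n_samples ** (-rate_exponent(r, s))
```

For instance, `rate_exponent(1.0, 0.5)` gives an exponent of 0.4, so with these illustrative values a tenfold increase in $|D|$ shrinks the order of the bound by a factor of $10^{0.4}$, roughly 2.5.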

Abstract

In recent years, large amounts of electronic health records (EHRs) concerning chronic diseases have been collected to facilitate medical diagnosis. Modeling the dynamic properties of EHRs related to chronic diseases can be efficiently done using dynamic treatment regimes (DTRs). While reinforcement learning (RL) is a widely used method for creating DTRs, there is ongoing research in developing RL algorithms that can effectively handle large amounts of data. In this paper, we present a scalable kernel-based distributed Q-learning algorithm for generating DTRs. We perform both theoretical assessments and numerical analysis for the proposed approach. The results demonstrate that our algorithm significantly reduces the computational complexity associated with the state-of-the-art deep reinforcement learning methods, while maintaining comparable generalization performance in terms of accumulated rewards across stages, such as survival time or cumulative survival probability.
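The idea of kernel-based distributed Q-learning can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a Gaussian kernel, a finite action set encoded as numbers, a Q-function that takes only the current stage's state and action (rather than the full history), and size-weighted averaging of local kernel ridge regression estimators. All function names are hypothetical.

```python
import numpy as np


def gaussian_kernel(A, B, width=1.0):
    # Pairwise Gaussian kernel between the rows of A and the rows of B.
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2.0 * width ** 2))


def fit_krr(X, y, lam):
    # Kernel ridge regression on one data subset: solve (K + lam * n * I) alpha = y.
    n = X.shape[0]
    alpha = np.linalg.solve(gaussian_kernel(X, X) + lam * n * np.eye(n), y)
    return X, alpha


def predict_krr(model, Xq):
    X, alpha = model
    return gaussian_kernel(Xq, X) @ alpha


def fit_distributed_krr(X, y, lam, n_subsets, rng):
    # Divide and conquer: fit one local estimator per disjoint random subset.
    parts = np.array_split(rng.permutation(X.shape[0]), n_subsets)
    return [(fit_krr(X[p], y[p], lam), len(p)) for p in parts]


def predict_distributed(models, Xq):
    # Global estimator: average of local predictions, weighted by subset size.
    total = sum(n for _, n in models)
    return sum((n / total) * predict_krr(m, Xq) for m, n in models)


def q_input(states, actions):
    # Stack state features and the action code into one regression input.
    return np.column_stack([states, actions])


def fit_q_functions(states, actions, rewards, action_set, lam, n_subsets, seed=0):
    # Backward induction over stages t = T-1, ..., 0.
    # states[t]: (n, d) array; actions[t], rewards[t]: (n,) arrays.
    rng = np.random.default_rng(seed)
    T, n = len(states), states[0].shape[0]
    q_models = [None] * T
    future = np.zeros(n)  # no value beyond the last stage
    for t in reversed(range(T)):
        target = rewards[t] + future
        q_models[t] = fit_distributed_krr(
            q_input(states[t], actions[t]), target, lam, n_subsets, rng
        )
        # Value of acting greedily from stage t, used as the target for stage t-1.
        future = np.max(
            [predict_distributed(q_models[t], q_input(states[t], np.full(n, a)))
             for a in action_set],
            axis=0,
        )
    return q_models


def recommend(q_model, state, action_set):
    # Greedy treatment recommendation for each row of `state` at one stage.
    values = [predict_distributed(q_model, q_input(state, np.full(len(state), a)))
              for a in action_set]
    return np.asarray(action_set)[np.argmax(values, axis=0)]
```

Each local solve costs on the order of the cube of the subset size rather than the cube of $|D|$, which is where the computational saving comes from. Calling `fit_q_functions` with `n_subsets=1` reduces the sketch to plain (non-distributed) kernel ridge Q-learning.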

Paper Structure (21 sections, 22 theorems, 120 equations, 4 figures, 1 table)

Introduction
Mathematical formulation of Q-learning for DTRs
Problem setting
Our contributions
Related Work and Comparisons
Kernel-Based Distributed Q-Learning for DTRs
Theoretical Behaviors
Simulations
Clinical trial with a small number of treatment options
Clinical trial with a large number of treatment options
Conclusion
Introduction of Simulation 1
Introduction of Simulation 2
An Analysis of the Impact of Stage Number on DTR Performance
Proofs
...and 6 more sections

Key Result

Theorem 1

Under Assumptions 1-4, with $\frac{1}{2}\leq r\leq 1$ and $0<s\leq 1$, if $\lambda_1=\dots=\lambda_{T}=|D|^{-\frac{1}{2r+s}}$, then where $C_1(T,\mu):= C_1 (2\mu\bar{C})^TT \sum_{t=1}^T 2^{-t} \sum_{\ell=t}^T \prod_{k=\ell}^{T-1}\left((T-k+2)(2\mu^{1/2})^{k-\ell}+1\right)$, and $\bar{C}$ and $C_1$ are constants depending only on $M$, $C_0$, $\kappa$, $r$, $s$, and $\max_{t=1,\dots,T}\|h_t\|

Figures (4)

Figure 1: Training and decision-making flows of Q-learning.
Figure 2: Training and decision-making flows of DKRR-DTR.
Figure 3: Comparisons of survival time and training time for the mentioned methods.
Figure 4: Comparisons of CSP and training time for the mentioned methods.

Theorems & Definitions (22)

Theorem 1
Theorem 2
Proposition 1
Lemma 1
Lemma 2
Lemma 3
Proposition 2
Proposition 3
Lemma 4
Lemma 5
...and 12 more
