A Penalized Shared-parameter Algorithm for Estimating Optimal Dynamic Treatment Regimens

Palash Ghosh, Xinru Wang, Trikay Nalamada, Shruti Agarwal, Maria Jahja, Bibhas Chakraborty

TL;DR

Addresses non-convergence of the Q-shared method for estimating optimal dynamic treatment regimens with parameters shared across $J$ stages. The authors introduce a penalized Q-shared algorithm that uses a ridge penalty, with the tuning parameter $\lambda$ selected by 10-fold cross-validation, to stabilize estimation and achieve convergence even when the standard Q-shared algorithm fails. In synthetic simulations and on a STAR*D depression dataset, the penalized method improves allocation matching to the oracle rule and reduces the bootstrap variance of the shared parameters, while remaining robust to the choice of initial values. The work extends Q-learning-based DTR estimation toward reliable shared-parameter inference and opens directions for multiple treatments, more stages, and interpretable DTRs.

Abstract

A dynamic treatment regimen (DTR) is a set of decision rules to personalize treatments for an individual using their medical history. The Q-learning-based Q-shared algorithm has been used to develop DTRs that involve decision rules shared across multiple stages of intervention. We show that the existing Q-shared algorithm can suffer from non-convergence due to the use of linear models in the Q-learning setup, and identify the condition under which Q-shared fails. We develop a penalized Q-shared algorithm that not only converges in settings that violate the condition, but can outperform the original Q-shared algorithm even when the condition is satisfied. We give evidence for the proposed method in a real-world application and several synthetic simulations.
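The ridge penalty and the 10-fold cross-validation used to select $\lambda$ can be illustrated on a generic linear model. The following is a minimal sketch under stated assumptions, not the authors' penalized Q-shared algorithm: it fits a single ridge regression in closed form and omits the multi-stage Q-learning iteration entirely. The function names `ridge_fit` and `cv_select_lambda`, the synthetic data, and the candidate grid of $\lambda$ values are all hypothetical and chosen only for illustration.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge estimate: solve (X'X + lam * I) beta = X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def cv_select_lambda(X, y, lambdas, n_folds=10, seed=0):
    """Return the lambda with the smallest mean held-out squared error."""
    rng = np.random.default_rng(seed)
    # Shuffle the row indices once, then split them into n_folds groups.
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    cv_errors = []
    for lam in lambdas:
        fold_errors = []
        for k in range(n_folds):
            test_idx = folds[k]
            train_idx = np.concatenate(
                [folds[j] for j in range(n_folds) if j != k]
            )
            beta = ridge_fit(X[train_idx], y[train_idx], lam)
            resid = y[test_idx] - X[test_idx] @ beta
            fold_errors.append(np.mean(resid ** 2))
        cv_errors.append(np.mean(fold_errors))
    best = int(np.argmin(cv_errors))
    return lambdas[best], np.array(cv_errors)


# Synthetic data (an assumption for illustration; not from the paper).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
true_beta = np.array([1.0, -0.5, 0.0, 0.25, 0.0])
y = X @ true_beta + rng.normal(scale=0.5, size=200)

lambdas = [0.01, 0.1, 1.0, 10.0, 100.0]  # hypothetical candidate grid
best_lam, cv_errors = cv_select_lambda(X, y, lambdas)
beta_hat = ridge_fit(X, y, best_lam)
```

Adding $\lambda I$ to $X^\top X$ keeps the linear system solvable even when $X^\top X$ is close to singular, which is the general sense in which a ridge penalty stabilizes estimation. The sketch has no intercept term; in practice an intercept is usually left unpenalized.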

Paper Structure

This paper contains 7 sections, 11 equations, 1 figure, 7 tables, 2 algorithms.

Figures (1)

  • Figure 1: Convergence patterns of Q-shared (top row) and penalized Q-shared (bottom row) based on 50 $m$-out-of-$n$ bootstrap samples. The initial values for all the cases have been set to zero.