An Explainable Deep Reinforcement Learning Model for Warfarin Maintenance Dosing Using Policy Distillation and Action Forging

Sadjad Anzabi Zadeh; W. Nick Street; Barrett W. Thomas

TL;DR

The paper addresses the explainability gap in deep reinforcement learning (DRL) for warfarin maintenance dosing. It models maintenance dosing as a Markov decision process (MDP) and optimizes it with Proximal Policy Optimization (PPO). It then uses Policy Distillation to distill the learned policy into an interpretable, decision-tree-style dosing protocol, with Action Forging biasing the protocol toward simple actions. On simulated data, the resulting protocol achieves a percentage of time in therapeutic range (PTTR) that is competitive with, and often better than, existing baselines, while reducing dose-change decisions to a small, clinician-friendly set. This two-stage approach offers a practical path to auditable, user-friendly DRL-based dosing in healthcare, with potential for real-world adoption and for extension to other sequential decision problems.
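PTTR is the headline metric in the comparison above. The sketch below shows one simple way to compute it from a simulated daily INR trajectory. It assumes a therapeutic INR range of 2.0 to 3.0 and one INR value per simulated day. Both are assumptions for illustration; the paper may use a different range or an interpolation method such as Rosendaal's. The function name `pttr` is hypothetical.

```python
def pttr(inr_values, low=2.0, high=3.0):
    """Percentage of time in therapeutic range, assuming one INR value per day.

    Counts the days whose INR lies in [low, high] and returns that count as a
    percentage of all days. The default 2.0-3.0 range is an assumption.
    """
    if not inr_values:
        raise ValueError("need at least one INR value")
    in_range = sum(1 for inr in inr_values if low <= inr <= high)
    return 100.0 * in_range / len(inr_values)


# Example: 3 of these 5 simulated days fall inside [2.0, 3.0].
print(pttr([1.8, 2.2, 2.6, 3.1, 2.9]))  # 60.0
```

Counting daily values is the simplest choice when a simulator reports INR every day. With sparse clinic visits, linear interpolation between measurements is the more common clinical convention.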

Abstract

Deep Reinforcement Learning (DRL) is an effective tool for drug dosing in chronic condition management. However, the final protocol is generally a black box that offers no justification for the doses it prescribes. This paper addresses this issue by proposing an explainable dosing protocol for warfarin that combines Proximal Policy Optimization (PPO) with Policy Distillation. We introduce Action Forging as an effective tool for achieving explainability. Our focus is on the maintenance dosing protocol. Results show that the final model is as easy to understand and deploy as current dosing protocols and outperforms the baseline dosing algorithms.
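To make the two-stage idea concrete, here is a toy sketch under heavy assumptions. It is not the authors' implementation.

- A hypothetical `teacher_probs` function stands in for the trained PPO policy. It maps a single state feature (current INR) to probabilities over five hypothetical percent dose changes.
- `forge_action` illustrates one possible reading of "biasing toward simple actions". Among actions whose probability is within `tol` of the best, it picks the one with the smallest change. The paper's actual Action Forging procedure may differ.
- `distill` fits a student protocol by labelling sampled states with the teacher's actions and collapsing them into INR thresholds. On a single feature, this is equivalent to a decision tree.

```python
import math

ACTIONS = [-20, -10, 0, 10, 20]  # hypothetical percent dose changes


def teacher_probs(inr, target=2.5):
    """Toy stand-in for a trained PPO policy (hypothetical): a softmax over
    ACTIONS that favours increases when INR is low and decreases when high."""
    ideal = -20.0 * (inr - target)  # preferred change grows with distance from target
    logits = [-abs(a - ideal) / 5.0 for a in ACTIONS]
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    z = sum(exps)
    return [e / z for e in exps]


def greedy_action(probs):
    """Plain argmax action, for comparison with the forged choice."""
    return ACTIONS[probs.index(max(probs))]


def forge_action(probs, tol=0.15):
    """Illustrative simplicity bias (assumption): among actions whose probability
    is within tol of the best, return the one with the smallest |change|."""
    best = max(probs)
    candidates = [a for a, p in zip(ACTIONS, probs) if p >= best - tol]
    return min(candidates, key=abs)


def distill(states, policy):
    """Fit a piecewise-constant protocol from teacher labels.

    Returns a list of (upper_bound, action) rules: use `action` while INR < upper_bound.
    """
    states = sorted(states)
    rules = []
    current = policy(states[0])
    for s in states[1:]:
        a = policy(s)
        if a != current:
            rules.append((s, current))  # first state of the next run becomes a threshold
            current = a
    rules.append((math.inf, current))
    return rules


def protocol_action(rules, inr):
    """Apply the distilled threshold protocol to a new INR value."""
    for upper, action in rules:
        if inr < upper:
            return action


if __name__ == "__main__":
    policy = lambda inr: forge_action(teacher_probs(inr))
    grid = [1.0 + 0.01 * k for k in range(301)]  # sampled INR states, 1.0 to 4.0
    rules = distill(grid, policy)
    for upper, action in rules:
        print(f"INR < {upper:.2f}: change dose by {action:+d}%")
```

In this toy setting, the simplicity bias widens the "no change" band. At INR 2.22 the greedy teacher would raise the dose by 10%, but the forged action keeps it unchanged. The distilled protocol is a short table of INR thresholds, the kind of rule a clinician can read and audit.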
