Toward Cost-efficient Adaptive Clinical Trials in Knee Osteoarthritis with Reinforcement Learning

Khanh Nguyen; Huy Hoang Nguyen; Egor Panfilov; Aleksei Tiulpin

TL;DR

This work tackles cost-efficient data collection in adaptive KOA trials by framing Active Sensing as an RL problem. The authors design a Q-network-based agent that chooses between follow-up and skip, using a multimodal, patient-level state that fuses imaging biomarkers (notably fJSW-derived measures) with clinical data from both knees. A novel reward function ties together follow-up costs, data utility, and progression timing to produce an economically favorable policy, with ablations guiding parameter choices. Trained on the OAI dataset, the RL method outperforms baselines in balanced accuracy (BA) and recall, and achieves a positive reward per person (RPP) while reducing follow-up costs, demonstrating potential for more cost-effective KOA trials and for data collection in broader clinical settings. The approach is fully automatic at test time, and its implementation is publicly released to encourage adoption and further refinement in adaptive trial design and in other chronic diseases.
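The agent described above picks one of two actions from its Q-value estimates, and Figure 3 notes that exploration is controlled by decaying $\varepsilon$ over epochs. A minimal sketch of ε-greedy action selection follows, assuming an exponential decay schedule; the action encoding (`SKIP`, `FOLLOW_UP`) and the values of `eps_start`, `eps_end`, and `decay_rate` are hypothetical and not taken from the paper.

```python
import math
import random

# Hypothetical action encoding for the two actions (skip vs. follow-up).
SKIP, FOLLOW_UP = 0, 1


def epsilon_at(epoch, eps_start=1.0, eps_end=0.05, decay_rate=0.1):
    """Exponential decay from eps_start toward eps_end (assumed schedule and values)."""
    return eps_end + (eps_start - eps_end) * math.exp(-decay_rate * epoch)


def select_action(q_values, epsilon, rng):
    """Epsilon-greedy choice given q_values = [Q(s, SKIP), Q(s, FOLLOW_UP)]."""
    if rng.random() < epsilon:
        # Explore: pick either action uniformly at random.
        return rng.choice([SKIP, FOLLOW_UP])
    # Exploit: pick the action with the larger estimated return (ties go to SKIP).
    return max((SKIP, FOLLOW_UP), key=lambda a: q_values[a])


rng = random.Random(0)
for epoch in (0, 10, 50):
    eps = epsilon_at(epoch)
    action = select_action([0.2, 0.7], eps, rng)
    print(f"epoch={epoch} epsilon={eps:.3f} action={'FOLLOW_UP' if action else 'SKIP'}")
```

Early epochs act almost at random; as $\varepsilon$ shrinks, the choice increasingly follows the Q-network's estimates.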

Abstract

Osteoarthritis (OA) is the most common musculoskeletal disease, with knee OA (KOA) being one of the leading causes of disability and a significant economic burden. Predicting KOA progression is crucial for improving patient outcomes, optimizing healthcare resources, studying the disease, and developing new treatments. The latter application particularly requires one to understand the disease progression in order to collect the most informative data at the right time. Existing methods, however, are limited by their static nature and their focus on individual joints, leading to suboptimal predictive performance and downstream utility. Our study proposes a new method that dynamically monitors patients with KOA, rather than individual joints, using a novel Active Sensing (AS) approach powered by Reinforcement Learning (RL). Our key idea is to directly optimize for the downstream task by training an agent that maximizes informative data collection while minimizing overall costs. Our RL-based method leverages a specially designed reward function to monitor disease progression across multiple body parts, employs multimodal deep learning, and requires no human input during testing. Extensive numerical experiments demonstrate that our approach outperforms current state-of-the-art models, paving the way for the next generation of KOA trials.

Paper Structure (8 sections, 14 equations, 7 figures, 8 tables)

Our proposed Active Sensing Method
Reward function
Markov Decision Process
Q-function approximation
Training Q-networks
Reference methods details
Detailed data
Importance of imaging data

Figures (7)

Figure 1: The workflow of our active sensing method, which performs decision-making under uncertainty. The state at each time point is associated with the data acquired at the latest patient visit. The reward function is designed to maximize the efficiency of hospital visits by taking into account radiographic changes and hospital visit costs. The set of actions comprises two elements: follow-up at time $t$ or skip. DNN = deep neural network; KOA = knee osteoarthritis; BMI = body mass index; SF12 = 12-item Short Form Survey; WOMAC = the Western Ontario and McMaster Universities Osteoarthritis Index.
Figure 2: Ablation study of parameters $\alpha$ and $\beta$, evaluated using the ratio of recall to acquisition cost and balanced accuracy. The optimal values of $\alpha$ and $\beta$ were chosen based on these metrics.
Figure 3: Convergence of model training. We adjusted the balance between exploration and exploitation by decaying $\varepsilon$ over epochs with different discount factors $\gamma$. The dashed line indicates the decay of $\varepsilon$ (left y-axis). The colored lines indicate the improvement of RPP over 4 years under the RL-based policy (right y-axis). Colors correspond to different discount factors $\gamma \in \{0.75,0.85,1.0\}$. The policy trained with $\gamma=1.0$ was the most stable and achieved the highest reward over training.
Figure 4: The reward per person (RPP) of "no sensing" (NOS), annual sensing (ANS), CLIMATv2, and our policy ("Ours"). Subfigure (a) shows how RPP changes as the hospital visit cost varies from \$0 to \$2,000. Subfigure (b) shows how RPP changes as the cost of 1 mm of fJSW degeneration varies from \$500 to \$2,000, which also corresponds to an increase in total knee replacement (TKR) cost. Subfigure (c) shows the relationship between the cost ratio ($\lambda/c$) and the fJSW reward ($r/c$) for our RL-based method. The parameters were adjusted in both the training and testing (i.e., deployment) subsets. We found that a ratio $\lambda/c$ of $0.3$ is the threshold for a cost-efficient policy (highlighted in red).
Figure 5: Performance metrics for the knee-level and patient-level approaches at each follow-up year after the baseline visit. The error bars represent standard errors over 10 runs with different random seeds.
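Figures 2, 4, and 5 rely on a handful of evaluation quantities: balanced accuracy, the ratio of recall to acquisition cost, reward per person (RPP), and standard errors over seeded runs. The sketch below shows one plausible way to compute them. It assumes a binary framing in which `y_true` marks visits where progression occurred and `y_pred` marks visits the policy chose to follow up; it also assumes acquisition cost equals the number of follow-ups times a per-visit cost and that RPP is the mean undiscounted total reward per patient. These definitions and all function names are illustrative assumptions, not the paper's.

```python
import math
import statistics


def confusion_counts(y_true, y_pred):
    """Return (tp, fn, tn, fp) for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp, fn, tn, fp


def recall(y_true, y_pred):
    """Sensitivity: share of progression visits the policy followed up (needs >= 1 positive)."""
    tp, fn, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity (needs both classes present)."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def recall_per_cost(y_true, y_pred, visit_cost):
    """Recall divided by an assumed acquisition cost of (#follow-ups x visit_cost)."""
    n_follow_ups = sum(y_pred)
    if n_follow_ups == 0:
        return 0.0  # no visits acquired, so nothing was detected
    return recall(y_true, y_pred) / (n_follow_ups * visit_cost)


def reward_per_person(rewards_by_patient):
    """Assumed RPP: mean over patients of each patient's summed (undiscounted) reward."""
    return statistics.mean(sum(r) for r in rewards_by_patient)


def standard_error(values):
    """Sample standard deviation across runs divided by sqrt(number of runs)."""
    return statistics.stdev(values) / math.sqrt(len(values))


y_true = [1, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 0]
print(balanced_accuracy(y_true, y_pred), recall_per_cost(y_true, y_pred, visit_cost=100))
print(reward_per_person([[1.0, -0.5], [2.0]]), standard_error([1.0, 2.0, 3.0]))
```

Dividing recall by cost rewards policies that catch progression with fewer visits, which is why it is paired with balanced accuracy when choosing $\alpha$ and $\beta$.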