Training and Validating a Treatment Recommender with Partial Verification Evidence

Vishnu Unnikrishnan; Clara Puga; Miro Schleicher; Uli Niemann; Berthod Langguth; Stefan Schoisswohl; Birgit Mazurek; Rilana Cima; Jose Antonio Lopez-Escamez; Dimitris Kikidis; Eleftheria Vellidou; Ruediger Pryss; Winfried Schlee; Myra Spiliopoulou

TL;DR

This work presents TreatmentRecommender, a decision support system (DSS) that ranks treatments by predicted patient improvement and is trained on data from a multi-arm RCT. Because treatment assignment in an RCT is random, the data lack both a rationale for each assignment and verification evidence for the treatments a patient did not receive. The work introduces counterfactual treatment verification and a therapy-level ensemble to handle the missing evidence and the heterogeneous arm data, so that learning and validation can be done directly on the RCT data. Applying the approach to the UNITI tinnitus trial, the authors show that alignment between the recommender's top choices and the actual RCT assignments correlates with higher rates of clinically meaningful THI improvement, which supports the feasibility of building DSSs for treatments that have not yet been deployed clinically. The study provides a framework for training and validating AI-driven treatment recommendations when only RCT data are available, and outlines future directions: incorporating confidence, patient matching, and synthetic data to bolster limited evidence.

Abstract

Current clinical decision support systems (DSS) are trained and validated on observational data from the target clinic. This is problematic for treatments that have been validated in a randomized clinical trial (RCT) but not yet introduced in any clinic. In this work, we report on a method for training and validating the DSS using the RCT data. The key challenges we address concern missingness: missing rationale for treatment assignment (the assignment is at random), and missing verification evidence, since the effectiveness of a treatment for a patient can only be verified (ground truth) for the treatments that were actually assigned to that patient. We use data from a multi-armed RCT that investigated the effectiveness of single and combination treatments for 240+ tinnitus patients recruited and treated in 5 clinical centers. To deal with the 'missing rationale' challenge, we re-model the target variable (outcome) in order to suppress the effect of the randomly assigned treatment and to control for the effect of treatment in general. Our methods are also robust to missing values in features and to a small number of patients per RCT arm. We deal with 'missing verification evidence' by using counterfactual treatment verification, which compares the effectiveness of the RCT assignments when they are aligned with the DSS recommendations versus when they are not. We demonstrate that our approach leverages the RCT data for learning and verification by showing that the DSS suggests treatments that improve the outcome. The results are limited by the small number of patients per treatment; while our ensemble is designed to mitigate this effect, the predictive performance of the methods is affected by the small size of the data. We provide a basis for establishing decision support routines for treatments that have been tested in RCTs but have not yet been deployed clinically.
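
The aligned-versus-not-aligned comparison described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `alignment_report`, the data layout, and all numbers are hypothetical toy values chosen for illustration, and alignment is simplified here to "the top-ranked treatment equals the assigned treatment". Only the treatment labels CBT and HA are taken from the paper.

```python
from statistics import mean


def alignment_report(predicted, assigned, improvement):
    """Compare observed improvement for aligned vs. non-aligned patients.

    predicted:   per-patient dict {treatment: predicted improvement}
    assigned:    per-patient treatment actually given in the RCT
    improvement: per-patient observed improvement (higher is better)

    Assumption: a patient counts as "aligned" when the top-ranked
    treatment equals the randomly assigned one.
    """
    aligned, non_aligned = [], []
    for scores, given, observed in zip(predicted, assigned, improvement):
        top = max(scores, key=scores.get)  # treatment with the highest predicted score
        (aligned if top == given else non_aligned).append(observed)
    return {
        "n_aligned": len(aligned),
        "n_non_aligned": len(non_aligned),
        "mean_aligned": mean(aligned) if aligned else None,
        "mean_non_aligned": mean(non_aligned) if non_aligned else None,
    }


# Hypothetical toy data: four patients, two treatments.
predicted = [
    {"CBT": 12.0, "HA": 5.0},
    {"CBT": 3.0, "HA": 9.0},
    {"CBT": 8.0, "HA": 2.0},
    {"CBT": 1.0, "HA": 6.0},
]
assigned = ["CBT", "CBT", "HA", "HA"]
improvement = [14.0, 2.0, 4.0, 10.0]

report = alignment_report(predicted, assigned, improvement)
print(report)
```

In this toy setting, a higher mean improvement in the aligned group than in the non-aligned group would be the kind of signal the verification looks for; with real RCT data, group sizes and a suitable statistical test would also need to be taken into account.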

Paper Structure (51 sections, 3 equations, 6 figures, 7 tables)

Introduction
A. Missing rationale for treatment assignment
B. Missing verification evidence
C. Missing evidence
Research Questions
Related Work
AI for clinical decision support
Validation of AI-based decision support
Internal validation and validation in interaction with experts
External validation
Materials
The RCT and its 'primary outcome measure'
RCT participants
The RCT features used in our analyses
Notational convention
...and 36 more sections

Figures (6)

Figure 1: Patient representation in a high-dimensional feature space encompassing single features, subscales and questionnaire scores
Figure 2: TreatmentRecommender for the ranking of treatments on improvement after adjusting the target variable and performing counterfactual treatment verification: the left part of the figure refers to model learning on the RCT data, where predictions per treatment are made by an ensemble of therapy-level models; the right part delivers a recommendation for each patient, consisting of the treatments ranked on expected improvement (darker green colors are better); in the patient's record, white boxes refer to missing features
Figure 3: Filling the matrix of treatment outcome values for all treatments and all patients: the original matrix contains only one filled value per patient (left subfigure); the filled matrix contains $k-1$ counterfactual scores per patient (right subfigure)
Figure 4: Overview of the 10 RCT arms, where the treatment in an arm is either one component, e.g. CBT, or a pair of components, e.g. CBT$+$ST: as evidence for CBT we consider all arms where CBT was offered, and likewise for each of HA, SC and ST
Figure 5: Six plots on the THI score and its improvement (as value and as percentage) for aligned and non-aligned patients under binary alignment: the orange lines refer to aligned patients, the blue lines to non-aligned ones; all pairs of curves indicate that the distribution of scores, resp. improvements, is shifted more towards the 'better' numbers for the aligned patients than for the non-aligned ones, where 'better' refers to lower THI scores and larger improvement values, resp. improvement percentages
...and 1 more figures
