METRIK: Measurement-Efficient Randomized Controlled Trials using Transformers with Input Masking

Sayeri Lala, Niraj K. Jha

TL;DR

METRIK (Measurement EfficienT Randomized Controlled Trials using Transformers with Input MasKing) is a framework that, for the first time, calculates a planned missing design (PMD) specific to the RCT from a modest amount of prior data. It increases the sampling efficiency of, and imputation performance under, the generated PMD by leveraging correlations over time and across metrics.

Abstract

Clinical randomized controlled trials (RCTs) collect hundreds of measurements spanning various metric types (e.g., laboratory tests and cognitive/motor assessments) across hundreds to thousands of subjects to evaluate the effect of a treatment, but do so at significant expense. To reduce the number of measurements, trial protocols can be revised to remove metrics extraneous to the study's objective, but doing so requires additional human labor and limits the set of hypotheses that can be studied with the collected data. In contrast, a planned missing design (PMD) can reduce the amount of data collected without removing any metric by imputing the unsampled data. Standard PMDs randomly sample data to leverage statistical properties of imputation algorithms, but are ad hoc and hence suboptimal. Methods that learn PMDs produce more sample-efficient designs, but are not suitable for RCTs because they require ample prior data (150+ subjects) to model the data distribution. Therefore, we introduce a framework called Measurement EfficienT Randomized Controlled Trials using Transformers with Input MasKing (METRIK), which, for the first time, calculates a PMD specific to the RCT from a modest amount of prior data (e.g., 60 subjects). Specifically, METRIK models the PMD as a learnable input masking layer that is optimized jointly with a state-of-the-art imputer based on the Transformer architecture. METRIK implements a novel sampling and selection algorithm to generate a PMD that satisfies the trial designer's objective, i.e., whether to maximize sampling efficiency or imputation performance for a given sampling budget. Evaluated across five real-world clinical RCT datasets, METRIK increases the sampling efficiency of, and imputation performance under, the generated PMD by leveraging correlations over time and across metrics, thereby eliminating the need to manually remove metrics from the RCT.
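
To make the idea of a PMD as an input mask concrete, the following is a minimal sketch, not the authors' implementation. It assumes the design is a 0/1 grid over (time point, metric) cells, that a masking layer would expose one score per cell, and that sampling efficiency means the fraction of cells left unsampled. The names mask_from_scores, apply_mask, and efficiency, as well as the score values, are hypothetical and chosen only for illustration; no training of the mask or the imputer is shown.

```python
def mask_from_scores(scores, budget):
    """Turn per-cell keep-scores into a 0/1 sampling mask under a budget.

    scores: rows are time points, columns are metrics (hypothetical stand-in
            for the parameters of a learned masking layer).
    budget: fraction of cells that may be sampled, in [0, 1].
    """
    cells = [(s, t, m) for t, row in enumerate(scores) for m, s in enumerate(row)]
    n_keep = int(budget * len(cells))  # floor, so the budget is never exceeded
    # Keep the highest-scoring cells; ties are broken by (time, metric) order.
    keep = sorted(cells, key=lambda c: (-c[0], c[1], c[2]))[:n_keep]
    mask = [[0] * len(row) for row in scores]
    for _, t, m in keep:
        mask[t][m] = 1
    return mask


def apply_mask(values, mask):
    """Replace unsampled cells with None; these are what an imputer would fill in."""
    return [[v if k else None for v, k in zip(vrow, mrow)]
            for vrow, mrow in zip(values, mask)]


def efficiency(mask):
    """Fraction of cells NOT sampled (assumed definition of sampling efficiency)."""
    total = sum(len(row) for row in mask)
    return 1.0 - sum(sum(row) for row in mask) / total


# Illustrative scores for 4 time points x 3 metrics (made-up numbers).
scores = [[2.0, -1.0, 0.5],
          [0.1, -2.0, 1.5],
          [-0.3, 0.7, -1.2],
          [3.0, -0.5, 0.0]]
mask = mask_from_scores(scores, budget=0.5)
observed = apply_mask(scores, mask)
```

Under these assumptions, the mask keeps 6 of the 12 cells, so efficiency(mask) is 0.5. A trial designer's choice between maximizing efficiency and maximizing imputation performance would then correspond to how the budget is set and how candidate masks are compared, which this sketch does not model.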

Paper Structure

This paper contains 30 sections, 2 equations, 11 figures, 1 table, 2 algorithms.

Figures (11)

  • Figure 1: Flowchart of the METRIK framework.
  • Figure 2: Performance gains obtained by METRIK over baseline PMD algorithms for a setting that maximizes efficiency. For MF and MFL, PMDs are only feasible for efficiency levels $\geq$ 30%.
  • Figure 3: Performance gains obtained by METRIK over the RSD baseline under the two design objectives, i.e., maximize efficiency or maximize imputation performance. Results are only shown for baseline efficiencies ranging from 5-30% since METRIK neither improves nor hurts performance at higher baseline efficiencies.
  • Figure 4: Sample PMDs produced by RSD (top two rows) and a sample PMD generated by METRIK (bottom row) on the FSZONE dataset for a setting that maximizes efficiency.
  • Figure 5: Performance gains under METRIK and an ablated version that replaces the candidate pool of learned PMDs with random ones generated by the MF design. Gains are measured with respect to an MF-based PMD baseline for a setting that maximizes efficiency. Results are shown for 30% baseline efficiency, as the ablated method shows no gains at other efficiencies.
  • ...and 6 more figures