Table of Contents
Fetching ...

TimeTox: An LLM-Based Pipeline for Automated Extraction of Time Toxicity from Clinical Trial Protocols

Saketh Vinjamuri, Marielle Fis Loperena, Marie C. Spezia, Ramez Kouzy

Abstract

Time toxicity, the cumulative healthcare contact days from clinical trial participation, is an important but labor-intensive metric to extract from protocol documents. We developed TimeTox, an LLM-based pipeline for automated extraction of time toxicity from Schedule of Assessments tables. TimeTox uses Google's Gemini models in three stages: summary extraction from full-length protocol PDFs, time toxicity quantification at six cumulative timepoints for each treatment arm, and multi-run consensus via position-based arm matching. We validated against 20 synthetic schedules (240 comparisons) and assessed reproducibility on 644 real-world oncology protocols. Two architectures were compared: single-pass (vanilla) and two-stage (structure-then-count). The two-stage pipeline achieved 100% clinically acceptable accuracy ($\pm$3 days) on synthetic data (MAE 0.81 days) versus 41.5% for vanilla (MAE 9.0 days). However, on real-world protocols, the vanilla pipeline showed superior reproducibility: 95.3% clinically acceptable accuracy (IQR $\leq$ 3 days) across 3 runs on 644 protocols, with 82.0% perfect stability (IQR = 0). The production pipeline extracted time toxicity for 1,288 treatment arms across multiple disease sites. Extraction stability on real-world data, rather than accuracy on synthetic benchmarks, is the decisive factor for production LLM deployment.

TimeTox: An LLM-Based Pipeline for Automated Extraction of Time Toxicity from Clinical Trial Protocols

Abstract

Time toxicity, the cumulative healthcare contact days from clinical trial participation, is an important but labor-intensive metric to extract from protocol documents. We developed TimeTox, an LLM-based pipeline for automated extraction of time toxicity from Schedule of Assessments tables. TimeTox uses Google's Gemini models in three stages: summary extraction from full-length protocol PDFs, time toxicity quantification at six cumulative timepoints for each treatment arm, and multi-run consensus via position-based arm matching. We validated against 20 synthetic schedules (240 comparisons) and assessed reproducibility on 644 real-world oncology protocols. Two architectures were compared: single-pass (vanilla) and two-stage (structure-then-count). The two-stage pipeline achieved 100% clinically acceptable accuracy (3 days) on synthetic data (MAE 0.81 days) versus 41.5% for vanilla (MAE 9.0 days). However, on real-world protocols, the vanilla pipeline showed superior reproducibility: 95.3% clinically acceptable accuracy (IQR 3 days) across 3 runs on 644 protocols, with 82.0% perfect stability (IQR = 0). The production pipeline extracted time toxicity for 1,288 treatment arms across multiple disease sites. Extraction stability on real-world data, rather than accuracy on synthetic benchmarks, is the decisive factor for production LLM deployment.
Paper Structure (32 sections, 5 figures, 7 tables)

This paper contains 32 sections, 5 figures, 7 tables.

Figures (5)

  • Figure 1: Representative SoA table from a complex synthetic breast oncology protocol (BRST-2025-01) showing two treatment arms with three visit days per cycle.
  • Figure 2: Step-by-step pipeline for processing protocol PDFs via the Gemini API to extract relevant schedules and generate a consolidated summary document.
  • Figure 3: Vanilla pipeline signed error distribution by time window on 20 synthetic schedules (240 comparisons). Each dot represents one arm-timepoint comparison. Red: overcount $>$3 days; blue: undercount $>$3 days; green: within $\pm$3 days. Solid line: mean error; dashed lines: mean $\pm$ 1.96 SD (limits of agreement).
  • Figure 4: Mean absolute error (days) by schedule complexity and time window for the vanilla pipeline on synthetic data. Errors are lowest at screening and increase with both time window duration and schedule complexity.
  • Figure 5: TimeTox Production Pipeline Architecture.