Time-to-Event Transformer to Capture Timing Attention of Events in EHR Time Series

Jia Li; Yu Hou; Rui Zhang

TL;DR

LITT is a novel Timing-Transformer architecture that enables temporary alignment of sequential events on a virtual "relative timeline". This alignment supports event-timing-focused attention and personalized interpretations of clinical trajectories, which the authors position as a significant step forward for precision medicine in clinical AI.

Abstract

Automatically discovering personalized sequential events from large-scale time-series data is crucial for enabling precision medicine in clinical research, yet it remains a formidable challenge even for contemporary AI models. For example, while transformers capture rich associations, they are mostly agnostic to event timing and ordering, thereby bypassing potential causal reasoning. Intuitively, we need a method capable of evaluating the "degree of alignment" among patient-specific trajectories and identifying their shared patterns, i.e., the significant events in a consistent sequence. This necessitates treating timing as a true computable dimension, allowing models to assign "relative timestamps" to candidate events beyond their observed physical times. In this work, we introduce LITT, a novel Timing-Transformer architecture that enables temporary alignment of sequential events on a virtual "relative timeline", thereby enabling event-timing-focused attention and personalized interpretations of clinical trajectories. Its interpretability and effectiveness are validated on real-world longitudinal EHR data from 3,276 breast cancer patients to predict the onset timing of cardiotoxicity-induced heart disease. Furthermore, LITT outperforms both the benchmark and state-of-the-art survival analysis methods on public datasets, positioning it as a significant step forward for precision medicine in clinical AI.

Paper Structure

This paper contains 17 sections, 11 equations, 10 figures, and 2 tables.

Introduction
Computation of Timing
Related Works
Why Attention for Event Timing
Relative Timing Transformation
Level-of-Individual Timing Transformer
Time-Transformation Gate
Conditional Timing Attention
Experiments
Experiment 1: Trajectory Discovery
Experiment 2: Event-Timing Regression
Experiment 3: Survival Analysis
Conclusion
Appendix A: Why LSTM Enables Timing Computation While GRU Cannot
Appendix B: Numerical Results in Figure 6
...and 2 more sections

Figures (10)

Figure 1: Example of timing transformation from absolute time to relative time, which can be represented by a sequence of scaling coefficients $\{\gamma_i\}$ with $i = 0, \dots, T$.
Figure 2: Unit architecture of the LITT model, where the absolute timestamp $t$ serves as a dedicated input to the Time-Transformation Gate. The gate's update is separated from the standard LSTM backbone, and its output is multiplied with the cell state.
Figure 3: Conditional event-timing attention updates using model-derived relative timestamps, resulting in the discovery of the most significant temporal trajectory pattern $E_2 \to E_3 \to E_4 \to \ldots$ shared across patients $\{p_j, p_{j+1}, p_{j+2}, \ldots\} = P$.
Figure 4: Association heatmaps (also considered as regular attention) under three initial treatment conditions: first-time radiation therapy, first-time chemotherapy, and first-time targeted therapy. Displayed events include first-time medication use (prefixed Md_) and first-time diagnoses (prefixed Dx_). See Appendix C for abbreviation information.
Figure 5: Two typical trajectories discovered by LITT in a purely data-driven manner from real-world EHR data. Each step displays the number of available patients (top) and the proportion of heart disease diagnoses (label = 1, bottom). Red numeric values indicate LITT-computed conditional event-timing attention given all preceding events. Event names are suffixed with 1, 2, or 3 to denote the first, second, or third administration of the procedure. The most significant routine is shown in red.
...and 5 more figures
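
The timing transformation described in the Figure 1 caption can be illustrated with a small sketch. The caption only states that the mapping from absolute to relative time is represented by scaling coefficients $\{\gamma_i\}$ with $i = 0, \dots, T$; how the coefficients are applied is not given here. The version below assumes, purely for illustration, that each $\gamma_i$ rescales the gap between consecutive absolute timestamps and that the relative timeline is the running sum of the rescaled gaps. The function name `relative_timeline` and the example values are hypothetical and are not taken from the paper.

```python
from itertools import accumulate


def relative_timeline(abs_times, gammas):
    """Map absolute timestamps to a relative timeline.

    Assumption (illustrative, not the paper's definition): gammas[0]
    rescales the offset of the first event from time zero, and gammas[i]
    rescales the gap between abs_times[i - 1] and abs_times[i].
    """
    if len(abs_times) != len(gammas):
        raise ValueError("need exactly one scaling coefficient per timestamp")
    if not abs_times:
        return []
    # Gaps between consecutive events; the first gap is measured from zero.
    gaps = [abs_times[0]] + [b - a for a, b in zip(abs_times, abs_times[1:])]
    # Rescale each gap, then accumulate to recover positions on the new timeline.
    return list(accumulate(g * d for g, d in zip(gammas, gaps)))


# Hypothetical example: four events at days 0, 10, 30, and 60.
abs_times = [0.0, 10.0, 30.0, 60.0]
gammas = [1.0, 0.5, 1.0, 2.0]  # compress the first gap, stretch the last one
print(relative_timeline(abs_times, gammas))  # [0.0, 5.0, 25.0, 85.0]
```

Under this assumption, setting every coefficient to 1 leaves the timeline unchanged, and non-negative coefficients preserve the ordering of events, which is what makes trajectories from different patients comparable on a shared relative axis.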
