Structure-Aware Set Transformers: Temporal and Variable-Type Attention Biases for Asynchronous Clinical Time Series

Joohyung Lee, Kwanhyung Lee, Changhun Kim, Eunho Yang

TL;DR

The STructure-AwaRe (STAR) Set Transformer restores the within-variable and time-local cross-variable priors that point-set tokenization loses. It does so by adding parameter-efficient soft attention biases: a temporal locality penalty with learnable timescales and a variable-type affinity from a learned feature-compatibility matrix.

Abstract

Electronic health records (EHR) are irregular, asynchronous multivariate time series. As time-series foundation models increasingly tokenize events rather than discretizing time, the input layout becomes a key design choice. Grids expose time$\times$variable structure but require imputation or missingness masks, risking error or sampling-policy shortcuts. Point-set tokenization avoids discretization but loses within-variable trajectories and time-local cross-variable context (Fig. 1). We restore these priors in STructure-AwaRe (STAR) Set Transformer by adding parameter-efficient soft attention biases: a temporal locality penalty $-|\Delta t|/\tau$ with learnable timescales and a variable-type affinity $B_{s_i,s_j}$ from a learned feature-compatibility matrix. We benchmark 10 depth-wise fusion schedules (Fig. 2). On three ICU prediction tasks, STAR-Set achieves AUC/APR of 0.7158/0.0026 (CPR), 0.9164/0.2033 (mortality), and 0.8373/0.1258 (vasopressor use), outperforming regular-grid, event-time grid, and prior set baselines. Learned $\tau$ and $B$ provide interpretable summaries of temporal context and variable interactions, offering a practical plug-in for context-informed time-series models.
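
To make the two biases in the abstract concrete, here is a minimal NumPy sketch. It assumes both biases are added to standard scaled dot-product attention logits before the softmax, and it uses a single fixed timescale. The abstract does not specify how the biases are combined or whether the timescale is per head or per layer, so those choices are assumptions. The function name `biased_attention_weights` and its argument names are hypothetical, not taken from the paper.

```python
import numpy as np

def biased_attention_weights(q, k, t, s, tau, B):
    """Attention weights with additive temporal and variable-type biases.

    Illustrative sketch only; the additive combination is an assumption.
    q, k : (n, d) query/key vectors for n event tokens
    t    : (n,) event timestamps
    s    : (n,) integer variable-type ids
    tau  : positive timescale (learnable in the paper; a fixed float here)
    B    : (V, V) variable-type affinity matrix (learned in the paper)
    """
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)                 # content term
    dt = np.abs(t[:, None] - t[None, :])          # pairwise |delta t|
    logits = logits - dt / tau                    # temporal locality penalty
    logits = logits + B[s[:, None], s[None, :]]   # affinity B[s_i, s_j]
    logits = logits - logits.max(axis=-1, keepdims=True)  # stable softmax
    w = np.exp(logits)
    return w / w.sum(axis=-1, keepdims=True)

# Toy usage: five events over three variable types.
rng = np.random.default_rng(0)
q = rng.normal(size=(5, 8))
k = rng.normal(size=(5, 8))
t = np.array([0.0, 0.5, 1.0, 6.0, 12.0])
s = np.array([0, 1, 0, 2, 1])
weights = biased_attention_weights(q, k, t, s, tau=2.0, B=np.zeros((3, 3)))
```

In this sketch, a small `tau` concentrates attention on temporally nearby events, while a large `tau` flattens the penalty. A larger entry of `B` for a pair of variable types raises the attention between tokens of those types.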

Paper Structure (31 sections, 8 equations, 2 figures, 4 tables)

Figures (2)

  • Figure 1: EHR input layouts and biasing set attention. (a) Irregular, asynchronous EHR events. Grid and sparse time$\times$variable layouts (b,c) make within-variable trajectories (red) and time-local cross-variable relations (blue) explicit (sparse relies on missingness masks), whereas set tokenization (d) obscures both axes. We restore these inductive priors with a variable-type bias (e), favoring same-variable interactions, and a temporal bias (f), favoring temporally proximal interactions.
  • Figure 2: Layer-wise fusion strategies for soft attention biases in the set encoder. Each panel illustrates a bias schedule applied across Transformer encoder layers (stacked blocks from early/lower to late/upper) on top of the set embedder. We ablate no bias (nb), temporal bias (tb), variable-type bias (vb), and their combination (vt). The shorthand "x--y" denotes using bias x in the lower layers and bias y in the upper layers: (a) nb--nb, (b) tb--tb, (c) vb--vb, (d) nb--tb, (e) tb--nb, (f) nb--vb, (g) vb--nb, (h) vb--tb, (i) tb--vb, and (j) vt--vt. We denote vt--vt as our proposed STAR Set Transformer.
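
The "x--y" shorthand in Figure 2 can be read as a per-layer configuration. The sketch below expands a schedule string into one bias code per encoder layer. It assumes the lower/upper split falls at the midpoint of the stack, which the caption does not state. The names `layer_biases`, `BIAS_CODES`, and `SCHEDULES` are hypothetical.

```python
# Bias codes from Figure 2: nb = none, tb = temporal, vb = variable-type, vt = both.
BIAS_CODES = {"nb", "tb", "vb", "vt"}

# The ten schedules (a)-(j) listed in the Figure 2 caption.
SCHEDULES = ["nb--nb", "tb--tb", "vb--vb", "nb--tb", "tb--nb",
             "nb--vb", "vb--nb", "vb--tb", "tb--vb", "vt--vt"]

def layer_biases(schedule, num_layers):
    """Expand an "x--y" schedule into one bias code per encoder layer.

    Assumption: lower and upper layers are split at the midpoint of the
    stack; the source does not say where the split falls.
    """
    lower, upper = schedule.split("--")
    if lower not in BIAS_CODES or upper not in BIAS_CODES:
        raise ValueError(f"unknown bias code in {schedule!r}")
    split = num_layers // 2
    return [lower] * split + [upper] * (num_layers - split)

# Example: temporal bias only in the upper half of a 4-layer encoder.
print(layer_biases("nb--tb", 4))
```

Under this reading, the proposed STAR configuration "vt--vt" applies both biases in every layer.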