EveryQuery: Zero-Shot Clinical Prediction via Task-Conditioned Pretraining over Electronic Health Records

Payal Chandak, Gregory Kondas, Isaac Kohane, Matthew McDermott

TL;DR

EveryQuery is an EHR foundation model that achieves zero-shot inference through task-conditioned pre-training over randomly sampled combinations of query tasks and patient contexts. It enables zero-shot prediction for any task in the query space without finetuning, linear probing, or trajectory generation.

Abstract

Foundation models pretrained on electronic health records (EHR) have demonstrated zero-shot clinical prediction capabilities by generating synthetic patient futures and aggregating statistics over sampled trajectories. However, this autoregressive inference procedure is computationally expensive, statistically noisy, and not natively promptable because users cannot directly condition predictions on specific clinical questions. In this preliminary work, we introduce EveryQuery, an EHR foundation model that achieves zero-shot inference through task-conditioned pre-training. Rather than generating future events, EveryQuery takes as input a patient's history and a structured query specifying a clinical task, and directly estimates the likelihood of the outcome occurring in the future window via a single forward pass. EveryQuery realizes this capability by pre-training over randomly sampled combinations of query tasks and patient contexts, directly training the model to produce correct answers to arbitrary input prompts. This enables zero-shot prediction for any task in the query space without finetuning, linear probing, or trajectory generation. On MIMIC-IV, EveryQuery outperforms an autoregressive baseline on 82% of 39 randomly sampled prediction tasks, with a mean AUC improvement of +0.16 (95% CI: [0.10,0.22]). This advantage remains consistent on tasks that were explicitly held out from the pre-training distribution. Further, EveryQuery's performance gains are most pronounced for rare clinical events, affirming and demonstrating a solution to the fundamental limitation of autoregressive inference for low-prevalence outcomes. However, at present, EveryQuery underperforms on tasks requiring disjunctive reasoning over multiple codes, such as 30-day readmission, exposing a concrete expressiveness limitation of the current query language.
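To make the task format concrete, the following is a minimal sketch of how a structured query and its binary target could be represented, assuming a query is a (code, time horizon) pair, written $q = (c, \Delta t)$ in the Figure 1 caption. The `Query` class, the `label` function, the example code strings, and the window convention are all hypothetical choices for illustration, not the authors' implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Query:
    """Hypothetical stand-in for a structured query q = (c, delta_t)."""
    code: str            # clinical code c whose occurrence is being asked about
    horizon: timedelta   # length delta_t of the future window


def label(events, prediction_time, query):
    """Return 1 if query.code occurs in (prediction_time, prediction_time + horizon], else 0.

    `events` is an iterable of (timestamp, code) pairs. The half-open window
    convention is an assumption made for this sketch.
    """
    window_end = prediction_time + query.horizon
    return int(any(
        code == query.code and prediction_time < ts <= window_end
        for ts, code in events
    ))


# Made-up patient record; the code strings are illustrative only.
t0 = datetime(2020, 1, 1)
events = [
    (datetime(2019, 12, 20), "LAB//potassium"),  # history, before t0
    (datetime(2020, 1, 10), "DX//sepsis"),       # inside a 30-day window
    (datetime(2020, 3, 1), "DX//aki"),           # outside a 30-day window
]
y_sepsis = label(events, t0, Query("DX//sepsis", timedelta(days=30)))
y_aki = label(events, t0, Query("DX//aki", timedelta(days=30)))
y_past = label(events, t0, Query("LAB//potassium", timedelta(days=30)))
```

Under this reading, each query names a single code, so an outcome defined as "any of several codes occurs" cannot be written as one query. That is consistent with the disjunctive-reasoning limitation the abstract reports for tasks such as 30-day readmission.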

Paper Structure (46 sections, 3 equations, 3 figures, 3 tables)


Figures (3)

  • Figure 1: Overview of EveryQuery. (a) Autoregressive EHR models learn $p(x)$ and achieve zero-shot inference by generating many synthetic futures and aggregating statistics; this yields quantized, high-variance estimates, which is especially problematic for rare events. (b) EveryQuery learns $p(y \mid x, q)$ directly: it conditions on a structured task query $q = (c, \Delta t)$ alongside the patient's history, producing a prediction via a single deterministic forward pass.
  • Figure 2: EveryQuery advantage vs. event prevalence. Each point represents a randomly sampled prediction task, colored by EveryQuery AUC; circles denote in-distribution tasks, crosses denote out-of-distribution (held-out code) tasks. The negative correlation between AUC difference and prevalence ($\rho = -0.32$, $p = 0.048$) is driven almost entirely by the autoregressive (AR) model: AR AUC is strongly tied to prevalence ($\rho = 0.64$, $p < 10^{-4}$) while EveryQuery (EQ) AUC is not ($\rho = 0.18$, $p = 0.28$).
  • Figure 3: EveryQuery embeddings are organized by query, not by patient. (a) UMAP shows that the query is the dominant organizing axis of the representation space. (b) Cosine similarity confirms that sharing a query matters more than sharing a patient.
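
The "quantized, high-variance" behaviour described in the Figure 1 caption can be illustrated with a small simulation. This is a sketch under a strong simplifying assumption: each sampled future is treated as an independent Bernoulli draw with the true event probability, which stands in for a real generative model and is not the paper's baseline. The function names and the choice of 20 trajectories are hypothetical.

```python
import random


def ar_estimate(p_true, n_trajectories, rng):
    """Monte Carlo estimate: fraction of sampled futures in which the event occurs.

    Each sampled future is modelled as an independent Bernoulli(p_true) draw,
    a simplifying assumption standing in for a real generative model.
    """
    hits = sum(rng.random() < p_true for _ in range(n_trajectories))
    return hits / n_trajectories


def prob_all_miss(p_true, n_trajectories):
    """Probability that no sampled future contains the event, so the estimate is exactly 0."""
    return (1.0 - p_true) ** n_trajectories


rng = random.Random(0)
n = 20
estimates = [ar_estimate(0.01, n, rng) for _ in range(1000)]
distinct = sorted(set(estimates))  # every value is a multiple of 1/n
p_zero = prob_all_miss(0.01, n)    # about 0.82 for p = 0.01, n = 20
```

With an event probability of 1% and 20 sampled futures, roughly 82% of estimates are exactly zero and the rest fall on a coarse grid of multiples of 0.05, so most patients tie at the same score. A model that outputs a probability in a single forward pass is not restricted to that grid.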