Controllable Sequence Editing for Biological and Clinical Trajectories

Michelle M. Li, Kevin Li, Yasha Ektefaie, Ying Jin, Yepeng Huang, Shvat Messica, Tianxi Cai, Marinka Zitnik

TL;DR

This paper introduces CLEF, a controllable sequence editing model for conditional generation of immediate and delayed effects in multivariate longitudinal sequences. CLEF learns temporal concepts that encode how and when a condition alters future sequence evolution.

Abstract

Conditional generation models for longitudinal sequences can produce new or modified trajectories given a conditioning input. However, they often lack control over when the condition should take effect (timing) and which variables it should influence (scope). Most methods either operate only on univariate sequences or assume that the condition alters all variables and time steps. In scientific and clinical settings, interventions instead begin at a specific moment, such as the time of drug administration or surgery, and influence only a subset of measurements while the rest of the trajectory remains unchanged. CLEF learns temporal concepts that encode how and when a condition alters future sequence evolution. These concepts allow CLEF to apply targeted edits to the affected time steps and variables while preserving the rest of the sequence. We evaluate CLEF on 8 datasets spanning cellular reprogramming, patient health, and sales, comparing against 9 state-of-the-art baselines. CLEF improves immediate sequence editing accuracy by 16.28% (MAE) on average over non-CLEF counterparts. Unlike prior models, CLEF enables one-step conditional generation at arbitrary future times, outperforming non-CLEF counterparts in delayed sequence editing by 26.73% (MAE) on average. We test CLEF under counterfactual inference assumptions and show up to 62.84% (MAE) improvement on zero-shot conditional generation of counterfactual trajectories. In a case study of patients with type 1 diabetes mellitus, CLEF identifies clinical interventions that generate realistic counterfactual trajectories shifted toward healthier outcomes.
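
To make the notions of timing and scope concrete, the following is a minimal sketch of a targeted edit on a multivariate sequence stored as a (time steps x variables) array. It is not CLEF's mechanism, which learns temporal concepts; the function name `apply_targeted_edit` and its parameters are hypothetical and serve only to illustrate which entries an edit may touch and which it must preserve.

```python
import numpy as np

def apply_targeted_edit(sequence, proposed, start_step, variables):
    """Overwrite `sequence` with `proposed` only where the edit applies.

    Timing: only time steps >= start_step are edited.
    Scope: only the columns listed in `variables` are edited.
    All names here are illustrative assumptions, not part of CLEF.
    """
    seq = np.asarray(sequence, dtype=float)
    prop = np.asarray(proposed, dtype=float)
    if seq.shape != prop.shape:
        raise ValueError("sequence and proposed must have the same shape")

    # Boolean mask marking the (time step, variable) entries allowed to change.
    mask = np.zeros(seq.shape, dtype=bool)
    mask[start_step:, variables] = True

    # Keep the original value wherever the mask is False.
    edited = np.where(mask, prop, seq)
    return edited, mask

# Example: 5 time steps, 3 variables; the edit starts at step 3
# and affects variables 0 and 2 only.
history = np.zeros((5, 3))
candidate = np.ones((5, 3))
edited, mask = apply_targeted_edit(history, candidate, start_step=3, variables=[0, 2])
```

Everything before `start_step`, and every variable outside `variables`, is returned unchanged, which mirrors the requirement that the rest of the trajectory is preserved.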

Paper Structure

This paper contains 44 sections, 1 theorem, 5 equations, 18 figures, 16 tables.

Key Result

Corollary B.4

The assumptions from consistency through ignorability provide sufficient identifiability conditions for Eq. (eq:estimate) (i.e., with G-computation [li2021gnet]). However, G-computation requires estimating conditional distributions of time-varying covariates [melnychuk2022causal]. Since this could be challenging given a ...

Figures (18)

  • Figure 1: Illustrative comparison of (a) controllable sequence editing and (b) existing sequence editing. Unlike existing methods, controllable sequence editing generates sequences (dotted lines) guided by a condition while preserving historical data to model the effects of immediate (e.g., in 2 hours) or delayed (e.g., in 1 week) edits.
  • Figure 2: Overview of CLEF's architecture and capabilities. (a) Given a sequence, forecast time, and condition embedding from a frozen pretrained (pt) embedding model, CLEF generates a sequence via immediate or delayed sequence editing. (b) CLEF is composed of a sequence encoder, condition adapter, concept encoder, and concept decoder. CLEF has two key capabilities: (c) forecast sequences at any future time and under any condition (e.g., medical codes), and (d) generate sequences by intervening on CLEF's learned temporal concepts.
  • Figure 3: CLEF is evaluated on 7 datasets of (a) cellular development and (b) patient health trajectories. Illustrations from NIAID NIH BIOART.
  • Figure 4: Benchmarking CLEF, baselines, and ablations on (a) immediate and (b) delayed sequence editing on observed sequences. Lower MAE is better. Models are trained on 3 seeds using a standard cell-, patient-, or store-centric random split; error bars show 95% CI. VAR's performance on eICU and MIMIC-IV is omitted for visualization purposes: on immediate sequence editing, the MAEs for eICU and MIMIC-IV are $55982.74$ and $886.05$; on delayed sequence editing, the MAEs for eICU and MIMIC-IV are $3.02 \times 10^{39}$ and $8.62 \times 10^{23}$.
  • Figure 5: Counterfactual $\tau$-step ahead prediction on tumor growth (single-sliding treatment) with different amounts of time-varying confounding. Models are trained on 5 seeds; error bars show 95% CI.
  • ...and 13 more figures
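
The benchmark figures above report MAE with 95% confidence intervals over several seeds, and the abstract reports relative MAE improvements. The sketch below shows one plausible way to compute such quantities. The helper names are hypothetical, the confidence interval uses a normal approximation (an assumption; the paper may compute it differently), and the numbers in the example are made up for illustration.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error between two arrays of the same shape."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

def mean_ci95(values):
    """Mean and half-width of a normal-approximation 95% CI.

    Assumption: 1.96 * standard error. With only a few seeds a
    t-distribution critical value would give a wider interval.
    """
    values = np.asarray(values, dtype=float)
    mean = float(values.mean())
    std_err = float(values.std(ddof=1) / np.sqrt(len(values)))
    return mean, 1.96 * std_err

def relative_improvement(baseline_mae, model_mae):
    """Percent reduction in MAE relative to the baseline (lower MAE is better)."""
    return 100.0 * (baseline_mae - model_mae) / baseline_mae

# Illustrative (made-up) per-seed MAE values for a baseline and a model.
baseline_per_seed = [0.40, 0.42, 0.38]
model_per_seed = [0.30, 0.31, 0.29]
base_mean, base_half = mean_ci95(baseline_per_seed)
model_mean, model_half = mean_ci95(model_per_seed)
gain = relative_improvement(base_mean, model_mean)
```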

Theorems & Definitions (4)

  • Definition 3.1: Sequence editing
  • Definition 3.2: Temporal concept
  • Definition 3.3: Controllable sequence editing
  • Corollary B.4: G-computation