
Amortized In-Context Mixed Effect Transformer Models: A Zero-Shot Approach for Pharmacokinetics

César Ali Ojeda Marin, Wilhelm Huisinga, Purity Kavwele, Ramsés J. Sánchez, Niklas Hartung

TL;DR

This work tackles the challenge of sparse, longitudinal pharmacokinetic (PK) data by introducing AICMET, a transformer-based latent-variable framework that blends mechanistic compartmental priors with amortized in-context Bayesian inference. By pretraining on a large corpus of synthetic PK trajectories generated under Ornstein–Uhlenbeck priors, AICMET achieves zero-shot adaptation to new compounds and provides calibrated, patient-specific predictions after only a few early measurements. The model combines global population codes with individual-specific latent factors and uses a time-aware transformer decoder to handle irregular sampling and dosing information, delivering both accurate forecasts and meaningful uncertainty quantification. Empirical results on PK-DB demonstrate state-of-the-art accuracy and robust capture of inter-patient variability, highlighting the potential of population-aware, mechanistically grounded neural architectures for truly personalized pharmacotherapy.
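
The split between global population codes and individual-specific latent factors mirrors the classical fixed-effect/random-effect decomposition of population PK. The sketch below shows the kind of sparse, irregularly sampled population data this targets. It is not the authors' generator. It assumes a one-compartment model with first-order absorption (the Bateman function), log-normal random effects around hypothetical typical values, and per-patient random sampling times. `simulate_population`, the parameter names `ka`, `ke`, `v`, and all numeric values are illustrative assumptions.

```python
import math
import random


def bateman(t, dose, ka, ke, v):
    """Concentration of a one-compartment model with first-order absorption.

    Hypothetical simplification: bioavailability F = 1 and ka != ke.
    """
    if t < 0:
        return 0.0
    return dose * ka / (v * (ka - ke)) * (math.exp(-ke * t) - math.exp(-ka * t))


def sample_individual(typical, omega, rng):
    """Fixed effects (typical values) times log-normal random effects exp(eta), eta ~ N(0, omega^2)."""
    return {name: value * math.exp(rng.gauss(0.0, omega[name])) for name, value in typical.items()}


def simulate_population(n_patients, n_obs, t_max, dose, typical, omega, sigma, seed=0):
    """Simulate sparse, irregularly sampled profiles for a hypothetical study population."""
    rng = random.Random(seed)
    population = []
    for _ in range(n_patients):
        params = sample_individual(typical, omega, rng)
        # Irregular sampling: each patient gets their own sorted sampling times.
        times = sorted(rng.uniform(0.0, t_max) for _ in range(n_obs))
        # Multiplicative (log-normal) residual error on each observation.
        conc = [
            bateman(t, dose, params["ka"], params["ke"], params["v"]) * math.exp(rng.gauss(0.0, sigma))
            for t in times
        ]
        population.append({"params": params, "times": times, "conc": conc})
    return population


if __name__ == "__main__":
    pop = simulate_population(
        n_patients=3, n_obs=5, t_max=24.0, dose=100.0,
        typical={"ka": 1.5, "ke": 0.2, "v": 30.0},
        omega={"ka": 0.3, "ke": 0.3, "v": 0.2},
        sigma=0.1, seed=1,
    )
    for patient in pop:
        print([round(t, 1) for t in patient["times"]], [round(c, 2) for c in patient["conc"]])
```

The log-normal parameterization keeps rate constants and volumes positive. This is the usual convention in nonlinear mixed-effects modeling.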

Abstract

Accurate dose-response forecasting under sparse sampling is central to precision pharmacotherapy. We present the Amortized In-Context Mixed-Effect Transformer (AICMET) model, a transformer-based latent-variable framework that unifies mechanistic compartmental priors with amortized in-context Bayesian inference. AICMET is pre-trained on hundreds of thousands of synthetic pharmacokinetic trajectories generated with Ornstein-Uhlenbeck priors over the parameters of compartment models. This pretraining endows the model with strong inductive biases and enables zero-shot adaptation to new compounds. At inference time, the decoder conditions on the collective context of previously profiled trial participants. It then generates calibrated posterior predictions for newly enrolled patients after only a few early drug concentration measurements. This capability shortens traditional model-development cycles from weeks to hours while preserving some degree of expert modelling. Experiments across public datasets show that AICMET attains state-of-the-art predictive accuracy and faithfully quantifies inter-patient variability, outperforming both nonlinear mixed-effects baselines and recent neural ODE variants. Our results highlight the feasibility of transformer-based, population-aware neural architectures as an alternative to bespoke pharmacokinetic modeling pipelines, charting a path toward truly population-aware personalized dosing regimens.
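
One reading of "Ornstein-Uhlenbeck priors over the parameters of compartment models" is that a compartment-model parameter may drift over time as an OU process rather than stay constant. The sketch below illustrates that reading under explicit assumptions and is not the authors' pretraining code. Only the log elimination rate follows an OU process, simulated with its exact Gaussian transition. The absorption rate and volume stay fixed, and the two-state (depot, central) ODE is integrated with forward Euler. The function names and all parameter values are hypothetical.

```python
import math
import random


def ou_path(n_steps, dt, mu, theta, sigma, x0, rng):
    """Exact discretisation of dX = theta * (mu - X) dt + sigma dW (requires theta > 0)."""
    decay = math.exp(-theta * dt)
    step_sd = sigma * math.sqrt((1.0 - decay ** 2) / (2.0 * theta))
    path = [x0]
    for _ in range(n_steps):
        path.append(mu + (path[-1] - mu) * decay + step_sd * rng.gauss(0.0, 1.0))
    return path


def simulate_trajectory(dose, ka, v, log_ke_mu, theta, sigma, t_end, dt=0.01, seed=0):
    """One oral dose into a depot; elimination rate ke(t) = exp(OU process) (hypothetical prior)."""
    rng = random.Random(seed)
    n_steps = int(round(t_end / dt))
    # Start the OU path at its mean so that sigma = 0 recovers a constant ke.
    log_ke = ou_path(n_steps, dt, log_ke_mu, theta, sigma, log_ke_mu, rng)
    depot, central = dose, 0.0
    times, conc = [0.0], [0.0]
    for k in range(n_steps):
        ke = math.exp(log_ke[k])
        # Forward Euler step for depot' = -ka*depot and central' = ka*depot - ke*central.
        d_depot = -ka * depot
        d_central = ka * depot - ke * central
        depot += dt * d_depot
        central += dt * d_central
        times.append((k + 1) * dt)
        conc.append(central / v)
    return times, conc, [math.exp(x) for x in log_ke]


if __name__ == "__main__":
    t, c, ke = simulate_trajectory(dose=100.0, ka=1.5, v=30.0, log_ke_mu=math.log(0.2),
                                   theta=0.5, sigma=0.5, t_end=24.0, seed=3)
    print([round(x, 2) for x in c[::400]])
```

Many such draws, across different dosing schedules and parameter settings, would form a synthetic pretraining corpus. That pipeline is not specified here.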

Paper Structure

This paper contains 31 sections, 21 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: Hierarchical latent structure assumed in our AICMET model. Shaded nodes are observed. All latent representations ($z_i,z_s,z_n$) are continuous; solid blue arrows indicate conditional dependencies (decoder), orange dashed arrows indicate the recognition network (encoder).
  • Figure 2: The encoder produces dynamic representations with a recurrent backbone, and attention mechanisms are applied to summarize these representations at both the individual and study levels. Our transformer-based decoder embeds the encoder representations alongside dose information. Finally, we define functional queries that allow us to evaluate the predictive distribution at any target time $\tau$. By introducing $\mathbf{z}_i$ and $\mathbf{z}_s$, we can model fixed and random effects, enabling a population-aware, individualized characterization of dynamics.
  • Figure 3: Predictive plots for (a) dextromethorphan, (b) caffeine, (d) rosuvastatin, and (e) 4-hydroxytolbutamide, and visual predictive checks for (c) caffeine and (f) 4-hydroxytolbutamide. Each subplot shows observed concentrations and simulation-derived prediction intervals.
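
The visual predictive checks in Figure 3 compare observed concentrations against prediction intervals derived from simulations. The sketch below shows how such pointwise bands and their empirical coverage can be computed. It assumes replicate trajectories on a shared time grid, for example draws from a model's predictive distribution. The 5th/50th/95th percentile levels and the helper names `percentile`, `prediction_bands`, and `coverage` are illustrative choices, not taken from the paper.

```python
import math


def percentile(sorted_vals, q):
    """Percentile q in [0, 100] of a pre-sorted list, linearly interpolating between order statistics."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] + frac * (sorted_vals[hi] - sorted_vals[lo])


def prediction_bands(simulations, lower=5.0, upper=95.0):
    """Pointwise (lower, median, upper) bands across replicate simulated trajectories.

    `simulations` is a list of replicates, each a list of concentrations on the same time grid.
    """
    n_times = len(simulations[0])
    bands = []
    for j in range(n_times):
        column = sorted(sim[j] for sim in simulations)
        bands.append((percentile(column, lower), percentile(column, 50.0), percentile(column, upper)))
    return bands


def coverage(observations, bands):
    """Fraction of observations (time index, concentration) inside the [lower, upper] band."""
    inside = sum(1 for j, y in observations if bands[j][0] <= y <= bands[j][2])
    return inside / len(observations)


if __name__ == "__main__":
    sims = [[float(k), 2.0 * k] for k in range(101)]
    bands = prediction_bands(sims)
    print(bands)
    print(coverage([(0, 50.0), (0, 99.0), (1, 20.0), (1, 195.0)], bands))
```

If the predictive uncertainty is well calibrated, roughly 90% of observations should fall inside a 5th-to-95th percentile band. Markedly lower coverage suggests overconfident intervals.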