In-Context Learning of Temporal Point Processes with Foundation Inference Models

David Berghaus, Patrick Seifner, Kostadin Cvejoski, César Ojeda, Ramsés J. Sánchez

TL;DR

This work pretrains a deep neural network to infer, in-context, the conditional intensity functions of event histories from a context defined by sets of event sequences, and shows that this amortized approach matches the performance of specialized models on next-event prediction across common benchmark datasets.

Abstract

Modeling event sequences of multiple event types with marked temporal point processes (MTPPs) provides a principled way to uncover governing dynamical rules and predict future events. Current neural network approaches to MTPP inference rely on training separate, specialized models for each target system. We pursue a radically different approach: drawing on amortized inference and in-context learning, we pretrain a deep neural network to infer, in-context, the conditional intensity functions of event histories from a context defined by sets of event sequences. Pretraining is performed on a large synthetic dataset of MTPPs sampled from a broad distribution of Hawkes processes. Once pretrained, our Foundation Inference Model for Point Processes (FIM-PP) can estimate MTPPs from real-world data without any additional training, or be rapidly finetuned to target systems. Experiments show that this amortized approach matches the performance of specialized models on next-event prediction across common benchmark datasets.
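To make the objects named in the abstract concrete, the following is a minimal sketch of a marked Hawkes process: its conditional intensity, a sampler based on Ogata's thinning algorithm, and the log-likelihood of a sequence. It is not the authors' pretraining pipeline. The function names, the parameter values, the purely excitatory kernels, and the single decay rate shared by all kernels are assumptions made for illustration.

```python
import math
import random


def intensity(t, mark, history, mu, alpha, beta):
    """Conditional intensity lambda(t, mark | H_t) of a marked Hawkes process.

    history: list of (time, mark) pairs; only events strictly before t count.
    mu[k]: constant base intensity of mark k.
    alpha[k][j]: jump in the intensity of mark k caused by an event of mark j.
    beta: decay rate shared by all exponential kernels (illustrative assumption).
    """
    excitation = sum(
        alpha[mark][m] * math.exp(-beta * (t - s)) for s, m in history if s < t
    )
    return mu[mark] + excitation


def simulate(mu, alpha, beta, horizon, rng):
    """Sample one event sequence on (0, horizon] with Ogata's thinning algorithm."""
    marks = list(range(len(mu)))
    history, t = [], 0.0
    while True:
        # With alpha >= 0 the intensities only decay between events, so the
        # total intensity just after t bounds it until the next event.
        bound = sum(
            mu[k] + sum(alpha[k][m] * math.exp(-beta * (t - s)) for s, m in history)
            for k in marks
        )
        t += rng.expovariate(bound)  # candidate time from the dominating process
        if t > horizon:
            return history
        rates = [intensity(t, k, history, mu, alpha, beta) for k in marks]
        if rng.random() * bound <= sum(rates):
            # Accept the candidate; draw its mark proportionally to the
            # per-mark intensities at the candidate time.
            history.append((t, rng.choices(marks, weights=rates)[0]))


def log_likelihood(history, mu, alpha, beta, horizon):
    """Sum of log-intensities at the events minus the integrated total intensity."""
    log_term = sum(
        math.log(intensity(t, k, history, mu, alpha, beta)) for t, k in history
    )
    # The exponential kernel integrates in closed form over (s, horizon].
    compensator = sum(mu) * horizon + sum(
        sum(alpha[k][m] for k in range(len(mu)))
        * (1.0 - math.exp(-beta * (horizon - s))) / beta
        for s, m in history
    )
    return log_term - compensator


# Hypothetical three-mark parameters, chosen so that the process is stable.
MU = [0.2, 0.1, 0.3]
ALPHA = [[0.5, 0.1, 0.0], [0.2, 0.4, 0.1], [0.0, 0.3, 0.5]]
BETA = 1.5

sequence = simulate(MU, ALPHA, BETA, horizon=20.0, rng=random.Random(0))
print(len(sequence), "events; log-likelihood:",
      log_likelihood(sequence, MU, ALPHA, BETA, horizon=20.0))
```

Specialized models are fitted to the sequences of a single such system, typically by maximizing this log-likelihood. The amortized approach described above instead maps a context of sequences directly to an estimate of the intensity.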

Paper Structure

This paper contains 40 sections, 22 equations, 14 figures, 7 tables.

Figures (14)

  • Figure 1: Schematic representation of FIM-PP. A context of marked event sequences $\mathcal{S}^j$ is encoded by a self-attentive transformer encoder. The result is further processed by a transformer decoder, using a history $\mathcal{H}_t$ of marked events before time $t$ as queries. The final embedding is joined with an encoding of mark $\kappa$. The result is projected to a set of parameters that determine the value of the conditional intensity function $\hat{\lambda}$ evaluated at $(t, \kappa)$.
  • Figure 2: Example intensity estimates of FIM-PP on a synthetic Hawkes process with three marks, constant base intensity and exponentially decaying kernels (left) and a real-world Retweet dataset (right). Each row contains the intensity for one mark. Events of the same mark are colored magenta, while events of other marks are gray. For the Hawkes process, the model estimate (blue line) closely matches the ground-truth intensity (black dashed line). For the Retweet data, FIM-PP estimates a mixture of many excitatory and a few inhibitory interactions.
  • Figure 3: (a) shows that FIM-PP(zs) is competitive with, but slightly worse than, the baseline models. FIM-PP(f), however, performs best across all horizon lengths. (b) shows that FIM-PP(f) also reliably captures patterns in the Taxi dataset.
  • Figure 4: Intensity predictions of FIM-PP(zs) on synthetic datasets of four different process types. We remark that the model has not been trained on power-law kernels but still predicts them with decent accuracy.
  • Figure 5: Comparison of the fine-tuning loss curves of a pre-trained FIM-PP model versus random initialization. Note that one epoch corresponds to just one inference-path prediction and is therefore very fast. Our results indicate that pre-training yields faster convergence as well as a higher log-likelihood at convergence.
  • ...and 9 more figures