EventFlow: Forecasting Temporal Point Processes with Flow Matching

Gavin Kerrigan, Kai Nelson, Padhraic Smyth

TL;DR

EventFlow tackles temporal point process forecasting with a non-autoregressive, flow-matching approach that directly learns the joint distribution over event times. It decomposes the task into a learned event-count model $p_\phi(n|\mathcal{H})$ and a flow-based time-generation model that transports a reference TPP $\mu_0$ to the data distribution $\mu_1$ via a flow with time parameter $s\in[0,1]$, using balanced couplings and interpolants $\gamma_s^z$ to train a vector field $v_\theta$. Sampling reduces to drawing from $\mu_0$ and solving the ODE $d\gamma_s = v_\theta(\gamma_s,s)\,ds$, enabling efficient generation with few forward passes. Empirically, EventFlow achieves a 20-53% reduction in multi-step forecasting error compared with strong baselines and delivers competitive unconditional generation across real and synthetic datasets, demonstrating the practicality of flow-based, non-autoregressive TPP modeling. The work broadens the toolkit for temporal point processes by offering a simple, scalable alternative to autoregressive and diffusion-based methods, with potential extensions to marked and spatiotemporal settings.
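A minimal sketch of the sampling step described above. It assumes that the mixed-binomial reference, conditioned on $n$ events, places them i.i.d. uniformly on the forecasting window, and that an explicit Euler solver integrates $d\gamma_s = v_\theta(\gamma_s, s)\,ds$ from $s=0$ to $s=1$. The names `sample_reference`, `euler_sample`, and the `velocity` callable (a stand-in for the trained $v_\theta$) are hypothetical and are not the authors' code. The default of 25 steps mirrors the 25 NFEs reported in Figure 2.

```python
import random


def sample_reference(n, t0, dt, rng):
    """Draw n i.i.d. Uniform[t0, t0 + dt] event times and sort them.

    Assumption: this plays the role of the reference TPP mu_0 conditioned on n events.
    """
    return sorted(rng.uniform(t0, t0 + dt) for _ in range(n))


def euler_sample(velocity, n, t0, dt, n_steps=25, seed=0):
    """Integrate d gamma_s = v(gamma_s, s) ds over s in [0, 1] with explicit Euler.

    `velocity(times, s)` is a hypothetical stand-in for the trained network v_theta
    and must return one velocity per event (each event moves like a particle).
    """
    rng = random.Random(seed)
    times = sample_reference(n, t0, dt, rng)  # gamma_0 ~ mu_0
    h = 1.0 / n_steps
    for k in range(n_steps):
        s = k * h
        vel = velocity(times, s)  # one network call per step (one NFE)
        times = [t + h * v for t, v in zip(times, vel)]
    return sorted(times)  # gamma_1, the generated sequence


# Sanity check: a constant unit field shifts every reference event by 1.0.
print(euler_sample(lambda ts, s: [1.0] * len(ts), n=5, t0=0.0, dt=10.0))
```

With a trained model, `velocity` would also receive the history embedding $e_{\mathcal{H}}$, and `n` would be drawn from $p_\phi(n \mid \mathcal{H})$ rather than fixed.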

Abstract

Continuous-time event sequences, in which events occur at irregular intervals, are ubiquitous across a wide range of industrial and scientific domains. The contemporary modeling paradigm is to treat such data as realizations of a temporal point process, and in machine learning it is common to model temporal point processes in an autoregressive fashion using a neural network. While autoregressive models are successful in predicting the time of a single subsequent event, their performance can degrade when forecasting longer horizons due to cascading errors and myopic predictions. We propose EventFlow, a non-autoregressive generative model for temporal point processes. The model builds on the flow matching framework in order to directly learn joint distributions over event times, side-stepping the autoregressive process. EventFlow is simple to implement and achieves a 20%-53% lower error than the nearest baseline on standard TPP benchmarks while simultaneously using fewer model calls at sampling time.

Paper Structure

This paper contains 38 sections, 2 theorems, 15 equations, 6 figures, 20 tables, 2 algorithms.

Key Result

Proposition 0

Let $\mu, \nu \in \mathbb{P}(\Gamma)$ be two TPPs. The set of balanced couplings $\Pi_b(\mu, \nu)$ is nonempty if and only if $\mu$ and $\nu$ have the same distribution over event counts, i.e., $\mu(n) = \nu(n)$ for every $n$.
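The proposition suggests a simple way to draw training pairs, assuming (as the name suggests) that a balanced coupling only pairs sequences with equal event counts: check that the count distributions agree, sample one shared count $n$, then draw each sequence conditioned on $n$. The sketch below further assumes that the two conditionals are sampled independently, that events are paired in sorted order, that the interpolant is linear, $\gamma_s = (1-s)\gamma_0 + s\gamma_1$, with target velocity $\gamma_1 - \gamma_0$, and that the loss is a per-event mean squared error. All function names are hypothetical, and the paper's interpolant $\gamma_s^z$ may differ in detail.

```python
import random


def has_balanced_coupling(pmf_mu, pmf_nu, tol=1e-12):
    """Proposition check: a balanced coupling exists iff the count pmfs match."""
    keys = set(pmf_mu) | set(pmf_nu)
    return all(abs(pmf_mu.get(k, 0.0) - pmf_nu.get(k, 0.0)) <= tol for k in keys)


def sample_balanced_pair(count_pmf, sample_mu_given_n, sample_nu_given_n, rng):
    """Draw (gamma0, gamma1) sharing one event count n.

    Assumption: gamma0 and gamma1 are drawn independently given n.
    """
    counts = list(count_pmf)
    n = rng.choices(counts, weights=[count_pmf[c] for c in counts])[0]
    return sorted(sample_mu_given_n(n, rng)), sorted(sample_nu_given_n(n, rng))


def linear_interpolant(gamma0, gamma1, s):
    """Pair events in sorted order (assumption); return gamma_s and d gamma_s / ds."""
    assert len(gamma0) == len(gamma1), "balanced pairs must have equal counts"
    gamma_s = [(1 - s) * a + s * b for a, b in zip(gamma0, gamma1)]
    target = [b - a for a, b in zip(gamma0, gamma1)]
    return gamma_s, target


def flow_matching_loss(pred, target):
    """Mean squared error between predicted and target velocities (0 for empty sequences)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / max(len(target), 1)
```

In a training loop, `pred` would be the network output $v_\theta(\gamma_s, s)$ at a randomly drawn flow time $s$.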

Figures (6)

  • Figure 1: An illustration of forecasting with our EventFlow method. The horizontal axis indicates the flow time $s$, and the vertical axis indicates the support of the TPP $\mathcal{T} = [0, T]$. We first encode the observed history $\mathcal{H}$ into an embedding $e_{\mathcal{H}} = f_\theta(\mathcal{H})$. At $s=0$, we independently draw $n$ events in the forecasting window $[T_0, T_0 + \Delta T]$ from a fixed reference distribution, constituting a sample $\gamma_0$ from a mixed-binomial TPP. Each event can be thought of as a particle, which is assigned a velocity by a neural network $v_\theta(\gamma_s, s, e_{\mathcal{H}})$. Each particle flows along its corresponding velocity field until reaching its terminal point at $s=1$, whereby we obtain a forecasted sequence $\gamma_1$.
  • Figure 2: Sequence distance (defined in the main text) between the forecasted and ground-truth event sequences on a held-out test set. We report the mean $\pm$ one standard deviation over five random seeds. EventFlow (with 25 NFEs) achieves the lowest mean distance (forecasting error) on each of the 7 datasets.
  • Figure 3: Overview of our model architecture for unconditional generation. The model takes as input the flow time $s$ and current sequence state $\gamma_s = \sum_{k=1}^n \delta[t_s^k]$. Each input is projected to a fixed-length vector via a learnable embedding. The resulting embeddings are added and passed to the transformer model, which produces a sequence of output velocities $v_\theta(\gamma_s, s)$ with $N(\gamma_s)$ components.
  • Figure 4: Overview of our model architecture for conditional generation. The encoder (left) takes as input the observed history $\mathcal{H}$, which is embedded in a fashion analogous to our unconditional model. The decoder (right) takes as input the flow time $s$ and current state $\gamma_s = \sum_{k=1}^n \delta[t_s^k]$. These are embedded and passed through the decoder, which applies cross attention to produce the conditional velocities $v_\theta(\gamma_s, s, e_\mathcal{H})$.
  • Figure 5: Overview of our architecture modeling the event count distribution $p_\phi(n \mid \mathcal{H})$. The model takes as input an observed history $\mathcal{H}$. As in our other architectures, the events are embedded and passed through a transformer. Here, the transformer's final-layer event embeddings are averaged and passed through an additional three-layer residual MLP to produce the logits defining $p_\phi(n \mid \mathcal{H})$.
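A simplified sketch of the count head in Figure 5. It assumes that $p_\phi(n \mid \mathcal{H})$ is a categorical distribution over $n = 0, \dots, N_{\max}$, and it replaces the three-layer residual MLP with a single linear layer to keep the example short. The embeddings, weights, and function names are hypothetical placeholders for the transformer outputs and learned parameters.

```python
import math
import random


def mean_pool(embeddings):
    """Average the per-event embeddings (assumes a non-empty history)."""
    d = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / len(embeddings) for i in range(d)]


def count_logits(pooled, weights, bias):
    """One logit per candidate count n = 0, ..., len(weights) - 1.

    Simplification: a single linear layer stands in for the residual MLP.
    """
    return [sum(w * x for w, x in zip(row, pooled)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_count(embeddings, weights, bias, rng):
    """Return a sampled count n and the full probability vector p(n | H)."""
    probs = softmax(count_logits(mean_pool(embeddings), weights, bias))
    return rng.choices(range(len(probs)), weights=probs)[0], probs
```

The sampled `n` then fixes how many reference events are drawn at $s=0$ before the flow is integrated.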

Theorems & Definitions (3)

  • Proposition 0: Existence of Balanced Couplings
  • Proposition 0: Existence of Balanced Couplings
  • proof