Add and Thin: Diffusion for Temporal Point Processes

David Lüdke; Marin Biloš; Oleksandr Shchur; Marten Lienen; Stephan Günnemann

TL;DR

ADD-THIN introduces a diffusion-style framework for temporal point processes that operates on entire event sequences rather than sampling events autoregressively. The forward process thins existing points and adds homogeneous-Poisson noise, giving the intensity $\lambda_n(t)=\alpha_n\lambda_{n-1}(t)+(1-\alpha_n)\lambda_{\mathrm{HPP}}$, and a learned reverse posterior $\lambda_{n-1}(t|\mathbf{t}^{(0)},\mathbf{t}^{(n)})$ enables denoising. The model uses a sequence embedding and a classifier to approximate the posterior components and optimizes a joint objective $\mathcal{L}_{\mathrm{NLL}}+\mathcal{L}_{\mathrm{BCE}}$, equivalent to an ELBO. Empirically, ADD-THIN matches state-of-the-art density estimation and significantly outperforms autoregressive baselines in long-horizon forecasting, with near-constant sampling times and strong performance on real-world data, illustrating the practical impact of diffusion-based sequence modeling for TPPs.
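The forward intensity above can be read as two independent operations on a sequence: keep each existing point with probability $\alpha_n$ (thinning), then superpose points from a homogeneous Poisson process with rate $(1-\alpha_n)\lambda_{\mathrm{HPP}}$. The following is a minimal NumPy sketch of one such noising step, not the authors' implementation; the function name `forward_step`, its parameters, and the observation window $[0, T]$ (`t_max`) are assumptions made for illustration.

```python
import numpy as np


def forward_step(times, alpha_n, lam_hpp, t_max, rng):
    """One hypothetical add-thin noising step on the window [0, t_max].

    times   : event times of the sequence at step n-1
    alpha_n : probability of keeping each existing point
    lam_hpp : rate of the homogeneous Poisson noise process
    """
    times = np.asarray(times, dtype=float)

    # Thin: keep each point independently with probability alpha_n,
    # which scales the intensity to alpha_n * lambda_{n-1}(t).
    keep = rng.random(times.shape[0]) < alpha_n
    kept = times[keep]

    # Add: sample a homogeneous Poisson process with rate
    # (1 - alpha_n) * lam_hpp. Its count is Poisson distributed and
    # its points are uniform on the window.
    n_new = rng.poisson((1.0 - alpha_n) * lam_hpp * t_max)
    added = rng.uniform(0.0, t_max, size=n_new)

    # Superpose both point sets and return them in temporal order.
    return np.sort(np.concatenate([kept, added]))


rng = np.random.default_rng(0)
noisy = forward_step([0.5, 1.0, 2.5], alpha_n=0.8, lam_hpp=3.0, t_max=4.0, rng=rng)
print(noisy)
```

With `alpha_n = 1.0` the step leaves the sequence unchanged, and with `alpha_n = 0.0` it returns pure homogeneous-Poisson noise, matching the two extremes of the intensity formula.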

Abstract

Autoregressive neural networks within the temporal point process (TPP) framework have become the standard for modeling continuous-time event data. Even though these models can expressively capture event sequences in a one-step-ahead fashion, they are inherently limited for long-term forecasting applications due to the accumulation of errors caused by their sequential nature. To overcome these limitations, we derive ADD-THIN, a principled probabilistic denoising diffusion model for TPPs that operates on entire event sequences. Unlike existing diffusion approaches, ADD-THIN naturally handles data with discrete and continuous components. In experiments on synthetic and real-world datasets, our model matches the state-of-the-art TPP models in density estimation and strongly outperforms them in forecasting.

Paper Structure (47 sections, 1 theorem, 14 equations, 6 figures, 6 tables, 1 algorithm)

Introduction
Background
Temporal point processes (TPPs)
Poisson process.
Conditional intensity.
Denoising diffusion probabilistic models
Add-Thin
Forward process -- Noising
Reverse process -- Denoising
Parametrization and training
Sequence embedding.
Posterior approximation.
Training objective.
Sampling
Conditional sampling
...and 32 more sections

Key Result

Proposition 1

Given two independent random variables $X_1\sim \mathrm{Poisson}(\lambda_1)$ and $X_2\sim \mathrm{Poisson}(\lambda_2)$, the conditional distribution of $X_1$ given $X_1+X_2=k$ is Binomial, i.e., $X_1\mid X_1+X_2=k\sim \mathrm{Binomial}(x_1; k, \frac{\lambda_1}{\lambda_1 + \lambda_2})$.
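
A short derivation sketch of this statement, using only independence and the standard fact that the sum of two independent Poisson variables is Poisson with rate $\lambda_1+\lambda_2$ (the paper's own proof may be organized differently):

$$
\begin{aligned}
P(X_1 = x_1 \mid X_1 + X_2 = k)
&= \frac{P(X_1 = x_1)\, P(X_2 = k - x_1)}{P(X_1 + X_2 = k)} \\
&= \frac{\dfrac{e^{-\lambda_1}\lambda_1^{x_1}}{x_1!} \cdot \dfrac{e^{-\lambda_2}\lambda_2^{k-x_1}}{(k-x_1)!}}{\dfrac{e^{-(\lambda_1+\lambda_2)}(\lambda_1+\lambda_2)^{k}}{k!}} \\
&= \binom{k}{x_1} \left(\frac{\lambda_1}{\lambda_1+\lambda_2}\right)^{x_1} \left(\frac{\lambda_2}{\lambda_1+\lambda_2}\right)^{k-x_1},
\end{aligned}
$$

for $x_1 \in \{0, \dots, k\}$. This is the Binomial probability mass function with $k$ trials and success probability $\frac{\lambda_1}{\lambda_1+\lambda_2}$.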

Figures (6)

Figure 1: Proposed noising and denoising process for Add-Thin. (Left) Going from step $n-1$ to step $n$, we add and remove some points at random. (Right) Given ${\bm{t}}^{(n)}$ and ${\bm{t}}^{(0)}$ we know the intensity of points at step $n-1$. We approximate this intensity with our model, which enables sampling new sequences.
Figure 2: (Left) Illustration of all possible disjoint sets that we can reach in our forward process going from ${\bm{t}}^{(0)}$ to ${\bm{t}}^{(n)}$ through ${\bm{t}}^{(n-1)}$. (Right) Posterior intensity describing the distribution of ${\bm{t}}^{(n-1)} \mid {\bm{t}}^{(0)}, {\bm{t}}^{(n)}$, where each subset B-E can be generated by sampling from the intensity functions.
Figure 3: Architecture of our model predicting ${\bm{t}}^{(0)}$ from ${\bm{t}}^{(n)}$.
Figure 4: $5\%$, $25\%$, $50\%$, $75\%$, and $95\%$ quantiles of forecasts generated by Add-Thin for a Taxi event sequence (blue: history, black: ground-truth future).
Figure 5: Sampling runtime for a batch of 100 event sequences averaged over 100 runs. We report the trained model's sampling times for the real-world datasets with different sequence lengths (from left to right: Twitter, Yelp 1, Yelp 2, PUBG, Taxi, Reddit-C, Reddit-A).
...and 1 more figures

Theorems & Definitions (4)

proof
proof
Proposition
proof
