Add and Thin: Diffusion for Temporal Point Processes
David Lüdke, Marin Biloš, Oleksandr Shchur, Marten Lienen, Stephan Günnemann
TL;DR
ADD-THIN is a diffusion-style framework for temporal point processes (TPPs) that generates entire event sequences at once instead of sampling events autoregressively. The forward process thins the points of a sequence and superposes homogeneous Poisson process (HPP) noise, so the intensity at step $n$ is $\lambda_n(t)=\alpha_n\lambda_{n-1}(t)+(1-\alpha_n)\lambda_{\mathrm{HPP}}$. Denoising follows the reverse posterior $\lambda_{n-1}(t\mid\mathbf{t}^{(0)},\mathbf{t}^{(n)})$, in which the unknown clean sequence $\mathbf{t}^{(0)}$ is approximated by a learned model. That model embeds the noisy sequence, uses a classifier (trained with $\mathcal{L}_{\mathrm{BCE}}$) to decide which events of $\mathbf{t}^{(n)}$ belong to $\mathbf{t}^{(0)}$, and uses an intensity model (trained with $\mathcal{L}_{\mathrm{NLL}}$) for the remaining posterior components; the joint objective $\mathcal{L}_{\mathrm{NLL}}+\mathcal{L}_{\mathrm{BCE}}$ is equivalent to optimizing an ELBO. Empirically, ADD-THIN matches state-of-the-art TPP models in density estimation, significantly outperforms autoregressive baselines in long-horizon forecasting, and keeps sampling times nearly constant, with strong results on real-world data.
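To make the forward process concrete, here is a minimal sketch of a single noising step, assuming event times lie in a fixed window $[0, T]$ and that $\alpha_n$ and $\lambda_{\mathrm{HPP}}$ are supplied by the user. The function names `sample_hpp` and `add_thin_step` are hypothetical, and the noise schedule in the usage lines is an arbitrary illustration, not the one used by the authors. Independent thinning with keep probability $\alpha_n$ scales the intensity to $\alpha_n\lambda_{n-1}(t)$, and superposing an independent HPP with rate $(1-\alpha_n)\lambda_{\mathrm{HPP}}$ adds that rate, which reproduces $\lambda_n(t)=\alpha_n\lambda_{n-1}(t)+(1-\alpha_n)\lambda_{\mathrm{HPP}}$.

```python
import random
from typing import List, Optional


def sample_hpp(rate: float, t_max: float, rng: random.Random) -> List[float]:
    """Sample a homogeneous Poisson process on [0, t_max] via exponential gaps."""
    if rate <= 0.0:
        return []  # zero rate: no noise events
    times: List[float] = []
    t = rng.expovariate(rate)
    while t <= t_max:
        times.append(t)
        t += rng.expovariate(rate)
    return times


def add_thin_step(
    events: List[float],
    alpha: float,
    lambda_hpp: float,
    t_max: float,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Hypothetical forward step t^(n-1) -> t^(n).

    Thinning: keep each event independently with probability alpha,
    giving intensity alpha * lambda_{n-1}(t).
    Adding: superpose an HPP with rate (1 - alpha) * lambda_hpp.
    Together: lambda_n(t) = alpha * lambda_{n-1}(t) + (1 - alpha) * lambda_hpp.
    """
    rng = rng or random.Random()
    kept = [t for t in events if rng.random() < alpha]
    noise = sample_hpp((1.0 - alpha) * lambda_hpp, t_max, rng)
    return sorted(kept + noise)


# Usage: run a short noising chain with an illustrative (assumed) schedule.
rng = random.Random(0)
chain = [[0.5, 1.2, 3.3, 7.9]]  # t^(0), a toy sequence on [0, 10]
for alpha_n in [0.9, 0.7, 0.5]:
    chain.append(add_thin_step(chain[-1], alpha_n, lambda_hpp=2.0, t_max=10.0, rng=rng))
```

Because thinning and superposition compose, after $n$ steps each original event survives with probability $\prod_{k\le n}\alpha_k$, while the accumulated noise is itself an HPP; as the product approaches zero, the sequence approaches pure HPP noise, which is the starting point for the reverse (denoising) process.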
Abstract
Autoregressive neural networks within the temporal point process (TPP) framework have become the standard for modeling continuous-time event data. Even though these models can expressively capture event sequences in a one-step-ahead fashion, they are inherently limited for long-term forecasting applications due to the accumulation of errors caused by their sequential nature. To overcome these limitations, we derive ADD-THIN, a principled probabilistic denoising diffusion model for TPPs that operates on entire event sequences. Unlike existing diffusion approaches, ADD-THIN naturally handles data with discrete and continuous components. In experiments on synthetic and real-world datasets, our model matches the state-of-the-art TPP models in density estimation and strongly outperforms them in forecasting.
