Interacting Diffusion Processes for Event Sequence Forecasting

Mai Zeng, Florence Regol, Mark Coates

TL;DR

CDiff is a diffusion-based framework for event sequence forecasting that jointly models inter-arrival times and event types with two interacting diffusion processes. It applies a Box-Cox transform to the inter-arrival times and learns a cross-diffusion reverse process, so the model samples an entire future sequence at once, conditioned on the history, which mitigates the error accumulation common in autoregressive TPP methods. Experiments on six real-world datasets and a synthetic Hawkes dataset show better long-horizon forecasts, a closer fit to complex inter-arrival time distributions, and competitive sampling efficiency. Ablations support the need to model times and types jointly rather than separately.
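
To make the Box-Cox step concrete, here is a minimal sketch of the textbook one-parameter Box-Cox transform and its inverse applied to a few inter-arrival times. The sample values, the choice `lam = 0.25`, and the function names are illustrative assumptions; the summary above does not say how the paper selects the parameter or whether it uses a shifted variant.

```python
import math


def box_cox(x, lam):
    """Textbook Box-Cox transform of a strictly positive value x."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive inputs")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def inv_box_cox(y, lam):
    """Inverse of box_cox, mapping a transformed value back to a time."""
    if lam == 0:
        return math.exp(y)
    base = lam * y + 1.0
    if base <= 0:
        raise ValueError("value is outside the range of the transform")
    return base ** (1.0 / lam)


# Hypothetical inter-arrival times (arbitrary units) and parameter.
inter_arrival_times = [0.05, 0.4, 1.3, 12.0, 90.0]
lam = 0.25  # assumed for illustration, not taken from the paper

transformed = [box_cox(x, lam) for x in inter_arrival_times]
recovered = [inv_box_cox(y, lam) for y in transformed]
```

The transform compresses the long right tail of the inter-arrival times, which is the usual motivation for applying it before fitting a model with Gaussian noise. Samples drawn in the transformed space are mapped back to positive times with the inverse.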

Abstract

Neural Temporal Point Processes (TPPs) have emerged as the primary framework for predicting sequences of events that occur at irregular time intervals, but their sequential nature can hamper performance for long-horizon forecasts. To address this, we introduce a novel approach that incorporates a diffusion generative model. The model facilitates sequence-to-sequence prediction, allowing multi-step predictions based on historical event sequences. In contrast to previous approaches, our model directly learns the joint probability distribution of types and inter-arrival times for multiple events. This allows us to fully leverage the high dimensional modeling capability of modern generative models. Our model is composed of two diffusion processes, one for the time intervals and one for the event types. These processes interact through their respective denoising functions, which can take as input intermediate representations from both processes, allowing the model to learn complex interactions. We demonstrate that our proposal outperforms state-of-the-art baselines for long-horizon forecasting of TPP.
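
As a rough illustration of what "two diffusion processes" can look like, the sketch below noises a short sequence of (already transformed) inter-arrival times with a standard Gaussian forward step and noises the matching event types with a uniform categorical forward step. The linear beta schedule, the function names, and the toy data are assumptions made for this example; the abstract does not specify these details. The sketch covers only the forward (noising) direction. According to the abstract, the interaction between the two processes happens in the learned denoising functions, which are not shown here.

```python
import math
import random


def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) under an assumed linear schedule."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noise_times(x0, alpha_bar, rng):
    """Gaussian forward step: sqrt(a) * x0 + sqrt(1 - a) * eps, eps ~ N(0, 1)."""
    scale, sigma = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [scale * x + sigma * rng.gauss(0.0, 1.0) for x in x0]


def noise_types(e0, alpha_bar, num_types, rng):
    """Categorical forward step: keep each type with probability alpha_bar,
    otherwise resample it uniformly from the num_types categories."""
    return [e if rng.random() < alpha_bar else rng.randrange(num_types)
            for e in e0]


# Toy future sequence: transformed inter-arrival times and event types.
rng = random.Random(0)
times0 = [0.3, -1.1, 0.8, 2.0]
types0 = [2, 0, 2, 1]
alpha_bars = linear_alpha_bars(num_steps=200)

# Noised versions at an intermediate diffusion step.
step = 100
noisy_times = noise_times(times0, alpha_bars[step], rng)
noisy_types = noise_types(types0, alpha_bars[step], num_types=3, rng=rng)
```

Under these assumptions, the categorical step leaves an event type unchanged with probability `alpha_bar + (1 - alpha_bar) / num_types`, so both sequences drift toward pure noise as the step index grows.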

Paper Structure (41 sections, 19 equations, 11 figures, 13 tables)

Figures (11)

  • Figure 1: Visualization of the cross-diffusion generating process for 15 example Stackoverflow sequences. The colors indicate the different categories. We start by generating noisy sequences ($t=T$). Once we reach the end of the denoising process ($t=0$), we recover sequences similar to ground truth sequences.
  • Figure 2: Architectural overview of our model CDiff. We employ two interacting denoising diffusion processes, one categorical and one real-valued, to model the high-dimensional event sequences. The neural networks modeling the reverse diffusion steps interact, allowing them to learn dependencies between event types and interarrival times. Generating an entire sequence at once avoids the error propagation that can plague autoregressive models.
  • Figure 3: Left) Stacked column chart of ranks of the algorithms across the 5 datasets for all the metrics. We collect the rank for each metric ($9$ metrics in total, as we include additional metrics from the interval forecasting experiment described in Appendix \ref{sec:interval_forecasting}). The x-axis is the rank, and the y-axis is the proportion adding up to 1. Middle) Stacked column chart of ranks only for time-related metrics ($\textbf{RMSE}_{x^+}$, $\textbf{MAPE}$, $\textbf{sMAPE}$, $\textbf{RMSE}_{|\mathbf{s}^+|}$, $\textbf{MAE}_{|\mathbf{s}^+|}$). Right) Stacked column chart of ranks only for the type-related metric ($\textbf{RMSE}_{e}$).
  • Figure 4: Histogram of true and predicted inter-arrival times for the Taobao dataset. Note that the bin widths gradually increase to make visual comparison easier.
  • Figure 5: Histogram of true and predicted inter-arrival times for cases when the next event is type $e{=}7$ (top) and $e{=}16$ (bottom) for the Taobao dataset. Bin widths gradually increase so that counts are more comparable.
  • ...and 6 more figures