Decomposable Transformer Point Processes

Aristeidis Panos

TL;DR

We propose a framework that retains the advantages of the attention-based architecture while circumventing the limitation of the thinning algorithm. It attains state-of-the-art performance in predicting the next event of a sequence given its history.

Abstract

The standard paradigm for modeling marked point processes is to parameterize the intensity function with an attention-based (Transformer-style) architecture. Despite the flexibility of these methods, their inference relies on the computationally intensive thinning algorithm. In this work, we propose a framework that retains the advantages of the attention-based architecture while circumventing the limitation of the thinning algorithm. The framework models the conditional distribution of inter-event times with a mixture of log-normals satisfying a Markov property, and the conditional probability mass function of the marks with a Transformer-based architecture. The proposed method attains state-of-the-art performance in predicting the next event of a sequence given its history. The experiments also show that methods which do not rely on the thinning algorithm during inference are more effective than those that do. Finally, we test our method on the challenging long-horizon prediction task and find that it outperforms a baseline developed specifically for this task; importantly, inference requires just a fraction of the time needed by the thinning-based baseline.
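To illustrate why a log-normal mixture for inter-event times sidesteps thinning, the following is a minimal sketch in NumPy, not the authors' implementation. It assumes fixed mixture parameters (`weights`, `mus`, `sigmas` are hypothetical names and values); in the framework described above these parameters would be conditioned on the history through the Markov property, in a way this page does not specify. The point is only that the density, the mean, and exact samples are all available in closed form, so no rejection step is needed.

```python
import numpy as np


def lognormal_mixture_logpdf(tau, weights, mus, sigmas):
    """Log-density of a log-normal mixture at inter-event time(s) tau > 0."""
    tau = np.asarray(tau, dtype=float)[..., None]  # trailing axis for components
    weights, mus, sigmas = (np.asarray(a, dtype=float) for a in (weights, mus, sigmas))
    log_tau = np.log(tau)
    # Per-component log-normal log-density: Gaussian in log(tau), minus log(tau)
    # from the change of variables.
    comp = (-0.5 * ((log_tau - mus) / sigmas) ** 2
            - np.log(sigmas) - 0.5 * np.log(2.0 * np.pi) - log_tau)
    a = np.log(weights) + comp
    # Numerically stable log-sum-exp over the component axis.
    m = a.max(axis=-1, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=-1, keepdims=True)))[..., 0]


def lognormal_mixture_mean(weights, mus, sigmas):
    """Closed-form expected inter-event time: sum_k w_k * exp(mu_k + sigma_k^2 / 2)."""
    weights, mus, sigmas = (np.asarray(a, dtype=float) for a in (weights, mus, sigmas))
    return float(np.sum(weights * np.exp(mus + 0.5 * sigmas ** 2)))


def sample_lognormal_mixture(rng, weights, mus, sigmas, size):
    """Exact sampling: pick a component, then exponentiate a Gaussian draw."""
    weights, mus, sigmas = (np.asarray(a, dtype=float) for a in (weights, mus, sigmas))
    comp = rng.choice(len(weights), size=size, p=weights)
    return np.exp(mus[comp] + sigmas[comp] * rng.standard_normal(size))


# Hypothetical parameters, chosen only for illustration.
weights = [0.3, 0.7]
mus = [0.0, 1.0]
sigmas = [0.5, 0.25]
rng = np.random.default_rng(0)
samples = sample_lognormal_mixture(rng, weights, mus, sigmas, size=200_000)
```

A point prediction for the next event time can then be read off `lognormal_mixture_mean` directly, whereas an intensity-based model would typically need to simulate candidate times and accept or reject them.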

Paper Structure (23 sections, 11 equations, 4 figures, 6 tables, 2 algorithms)

Introduction
Background
Decomposable Transformer Point Processes
Distribution of Marks
Distribution of Inter-Event Times
Training and Prediction
Related Work
Experiments
Goodness-of-Fit / Next-Event Prediction
Goodness-of-Fit.
Next-Event Prediction.
Synthetic datasets.
Long-Horizon Prediction
Discussion
Limitations and future work.
...and 8 more sections

Figures (4)

Figure 1: Goodness-of-fit evaluation over the five real-world datasets. We compare our DTPP model against five strong baselines. Results (larger is better) are accompanied by 95% bootstrap confidence intervals.
Figure 2: Performance comparison between DTPP and A-NHP over the SAHP-Synthetic dataset.
Figure 3: Performance comparison over the three real-world datasets measured by RMSE$^\star$ and average OTD (lower is better). The reported results for HYPRO are based on 16 weighted samples, i.e. $M=16$ for Algorithm 2 in Xue et al. (2022).
Figure 4: Goodness-of-fit and next-time prediction comparison over the two 1-d synthetic examples generated from a Hawkes process. The reported results are based on the test dataset. The black dotted line represents the true log-likelihood of the data (in nats).
