EasyTPP: Towards Open Benchmarking Temporal Point Processes

Siqiao Xue; Xiaoming Shi; Zhixuan Chu; Yan Wang; Hongyan Hao; Fan Zhou; Caigao Jiang; Chen Pan; James Y. Zhang; Qingsong Wen; Jun Zhou; Hongyuan Mei

TL;DR

EasyTPP introduces a centralized, open benchmarking platform for temporal point processes, standardizing datasets, evaluation protocols, and model implementations to accelerate reproducible research. It provides a modular library compatible with PyTorch and TensorFlow, enabling rapid assembly, training, sampling, and evaluation of neural TPPs via a unified pipeline. The benchmark evaluates nine models across multiple real and synthetic datasets, highlighting that performance varies by task and dataset, with attention-based methods often excelling in real-world settings and log-likelihood-based methods like IFTPP performing strongly in likelihood evaluation. The authors discuss future directions, including foundation models for event sequences, integration of external data sources, and intervention-based learning to capture causal dynamics, with the platform poised to catalyze progress in both theory and applications of event sequence modeling.

Abstract

Continuous-time event sequences play a vital role in real-world domains such as healthcare, finance, online shopping, and social networks. To model such data, temporal point processes (TPPs) have emerged as the most natural and competitive models, making a significant impact in both academic and application communities. Despite the emergence of many powerful models in recent years, there has been no central benchmark for these models and for future research. This lack of standardization makes it hard for researchers and practitioners to compare methods and reproduce results, potentially slowing progress in the field. In this paper, we present EasyTPP, the first central repository of research assets (e.g., data, models, evaluation programs, documentation) in the area of event sequence modeling. EasyTPP makes several unique contributions to this area: a unified interface for using existing datasets and adding new ones; a wide range of evaluation programs that are easy to use and extend and that facilitate reproducible research; and implementations of popular neural TPPs, together with a rich library of modules that can be composed to quickly build complex models. All the data and implementations can be found at https://github.com/ant-research/EasyTemporalPointProcess. We will actively maintain this benchmark and welcome contributions from other researchers and practitioners. Our benchmark will help promote reproducible research in this field, thus accelerating research progress and enabling more significant real-world impact.

Paper Structure (40 sections, 3 equations, 8 figures, 7 tables)

Introduction
Background
Definition.
Neural TPPs.
Learning TPPs.
The Benchmarking Pipeline
Data Preprocessing.
Model Implementation.
Training.
Sampling.
Hyperparameter Tuning.
Software Interface
Experimental Evaluation
Experimental Setup
Results and Analysis
...and 25 more sections

Figures (8)

Figure 1: ArXiv submissions over time on TPPs. See the appendix for details.
Figure 2: Drawing an event stream from a neural TPP. The model reads the sequence of past events (polygons) to arrive at a hidden state (blue). That state determines the future "intensities" of the two types of events--that is, their time-varying instantaneous probabilities. The intensity functions are continuous parametric curves (solid lines) determined by the most recent model state. Events will update the future intensity curves as they occur.
Figure 3: An open benchmarking pipeline using EasyTPP.
Figure 4: Performance of all the methods on the goodness-of-fit task on synthetic Hawkes, Retweet, and Taxi data. A higher score is better. All methods are implemented in PyTorch.
Figure 5: Long horizon prediction on Retweet data: left (avg prediction horizon $5$ events) vs. right (avg prediction horizon $10$ events).
...and 3 more figures
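
The intensity-driven event stream described in the Figure 2 caption, and the log-likelihood used for goodness-of-fit, can be illustrated with a small sketch. This is not EasyTPP's API and not a neural TPP: as a stand-in it assumes a classical univariate Hawkes process with an exponential kernel, and the function names and the parameter values (mu, alpha, beta) are hypothetical choices made only for illustration. Sampling uses Ogata's thinning algorithm.

```python
import math
import random


def intensity(t, history, mu=0.5, alpha=0.8, beta=1.0):
    """Hawkes intensity at time t, driven by events strictly before t.

    Assumed form (illustrative only): mu + sum_i alpha * exp(-beta * (t - t_i)).
    """
    return mu + sum(alpha * math.exp(-beta * (t - s)) for s in history if s < t)


def sample_thinning(t_end, mu=0.5, alpha=0.8, beta=1.0, seed=0):
    """Draw event times on [0, t_end) with Ogata's thinning algorithm."""
    rng = random.Random(seed)
    events, t = [], 0.0
    while True:
        # Between events the intensity only decays, so its value right
        # after time t bounds the intensity until the next event.
        upper = mu + sum(alpha * math.exp(-beta * (t - s)) for s in events)
        t += rng.expovariate(upper)  # candidate time from the bounding process
        if t >= t_end:
            return events
        # Accept the candidate with probability intensity(t) / upper.
        if rng.random() * upper <= intensity(t, events, mu, alpha, beta):
            events.append(t)


def log_likelihood(events, t_end, mu=0.5, alpha=0.8, beta=1.0):
    """Log-likelihood on [0, t_end]: sum of log-intensities minus the compensator."""
    log_term = sum(math.log(intensity(t, events, mu, alpha, beta)) for t in events)
    # Closed-form integral of the intensity over [0, t_end].
    compensator = mu * t_end + sum(
        alpha / beta * (1.0 - math.exp(-beta * (t_end - s))) for s in events
    )
    return log_term - compensator


if __name__ == "__main__":
    seq = sample_thinning(20.0)
    print(len(seq), "events, log-likelihood:", log_likelihood(seq, 20.0))
```

Each accepted event raises the intensity by alpha, which then decays at rate beta; this mirrors, in a much simpler parametric form, how events update the future intensity curves in the figure. A neural TPP would replace the fixed kernel with a learned function of the hidden state.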
