RoTHP: Rotary Position Embedding-based Transformer Hawkes Process
Anningzhe Gao, Shan Dai
TL;DR
RoTHP tackles timestamp noise sensitivity and sequence-prediction challenges in Transformer Hawkes Processes by introducing Rotary Temporal Positional Encoding. The model enforces translation invariance through relative time embeddings, improving generalization to timestamp translations and varying sequence lengths. Empirical results on synthetic and diverse real-world datasets show RoTHP outperforming RMTPP, NHP, SAHP, and THP in log-likelihood, accuracy, and RMSE, with added robustness to timestamp perturbations and future-prediction tasks. This work enhances neural Hawkes processes by providing a stable, scalable encoding that better handles noisy temporal data and long sequences.
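The translation-invariance claim can be illustrated with a short derivation. This is a sketch under stated assumptions, not the paper's own proof. It assumes that RoTHP follows standard rotary position embedding with the continuous event timestamp t_i in place of the integer token position. It also assumes a block-diagonal rotation with per-pair frequencies \omega_m, and that time enters the attention score only through this rotation of the query q_i and key k_j.

```latex
% Assumption: standard RoPE construction with timestamps t_i replacing integer positions.
\begin{aligned}
R(\theta) &= \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix},
\qquad
\mathbf{R}(t) = \operatorname{diag}\big(R(\omega_1 t), \dots, R(\omega_{d/2} t)\big), \\[4pt]
s_{ij} &= \big(\mathbf{R}(t_i)\, q_i\big)^{\top} \big(\mathbf{R}(t_j)\, k_j\big)
        = q_i^{\top}\, \mathbf{R}(t_i)^{\top} \mathbf{R}(t_j)\, k_j
        = q_i^{\top}\, \mathbf{R}(t_j - t_i)\, k_j, \\[4pt]
t_i &\mapsto t_i + c \;\; \forall i
\;\Longrightarrow\; (t_j + c) - (t_i + c) = t_j - t_i
\;\Longrightarrow\; s_{ij} \text{ is unchanged.}
\end{aligned}
```

The middle step uses R(a)^T R(b) = R(b - a) for 2x2 rotations, applied block by block. Because s_ij depends on timestamps only through the gap t_j - t_i, the score for future events depends only on time gaps, even when their absolute times lie beyond the training window. This is one way to read the sequence prediction flexibility claimed in the abstract.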
Abstract
Temporal Point Processes (TPPs), especially the Hawkes process, are commonly used to model asynchronous event sequence data such as financial transactions and user behaviors in social networks. Owing to the strong fitting ability of neural networks, various neural TPPs have been proposed. Among them, self-attention-based Neural Hawkes Processes such as the Transformer Hawkes Process (THP) achieve distinct performance improvements. Although THP has been increasingly studied, it still struggles with the sequence prediction issue, i.e., training on historical sequences and inferring future events, which is a prevalent paradigm in realistic sequence analysis tasks. Moreover, conventional THP and its variants simply adopt the original sinusoidal embedding of Transformers, which our empirical study shows to be sensitive to temporal shifts and noise in sequence data. To address these problems, we propose a new Rotary Position Embedding-based THP (RoTHP) architecture. Notably, we show theoretically that the relative time embeddings of RoTHP, when coupled with the Hawkes process, yield translation invariance and sequence prediction flexibility. Furthermore, we demonstrate empirically that RoTHP generalizes better in sequence data scenarios with timestamp translations and in sequence prediction tasks.
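The abstract contrasts THP's additive sinusoidal embedding with RoTHP's rotary relative time embedding. The NumPy sketch below illustrates that contrast on a single attention head without softmax or masking. It rests on several assumptions:

- Timestamps are used directly as rotation angles.
- The frequency schedule base^(-2m/d) is borrowed from standard RoPE.
- The event embeddings, timestamps and projection matrices are random placeholders.

All helper names (`rotation_frequencies`, `rotary_time_encode`, `sinusoid_time_encode`, `additive_scores`, `rotary_scores`) are hypothetical and are not the authors' code.

```python
import numpy as np

def rotation_frequencies(d, base=10000.0):
    # Assumption: one frequency per 2-D feature pair, geometric schedule as in standard RoPE.
    return base ** (-np.arange(0, d, 2) / d)

def rotary_time_encode(x, t, base=10000.0):
    """Rotate each feature pair (x[:, 2m], x[:, 2m+1]) of row i by angle t[i] * freq[m]."""
    _, d = x.shape
    assert d % 2 == 0, "feature dimension must be even"
    angles = t[:, None] * rotation_frequencies(d, base)[None, :]  # shape (n, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out

def sinusoid_time_encode(t, d, base=10000.0):
    """Absolute sinusoidal timestamp encoding, meant to be added to event embeddings (THP-style)."""
    angles = t[:, None] * rotation_frequencies(d, base)[None, :]
    z = np.empty((len(t), d))
    z[:, 0::2] = np.sin(angles)
    z[:, 1::2] = np.cos(angles)
    return z

def additive_scores(x, t, w_q, w_k):
    # Time is added before the projections, so scores depend on absolute timestamps.
    h = x + sinusoid_time_encode(t, x.shape[1])
    return (h @ w_q) @ (h @ w_k).T

def rotary_scores(x, t, w_q, w_k):
    # Time only rotates the projected queries and keys, so scores depend on t_j - t_i.
    q = rotary_time_encode(x @ w_q, t)
    k = rotary_time_encode(x @ w_k, t)
    return q @ k.T

rng = np.random.default_rng(0)
n, d = 5, 8
x = rng.normal(size=(n, d))                    # hypothetical event-type embeddings
t = np.sort(rng.uniform(0.0, 10.0, size=n))    # hypothetical event timestamps
w_q, w_k = rng.normal(size=(d, d)), rng.normal(size=(d, d))
shift = 3.7                                    # translate every timestamp by the same amount

rotary_invariant = np.allclose(rotary_scores(x, t, w_q, w_k),
                               rotary_scores(x, t + shift, w_q, w_k))
additive_invariant = np.allclose(additive_scores(x, t, w_q, w_k),
                                 additive_scores(x, t + shift, w_q, w_k))
print("rotary scores unchanged by shift:", rotary_invariant)
print("additive scores unchanged by shift:", additive_invariant)
```

Running it prints True for the rotary scores and False for the additive ones. A global timestamp translation leaves rotary attention scores intact but changes the additive-encoding scores. The rotation also preserves vector norms, so the encoding only redistributes phase and does not rescale the projected features.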
