RoTHP: Rotary Position Embedding-based Transformer Hawkes Process

Anningzhe Gao; Shan Dai

TL;DR

RoTHP tackles timestamp noise sensitivity and sequence-prediction challenges in Transformer Hawkes Processes by introducing Rotary Temporal Positional Encoding. The model enforces translation invariance through relative time embeddings, improving generalization to timestamp translations and varying sequence lengths. Empirical results on synthetic and diverse real-world datasets show RoTHP outperforming RMTPP, NHP, SAHP, and THP in log-likelihood, accuracy, and RMSE, with added robustness to timestamp perturbations and future-prediction tasks. This work enhances neural Hawkes processes by providing a stable, scalable encoding that better handles noisy temporal data and long sequences.

Abstract

Temporal Point Processes (TPPs), especially Hawkes processes, are commonly used to model asynchronous event sequence data such as financial transactions and user behavior in social networks. Owing to the strong fitting ability of neural networks, various neural Temporal Point Processes have been proposed, among which self-attention-based Neural Hawkes Processes such as the Transformer Hawkes Process (THP) achieve a distinct performance improvement. Although THP has been studied increasingly, it still suffers from the sequence prediction issue, i.e., training on historical sequences and inferring the future, which is a prevalent paradigm in realistic sequence analysis tasks. Moreover, conventional THP and its variants simply adopt the original sinusoid embedding from transformers, which our empirical study shows to be sensitive to temporal shifts or noise in sequence data. To address these problems, we propose a new Rotary Position Embedding-based THP (RoTHP) architecture. Notably, we show theoretically that, when coupled with the Hawkes process, the relative time embeddings of RoTHP induce a translation invariance property and sequence prediction flexibility. Furthermore, we demonstrate empirically that RoTHP generalizes better to sequence data with timestamp translations and to sequence prediction tasks.
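The translation invariance claimed in the abstract can be illustrated with a minimal sketch of a rotary encoding driven by timestamps rather than integer positions. This is not the authors' implementation: the function names `rotary_encode` and `attention_scores`, the frequency schedule `base ** (-2k/d)` (borrowed from the standard rotary position embedding), and the use of raw NumPy arrays for queries and keys are all assumptions made for illustration.

```python
import numpy as np

def rotary_encode(x, t, base=10000.0):
    """Rotate consecutive feature pairs of x by angles t * theta_k.

    x: (n, d) float array with d even; t: (n,) array of event timestamps.
    The schedule theta_k = base^(-2k/d) is an assumption taken from the
    standard rotary embedding, not from the paper.
    """
    n, d = x.shape
    theta = base ** (-np.arange(0, d, 2) / d)   # (d/2,) rotation frequencies
    ang = t[:, None] * theta[None, :]           # (n, d/2) rotation angles
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[:, 0::2], x[:, 1::2]             # the two halves of each pair
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin          # 2-D rotation, first coordinate
    out[:, 1::2] = x1 * sin + x2 * cos          # 2-D rotation, second coordinate
    return out

def attention_scores(q, k, t):
    """Unnormalized attention logits after rotating q and k by their timestamps."""
    return rotary_encode(q, t) @ rotary_encode(k, t).T

rng = np.random.default_rng(0)
q = rng.standard_normal((5, 8))
k = rng.standard_normal((5, 8))
t = np.sort(rng.uniform(0.0, 10.0, size=5))

# Shifting every timestamp by the same constant leaves the logits unchanged,
# because each logit depends only on the difference t_j - t_i.
same = np.allclose(attention_scores(q, k, t), attention_scores(q, k, t + 123.4))
print("invariant to translation:", same)
```

Because each pair of features is rotated by an angle proportional to its timestamp, the inner product between a rotated query at time $t_i$ and a rotated key at time $t_j$ depends only on $t_j - t_i$, which is why a constant shift of all timestamps has no effect on the scores in this sketch.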

Paper Structure (31 sections, 2 theorems, 32 equations, 2 figures, 4 tables)

This paper contains 31 sections, 2 theorems, 32 equations, 2 figures, 4 tables.

Introduction
Position Embedding in Self-Attention Hawkes Process and Discussions
Existing position embeddings in self-attention Hawkes process
Sinusoid embedding in THP
Time-shifted positional encoding in Self-Attentive Hawkes Process
Discussions
Timestamp noise sensitivity
Sequence prediction issue
Proposed Model
Rotary Position Embedding-based Transformer Hawkes Process
Model architecture
Training
Translation Invariance Property
Sequence Prediction Flexibility
Experiment
...and 16 more sections

Key Result

Proposition 1

Let $\mathcal{H}=\left\{t_i \in \mathbb{R}^{+} \mid i \in \mathbb{N}^{+}, t_i<t_{i+1}\right\}$ be a Hawkes process with conditional intensity $\lambda^*(t)$ as defined in Eq. (4). If we observe all the arrival times over the time period $[t_1, t_n]$, denoted $\{t_1, \ldots, t_n\}$, then the log-likelihood is a function of the timestamp differences $t_i - t_j$, with $i, j \in \{1, 2, \ldots, n\}$ and $i \neq j$. Namely, …
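As a hedged numerical illustration of Proposition 1, the sketch below evaluates the log-likelihood of a univariate Hawkes process on $[t_1, t_n]$ and checks that it is unchanged when every timestamp is shifted by a constant. The exponential kernel, the parameter names `mu`, `alpha`, `beta`, the inclusion of the $\log \lambda^*(t_1)$ term, and the function name `hawkes_loglik` are assumptions for illustration; the intensity in Eq. (4) of the paper may take a different form.

```python
import numpy as np

def hawkes_loglik(times, mu=0.5, alpha=0.8, beta=1.2):
    """Log-likelihood of a univariate Hawkes process observed on [t_1, t_n].

    Assumed intensity (exponential kernel, not necessarily the paper's Eq. (4)):
        lambda(t) = mu + alpha * sum_{t_j < t} exp(-beta * (t - t_j))
    """
    t = np.asarray(times, dtype=float)
    diffs = t[:, None] - t[None, :]            # diffs[i, j] = t_i - t_j
    past = diffs > 0                           # True where event j precedes event i
    # Excitation contributed by each earlier event; zero elsewhere.
    excite = np.where(past, alpha * np.exp(-beta * np.where(past, diffs, 0.0)), 0.0)
    log_term = np.log(mu + excite.sum(axis=1)).sum()
    # Integral of the intensity over [t_1, t_n], in closed form.
    span = t[-1] - t[0]
    compensator = mu * span + (alpha / beta) * (1.0 - np.exp(-beta * (t[-1] - t))).sum()
    return log_term - compensator

events = np.array([0.3, 1.1, 1.4, 2.9, 4.0])
original = hawkes_loglik(events)
shifted = hawkes_loglik(events + 50.0)         # translate every timestamp
print(np.isclose(original, shifted))
```

Every quantity in the function is built from `diffs` or from `t[-1] - t`, so only timestamp differences enter the computation, which matches the statement of the proposition under these assumptions.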

Figures (2)

Figure 1: Comparison of log-likelihood between THP and RoTHP during training. Green lines are RoTHP, red lines are THP. The left panel shows the financial dataset, the middle panel the synthetic dataset, and the right panel the StackOverflow dataset. In all three panels RoTHP outperforms THP.
Figure 2: Green lines are RoTHP, red lines are THP. The experiment uses the financial transaction dataset.

Theorems & Definitions (4)

Proposition 1
proof
Proposition 2
proof
