Learning Structure-enhanced Temporal Point Processes with Gromov-Wasserstein Regularization

Qingmei Wang, Fanmeng Wang, Bing Su, Hongteng Xu

TL;DR

The paper addresses the interpretability gap in temporal point processes by enforcing clustering structure in sequence embeddings through a Gromov-Wasserstein (GW) regularizer added to the maximum likelihood objective. A nonparametric reference kernel serves as the clustering guide, and a scalable GW discrepancy between the embedding kernel $K(\theta)$ and the reference kernel $\widetilde{K}$ keeps training efficient through kernel sampling and mini-batch SGD. The method is model-agnostic: it improves clustering quality (e.g., higher NMI/RI) without sacrificing predictive accuracy, and it keeps model complexity lower than mixture approaches. Empirical results on synthetic and real-world data show more interpretable, clustered embeddings and competitive performance on predictive tasks, making the approach suitable for large-scale, structure-aware sequential modeling.

Abstract

Real-world event sequences are often generated by different temporal point processes (TPPs) and thus have clustering structures. Nonetheless, in the modeling and prediction of event sequences, most existing TPPs ignore these inherent clustering structures, which leaves the models with unsatisfactory interpretability. In this study, we learn structure-enhanced TPPs with the help of Gromov-Wasserstein (GW) regularization, which imposes clustering structures on the sequence-level embeddings of the TPPs in the maximum likelihood estimation framework. In the training phase, the proposed method leverages a nonparametric TPP kernel to regularize the similarity matrix derived from the sequence embeddings. In large-scale applications, we sample the kernel matrix and implement the regularization as a GW discrepancy term, which achieves a trade-off between regularity and computational efficiency. The TPPs learned through this method yield clustered sequence embeddings and demonstrate competitive predictive and clustering performance, significantly improving model interpretability without compromising prediction accuracy.
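
To make the shape of the regularized objective concrete, here is a minimal NumPy sketch of a likelihood term plus a weighted GW-style penalty between an embedding kernel and a reference kernel. Several pieces are assumptions for illustration and are not taken from the paper: the RBF form of the embedding kernel, the fixed identity coupling `T`, the weight name `gamma`, and the placeholder likelihood value. The GW discrepancy itself minimizes over couplings, so evaluating the objective at one fixed coupling, as done here, only gives an upper bound on it.

```python
import numpy as np

def rbf_kernel(emb, bandwidth=1.0):
    """Similarity matrix from sequence embeddings (the RBF form is an assumption)."""
    sq_dists = np.sum((emb[:, None, :] - emb[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def gw_objective(K_emb, K_ref, T):
    """Squared-loss GW objective for a FIXED coupling T:
    sum_{i,j,k,l} (K_emb[i,k] - K_ref[j,l])^2 * T[i,j] * T[k,l].
    Expanded into three terms so no 4-D tensor is ever built."""
    p = T.sum(axis=1)  # marginal over the embedding-kernel side
    q = T.sum(axis=0)  # marginal over the reference-kernel side
    term_emb = p @ (K_emb ** 2) @ p
    term_ref = q @ (K_ref ** 2) @ q
    cross = np.sum((K_emb @ T @ K_ref.T) * T)
    return term_emb + term_ref - 2.0 * cross

def regularized_loss(nll, K_emb, K_ref, T, gamma=1.0):
    """Negative log-likelihood plus weighted penalty (gamma is a hypothetical name)."""
    return nll + gamma * gw_objective(K_emb, K_ref, T)

# Toy usage: six sequence embeddings drawn around two cluster centres.
rng = np.random.default_rng(0)
emb = np.vstack([rng.normal(0.0, 0.1, (3, 2)), rng.normal(3.0, 0.1, (3, 2))])
K_emb = rbf_kernel(emb)
K_ref = np.kron(np.eye(2), np.ones((3, 3)))  # idealised two-cluster reference kernel
T = np.eye(6) / 6.0                          # identity coupling (assumption)
nll = 12.3                                   # placeholder, not a real likelihood value
print(regularized_loss(nll, K_emb, K_ref, T, gamma=0.5))
```

With the identity coupling and equal-sized kernels, the penalty reduces to the squared Frobenius distance between the two matrices divided by $n^2$, which is a convenient sanity check. A full implementation would instead optimize the coupling, for example with an optimal-transport solver.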

Paper Structure

This paper contains 16 sections, 12 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: The scheme of the proposed method.
  • Figure 2: An illustration of the clustering improvement produced by our regularizer. The backbone model is THP (Zuo et al., 2020), and the event sequences come from the synthetic dataset ($K=2$). In (a, b), we sample 500 event sequences per cluster and visualize their embeddings with t-SNE. We also visualize three kernel matrices: (c) the kernel from the nonparametric method of Iwayama et al. (2017), (d) the embedding-based kernel from the original THP model, and (e) the embedding-based kernel from the THP learned with our regularizer.