
Pretext Training Algorithms for Event Sequence Data

Yimu Wang, He Zhao, Ruizhi Deng, Frederick Tung, Greg Mori

TL;DR

This work addresses learning from unlabeled event sequence data by introducing a self-supervised framework with three complementary pretext tasks: masked reconstruction with density-preserving masking, contrastive learning using multiple views of event sequences, and alignment verification to enforce correct time–type coupling. The methods are architecture-agnostic and apply to downstream tasks such as next-event prediction in temporal point processes, sequence-level classification, and missing-event interpolation. Empirical results on Stack Overflow, MIMIC-II, Mooc, and Reddit show that pretext training improves NLL, RMSE, accuracy, and AUC across tasks, and ablations confirm that the three tasks provide complementary benefits. The study also evaluates zero-shot LLM predictions of event time and type, finding LLMs competitive for timing but weaker for event-type assignment; this underscores the value of specialized pretraining for event sequences and suggests avenues for few-shot and synthetic-data scaling.
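To make the contrastive pretext task more concrete, here is a minimal sketch of a standard InfoNCE objective computed over two views of a batch of event sequences. It assumes each sequence has already been encoded into a fixed-length embedding and that `view_a[i]` and `view_b[i]` are two views of the same sequence. The helper names (`cosine`, `info_nce`), the cosine similarity, and the temperature value are illustrative assumptions; how the paper generates views and defines its exact contrastive loss may differ.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (small epsilon avoids division by zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def info_nce(view_a: List[List[float]], view_b: List[List[float]],
             temperature: float = 0.1) -> float:
    """Mean InfoNCE loss, assuming view_a[i] and view_b[i] embed two views of sequence i.

    Each anchor in view_a treats its partner in view_b as the positive and every
    other sequence in view_b (from the same batch) as a negative.
    """
    if len(view_a) != len(view_b) or len(view_a) < 2:
        raise ValueError("need two equally sized batches with at least two sequences")
    total = 0.0
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        # Negative log-probability of picking the true partner among all candidates.
        total += log_denom - logits[i]
    return total / len(view_a)
```

For example, with `a = [[1.0, 0.0], [0.0, 1.0]]`, pairing `a` with a view whose rows point in the same directions yields a much lower loss than pairing it with a view whose rows are swapped, which is the signal the contrastive task relies on.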

Abstract

Pretext training followed by task-specific fine-tuning has been a successful approach in vision and language domains. This paper proposes a self-supervised pretext training framework tailored to event sequence data. We introduce a novel alignment verification task that is specialized to event sequences, building on good practices in masked reconstruction and contrastive learning. Our pretext tasks unlock foundational representations that generalize across different downstream tasks, including next-event prediction for temporal point process models, event sequence classification, and missing event interpolation. Experiments on popular public benchmarks demonstrate the potential of the proposed method across different tasks and data domains.

Paper Structure

This paper contains 54 sections, 4 equations, 5 figures, 8 tables, and 2 algorithms.

Figures (5)

  • Figure 1: Overview of the pretext training and fine-tuning framework for event sequences. In the first stage, event sequence embeddings are fed to self-supervised pretext tasks (masked reconstruction, contrastive learning, and alignment verification) to learn general representations. The second stage performs task-specific fine-tuning to support downstream tasks such as next-event prediction, sequence classification, and missing event interpolation.
  • Figure 2: Visualization of the three approaches we use to create misaligned event sequences: shuffle, swap, and crossover. For illustrative purposes, we show the approaches with event sequences of length 4. The red dashed lines in the crossover panel indicate the cut-off point of crossover.
  • Figure 3: Accuracy (higher is better) and RMSE (lower is better) of missing-event imputation on the Mooc dataset. Baseline refers to Ours without pretext training.
  • Figure 4: NLL of baseline (Ours w/o pretext training) and Ours on Stack Overflow versus the number of layers.
  • Figure 5: NLL of baseline (Ours w/o pretext training) and Ours on Stack Overflow versus the number of feature dimensions.
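As a hedged illustration of the misalignment strategies in Figure 2, the sketch below builds positive and negative examples for alignment verification from event sequences stored as parallel `(times, types)` lists. The representation and all function names (`shuffle_types`, `swap_types`, `crossover_types`, `alignment_examples`) are hypothetical. In particular, the crossover variant (keep one sequence's timestamps and splice in another sequence's event types after a random cut point) is our assumption about what the figure depicts; the paper's exact constructions may differ.

```python
import random
from typing import List, Tuple

# Hypothetical representation: (event times, event types), both of the same length.
Seq = Tuple[List[float], List[int]]


def shuffle_types(seq: Seq, rng: random.Random) -> Seq:
    """Permute event types while keeping timestamps fixed."""
    times, types = seq
    new_types = types[:]
    if len(set(types)) > 1:
        # Redraw until the permutation actually changes the time-type pairing.
        while new_types == types:
            rng.shuffle(new_types)
    return times[:], new_types


def swap_types(seq: Seq, rng: random.Random) -> Seq:
    """Exchange the types of two events that carry different types."""
    times, types = seq
    new_types = types[:]
    pairs = [(i, j) for i in range(len(types)) for j in range(i + 1, len(types))
             if types[i] != types[j]]
    if pairs:
        i, j = rng.choice(pairs)
        new_types[i], new_types[j] = new_types[j], new_types[i]
    return times[:], new_types


def crossover_types(seq_a: Seq, seq_b: Seq, rng: random.Random) -> Seq:
    """Assumed variant: keep seq_a's times, take seq_b's types after a random cut point."""
    times_a, types_a = seq_a
    _, types_b = seq_b
    n = min(len(types_a), len(types_b))
    if n < 2:
        return times_a[:n], types_a[:n]  # too short to cut; returned unchanged
    cut = rng.randint(1, n - 1)  # cut point strictly inside the sequence
    return times_a[:n], types_a[:cut] + types_b[cut:n]


def alignment_examples(seqs: List[Seq], rng: random.Random) -> List[Tuple[Seq, int]]:
    """Label originals 1 (aligned); label corruptions 0 only if they changed the types."""
    examples = [(s, 1) for s in seqs]
    for k, s in enumerate(seqs):
        partner = seqs[(k + 1) % len(seqs)]
        candidates = (shuffle_types(s, rng), swap_types(s, rng),
                      crossover_types(s, partner, rng))
        for cand in candidates:
            if cand[1] != s[1][:len(cand[1])]:
                examples.append((cand, 0))
    return examples
```

A binary classifier trained on these labels must attend to how times and types are paired, since every negative keeps realistic timestamps and a realistic multiset (or mixture) of types. Filtering out corruptions that leave the types unchanged avoids mislabeling an aligned sequence as misaligned.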