EBES: Easy Benchmarking for Event Sequences

Dmitry Osin, Igor Udovichenko, Viktor Moskvoretskii, Egor Shvetsov, Evgeny Burnaev

TL;DR

EBES establishes a standardized, open benchmark for Event Sequence (EvS) classification, addressing the lack of cross-study comparability in EvS research. It provides a rigorous evaluation protocol, a plug-in PyTorch library with 9 models, and a curated set of 10 open-access datasets spanning discrete EvS, continuous EvS, and time-series domains, including a novel synthetic Pendulum dataset and a large banking dataset (MBD). The empirical study reveals that GRU-based models dominate EvS classification and that EvS properties differ from those of traditional time-series tasks, with dataset-specific dynamics governing the usefulness of time information and event order. The benchmark highlights the importance of principled hyperparameter optimization, robust evaluation, and dataset diversity for advancing reproducible EvS research and real-world impact.

Abstract

Event Sequences (EvS) refer to sequential data characterized by irregular sampling intervals and a mix of categorical and numerical features. Accurate classification of these sequences is crucial for various real-life applications, including healthcare, finance, and user interaction. Despite the popularity of the EvS classification task, there is currently no standardized benchmark or rigorous evaluation protocol. This lack of standardization makes it difficult to compare results across studies, which can result in unreliable conclusions and hinder progress in the field. To address this gap, we present EBES, a comprehensive benchmark for EvS classification with sequence-level targets. EBES features standardized evaluation scenarios and protocols, along with an open-source PyTorch library that implements 9 modern models. Additionally, it includes the largest collection of EvS datasets, featuring 10 curated datasets, including a novel synthetic dataset and real-world data with the largest publicly available banking dataset. The library offers user-friendly interfaces for integrating new methods and datasets. Our benchmarking results highlight the unique properties of EvS compared to other sequential data types, provide a performance ranking of modern models, with GRU-based models achieving the best results, and reveal the challenges associated with robust EvS learning. The goal of EBES is to facilitate reproducible research, expedite progress in the field, and increase the real-world impact of EvS classification techniques.

Paper Structure (49 sections, 6 equations, 14 figures, 10 tables)

Figures (14)

  • Figure 1: Categorization of sequential data. Green and blue dots indicate numerical features, while different shapes denote categorical features. For TS and Continuous EvS, an underlying process is present. For Discrete EvS, no underlying process exists, making interpolation between neighboring points meaningless.
  • Figure 2: Data splits and their usage in our evaluation procedure. For each seed, the training sample is randomly divided into train, train-val, and hpo-val sets, while the test set is separated only once during data preprocessing.
  • Figure 3: Data Scaling Results. We take models with the best hyperparameters and retrain them on subsets of varying sizes. The number of sequences is presented on a log scale. Standard deviation across 3 runs is indicated by vertical lines.
  • Figure 4: Pendulum motion at various instances, with time steps determined by a Hawkes process.
  • Figure 5: Performance metric relationships and correlations of different subsets among all methods on the Age dataset.
  • ...and 9 more figures
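
The splitting scheme described in the Figure 2 caption can be illustrated with a minimal sketch. This is not the EBES library's code: the function names, the split fractions, and the use of plain integer IDs in place of sequences are all assumptions made for illustration. The only parts taken from the caption are that the test set is separated once and that the remaining training sample is re-divided into train, train-val, and hpo-val sets for each seed.

```python
import random


def split_once(ids, test_frac=0.2, seed=0):
    """Separate the test set a single time (hypothetical helper; fraction is assumed)."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    # Returns (training sample, test set); the test set is never re-drawn.
    return shuffled[n_test:], shuffled[:n_test]


def split_per_seed(training_ids, seed, train_val_frac=0.15, hpo_val_frac=0.15):
    """Randomly divide the training sample into train, train-val and hpo-val sets.

    Called once per seed, so each seed sees a different partition of the
    same training sample. The fractions are assumed, not taken from the paper.
    """
    rng = random.Random(seed)
    shuffled = list(training_ids)
    rng.shuffle(shuffled)
    n_tv = int(len(shuffled) * train_val_frac)
    n_hpo = int(len(shuffled) * hpo_val_frac)
    return {
        "train_val": shuffled[:n_tv],
        "hpo_val": shuffled[n_tv:n_tv + n_hpo],
        "train": shuffled[n_tv + n_hpo:],
    }


# Integer IDs stand in for sequences in this sketch.
training_sample, test_set = split_once(range(1000))
splits_by_seed = {seed: split_per_seed(training_sample, seed) for seed in (0, 1, 2)}
```

In this sketch the test set stays fixed across all seeds, while the three training-side subsets change with the seed, which is what allows variation across seeds to be measured against the same held-out data.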