EBES: Easy Benchmarking for Event Sequences
Dmitry Osin, Igor Udovichenko, Viktor Moskvoretskii, Egor Shvetsov, Evgeny Burnaev
TL;DR
EBES establishes a standardized, open benchmark for Event Sequence (EvS) classification, addressing the lack of cross-study comparability in EvS research. It provides a rigorous evaluation protocol, a plug-in PyTorch library with 9 models, and a curated set of 10 open-access datasets spanning discrete EvS, continuous EvS, and time-series domains, including a novel synthetic Pendulum dataset and a large banking dataset (MBD). The empirical study shows that GRU-based models dominate EvS classification and that EvS properties differ from those of traditional time-series tasks, with dataset-specific dynamics determining how useful time information and event order are. The benchmark underscores the importance of principled hyperparameter optimization, robust evaluation, and dataset diversity for advancing reproducible EvS research and its real-world impact.
Abstract
Event Sequences (EvS) are sequential data characterized by irregular sampling intervals and a mix of categorical and numerical features. Accurate classification of these sequences is crucial for many real-life applications, including healthcare, finance, and user interaction. Despite the popularity of the EvS classification task, there is currently no standardized benchmark or rigorous evaluation protocol. This lack of standardization makes it difficult to compare results across studies, which can lead to unreliable conclusions and hinder progress in the field. To address this gap, we present EBES, a comprehensive benchmark for EvS classification with sequence-level targets. EBES features standardized evaluation scenarios and protocols, along with an open-source PyTorch library that implements 9 modern models. It also includes the largest collection of EvS datasets: 10 curated datasets, among them a novel synthetic dataset and real-world data, including the largest publicly available banking dataset. The library offers user-friendly interfaces for integrating new methods and datasets. Our benchmarking results highlight the unique properties of EvS compared to other sequential data types, provide a performance ranking of modern models, with GRU-based models achieving the best results, and reveal the challenges of robust EvS learning. The goal of EBES is to facilitate reproducible research, accelerate progress in the field, and increase the real-world impact of EvS classification techniques.
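To make the definition above concrete, the following is a minimal Python sketch of how one event sequence with a sequence-level target might be represented: irregular timestamps, per-event categorical and numerical features, and a single label for the whole sequence. The class `EventSequence`, its fields (`times`, `cat_features`, `num_features`, `target`), and the example feature names `mcc` and `amount` are hypothetical illustrations and are not the EBES library's API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class EventSequence:
    """Hypothetical container: one event sequence with a sequence-level label."""
    times: List[float]                    # irregular event timestamps, non-decreasing
    cat_features: Dict[str, List[int]]    # categorical features, one code per event
    num_features: Dict[str, List[float]]  # numerical features, one value per event
    target: int                           # one class label for the whole sequence

    def __post_init__(self) -> None:
        n = len(self.times)
        # Events must be ordered in time.
        if any(t1 > t2 for t1, t2 in zip(self.times, self.times[1:])):
            raise ValueError("timestamps must be non-decreasing")
        # Every feature must provide exactly one value per event.
        for name, values in {**self.cat_features, **self.num_features}.items():
            if len(values) != n:
                raise ValueError(f"feature {name!r} has {len(values)} values, expected {n}")

    def time_deltas(self) -> List[float]:
        """Gaps between consecutive events; the first event gets a gap of 0."""
        return [0.0] + [t2 - t1 for t1, t2 in zip(self.times, self.times[1:])]


# Usage: a banking-style sequence with unevenly spaced events (illustrative values).
seq = EventSequence(
    times=[0.0, 0.5, 3.25, 3.5],
    cat_features={"mcc": [5411, 5812, 5411, 4111]},
    num_features={"amount": [12.0, 40.5, 7.25, 2.0]},
    target=1,
)
print(seq.time_deltas())  # [0.0, 0.5, 2.75, 0.25]
```

The uneven gaps returned by `time_deltas` are what distinguish such data from regularly sampled time series, where every gap would be identical.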
