SeqNAS: Neural Architecture Search for Event Sequence Classification
Igor Udovichenko, Egor Shvetsov, Denis Divitsky, Dmitry Osin, Ilya Trofimov, Anatoly Glushenko, Ivan Sukharev, Dmitry Berestenev, Evgeny Burnaev
TL;DR
SeqNAS addresses the challenge of neural architecture search for event sequence (EvS) classification by designing a specialized search space that combines Stem, Encoder (MHA/GRU/Convolution), Temporal Encoding, Decoder, and Head blocks. It uses sequential Bayesian optimization with a CatBoost-based predictor model and an ensemble of teachers for knowledge distillation. It achieves state-of-the-art results on six EvS datasets and releases NAS-Bench Event Sequences to enable predictor-based NAS research. The work demonstrates that diverse, complementary operation types are collectively beneficial for EvS modeling, and it highlights the potential and trade-offs of NAS in industrial sequence tasks. The framework offers practical improvements for churn prediction, fraud detection, fault diagnosis, and related domains where irregularly timed, mixed-feature sequences are common.
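The predictor-guided search summarized above can be illustrated with a minimal sketch. It is not the SeqNAS implementation: the search space keys, the `toy_evaluate` scoring function, and the `MeanPredictor` surrogate are all hypothetical stand-ins. In particular, `MeanPredictor` replaces the CatBoost-based predictor model with a simple per-choice average, and `toy_evaluate` replaces actual model training.

```python
import random

# Hypothetical, heavily simplified search space (names are illustrative only).
SEARCH_SPACE = {
    "encoder_op": ["mha", "gru", "conv"],
    "num_layers": [1, 2, 3],
    "temporal_encoding": [True, False],
}


def sample_architecture(rng):
    """Draw one architecture by picking a value for every choice."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}


def toy_evaluate(arch):
    """Stand-in for training a model and measuring its validation quality."""
    score = {"mha": 0.70, "gru": 0.65, "conv": 0.60}[arch["encoder_op"]]
    score += 0.02 * arch["num_layers"]
    score += 0.03 if arch["temporal_encoding"] else 0.0
    return score


class MeanPredictor:
    """Toy surrogate: averages the observed scores of each (choice, value) pair."""

    def fit(self, archs, scores):
        self.table = {}
        for arch, score in zip(archs, scores):
            for item in arch.items():
                self.table.setdefault(item, []).append(score)
        self.default = sum(scores) / len(scores)

    def predict(self, arch):
        values = []
        for item in arch.items():
            seen = self.table.get(item)
            values.append(sum(seen) / len(seen) if seen else self.default)
        return sum(values) / len(values)


def search(n_init=4, n_rounds=3, n_candidates=20, top_k=2, seed=0):
    """Sequential loop: fit predictor, rank candidates, evaluate the top ones."""
    rng = random.Random(seed)
    archs = [sample_architecture(rng) for _ in range(n_init)]
    scores = [toy_evaluate(a) for a in archs]
    predictor = MeanPredictor()
    for _ in range(n_rounds):
        predictor.fit(archs, scores)
        candidates = [sample_architecture(rng) for _ in range(n_candidates)]
        candidates.sort(key=predictor.predict, reverse=True)
        for arch in candidates[:top_k]:
            archs.append(arch)
            scores.append(toy_evaluate(arch))
    best = max(range(len(archs)), key=scores.__getitem__)
    return archs[best], scores[best], len(archs)
```

Calling `search()` returns the best architecture found, its score, and the number of architectures evaluated. The point of the design is that only the candidates the predictor ranks highest are evaluated in each round, so the expensive training step runs far less often than the cheap prediction step.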
Abstract
Neural Architecture Search (NAS) methods are widely used in various industries to obtain high-quality, task-specific solutions with minimal human intervention. Event sequences are common in industrial applications, including churn prediction, customer segmentation, fraud detection, and fault diagnosis, among others. Such data consist of categorical and real-valued components with irregular timestamps. Despite the usefulness of NAS methods, previous approaches have only been applied to other domains: images, texts, or time series. Our work addresses this limitation by introducing a novel NAS algorithm, SeqNAS, specifically designed for event sequence classification. We develop a simple yet expressive search space that leverages commonly used building blocks for event sequence classification, including multi-head self-attention, convolutions, and recurrent cells. To perform the search, we adopt sequential Bayesian Optimization and utilize previously trained models as an ensemble of teachers to augment knowledge distillation. As a result of our work, we demonstrate that our method surpasses state-of-the-art NAS methods and popular architectures suitable for sequence classification and holds great potential for various industrial applications.
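To make the idea of using previously trained models as an ensemble of teachers concrete, the sketch below shows one common way to combine them: average the teachers' temperature-softened class probabilities and mix the resulting soft-target loss with the usual cross-entropy on the true label. The abstract does not specify the exact loss used by SeqNAS, so this formulation, the function names, and the `alpha` and `temperature` parameters are assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_soft_targets(teacher_logits, temperature=2.0):
    """Average the softened class probabilities over all teachers."""
    probs = [softmax(logits, temperature) for logits in teacher_logits]
    n_teachers = len(probs)
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / n_teachers for c in range(n_classes)]


def distillation_loss(student_logits, teacher_logits, label,
                      alpha=0.5, temperature=2.0):
    """Mix hard-label cross-entropy with cross-entropy to the teacher ensemble.

    alpha and temperature are assumed hyperparameters, not values from the paper.
    """
    hard = -math.log(softmax(student_logits)[label])
    targets = ensemble_soft_targets(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    soft = -sum(t * math.log(s) for t, s in zip(targets, student_soft))
    return (1 - alpha) * hard + alpha * soft


# Example: two teachers, two classes, true label 0.
teachers = [[3.0, 0.0], [2.0, 0.5]]
loss_agree = distillation_loss([2.0, 0.0], teachers, label=0)
loss_disagree = distillation_loss([0.0, 2.0], teachers, label=0)
```

In this example the student that agrees with the teachers and the label (`loss_agree`) receives a lower loss than the one that contradicts them (`loss_disagree`). Setting `alpha=0` reduces the loss to plain cross-entropy, which is a quick way to check how much the teachers contribute.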
