SeqNAS: Neural Architecture Search for Event Sequence Classification

Igor Udovichenko, Egor Shvetsov, Denis Divitsky, Dmitry Osin, Ilya Trofimov, Anatoly Glushenko, Ivan Sukharev, Dmitry Berestenev, Evgeny Burnaev

TL;DR

SeqNAS addresses neural architecture search for event sequence (EvS) classification by designing a specialized search space that combines Stem, Encoder (MHA/GRU/Convolution), Temporal Encoding, Decoder, and Head blocks. It uses sequential Bayesian optimization with a CatBoost-based predictor model and an ensemble of teachers for knowledge distillation, achieving state-of-the-art results on six EvS datasets. The authors also release NAS-Bench Event Sequences to enable predictor-based NAS research. The work demonstrates that diverse, complementary operation types are collectively beneficial for EvS modeling, and it highlights the potential and trade-offs of NAS in industrial sequence tasks. The framework offers practical improvements for churn prediction, fraud detection, fault diagnosis, and related domains where irregularly timed, mixed-feature sequences are common.
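The predictor-guided search mentioned above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy search space, the made-up scoring function that stands in for real training, and the `MeanEffectPredictor` class, a dependency-free stand-in for the CatBoost-based predictor. The loop is also greedy; it does not use the uncertainty estimates that a full Bayesian optimization procedure would.

```python
import itertools
import random

# Hypothetical toy search space: one categorical choice per block.
SEARCH_SPACE = {
    "encoder_ops": ["mha", "gru", "conv", "mha+gru", "mha+conv",
                    "gru+conv", "mha+gru+conv"],
    "temporal_encoding": [True, False],
    "head_pooling": ["max", "avg"],
}


def all_architectures():
    """Enumerate every architecture in the toy search space."""
    keys = list(SEARCH_SPACE)
    return [dict(zip(keys, values))
            for values in itertools.product(*SEARCH_SPACE.values())]


def toy_train_and_evaluate(arch):
    """Stand-in for real training: returns a made-up validation score."""
    score = 0.5 + 0.1 * arch["encoder_ops"].count("+")
    score += 0.05 if arch["temporal_encoding"] else 0.0
    score += 0.02 if arch["head_pooling"] == "max" else 0.0
    return score


class MeanEffectPredictor:
    """Tiny surrogate (assumption): averages the mean scores seen per choice."""

    def fit(self, archs, scores):
        self.global_mean = sum(scores) / len(scores)
        sums = {}
        for arch, score in zip(archs, scores):
            for item in arch.items():
                total, count = sums.get(item, (0.0, 0))
                sums[item] = (total + score, count + 1)
        self.effects = {item: total / count
                        for item, (total, count) in sums.items()}

    def predict(self, arch):
        # Unseen choices fall back to the global mean score.
        values = [self.effects.get(item, self.global_mean)
                  for item in arch.items()]
        return sum(values) / len(values)


def sequential_search(n_initial=4, n_steps=3, batch_size=2, seed=0):
    """Alternate between fitting the predictor and evaluating its top picks."""
    rng = random.Random(seed)
    candidates = all_architectures()
    rng.shuffle(candidates)
    evaluated, pool = candidates[:n_initial], candidates[n_initial:]
    scores = [toy_train_and_evaluate(a) for a in evaluated]
    predictor = MeanEffectPredictor()
    for _ in range(n_steps):
        predictor.fit(evaluated, scores)
        pool.sort(key=predictor.predict, reverse=True)
        batch, pool = pool[:batch_size], pool[batch_size:]
        evaluated.extend(batch)
        scores.extend(toy_train_and_evaluate(a) for a in batch)
    best_score, best_arch = max(zip(scores, evaluated), key=lambda p: p[0])
    return best_arch, best_score
```

Calling `sequential_search()` returns the best architecture dictionary found and its toy score. In a real setting the surrogate would be replaced by a trained regression model and the scoring function by actual training runs.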

Abstract

Neural Architecture Search (NAS) methods are widely used in various industries to obtain high-quality, task-specific solutions with minimal human intervention. Event sequences find widespread use in various industrial applications, including churn prediction, customer segmentation, fraud detection, and fault diagnosis, among others. Such data consist of categorical and real-valued components with irregular timestamps. Despite the usefulness of NAS methods, previous approaches have only been applied to other domains: images, texts, or time series. Our work addresses this limitation by introducing a novel NAS algorithm, SeqNAS, specifically designed for event sequence classification. We develop a simple yet expressive search space that leverages commonly used building blocks for event sequence classification, including multi-head self-attention, convolutions, and recurrent cells. To perform the search, we adopt sequential Bayesian Optimization and utilize previously trained models as an ensemble of teachers to augment knowledge distillation. As a result of our work, we demonstrate that our method surpasses state-of-the-art NAS methods and popular architectures suitable for sequence classification and holds great potential for various industrial applications.
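To make the idea of distilling from an ensemble of teachers concrete, here is a minimal sketch of one common formulation: the teachers' class probabilities are averaged, and the student loss mixes cross-entropy on the true label with a KL-divergence term toward the averaged targets. The abstract does not specify the loss, so the function names, the mixing weight `alpha`, and the `temperature` parameter are all assumptions, not the authors' exact method.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_soft_targets(teacher_logits, temperature=1.0):
    """Average the class probabilities predicted by several teachers."""
    probs = [softmax(logits, temperature) for logits in teacher_logits]
    return [sum(column) / len(probs) for column in zip(*probs)]


def distillation_loss(student_logits, teacher_logits, label,
                      alpha=0.5, temperature=2.0):
    """Mix hard-label cross-entropy with KL(teacher ensemble || student)."""
    # Hard term: cross-entropy of the student against the true label.
    hard = -math.log(softmax(student_logits)[label])
    # Soft term: KL divergence from the averaged teacher distribution.
    targets = ensemble_soft_targets(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s)
               for t, s in zip(targets, student_soft) if t > 0)
    return (1 - alpha) * hard + alpha * soft
```

For example, `distillation_loss([0.2, 1.5], [[0.0, 2.0], [0.5, 1.0]], label=1)` scores a two-class student against two teachers. With `alpha=1.0` and a single teacher whose logits equal the student's, the loss is zero, since the two distributions coincide.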

Paper Structure

This paper contains 33 sections, 1 equation, 11 figures, 6 tables, and 1 algorithm.

Figures (11)

  • Figure 1: An example of the marked temporal event process. Event $i$ occurs at time $t_i$ and is characterized (marked) by the feature vector $x_i$.
  • Figure 2: The general layout of our search space. Dotted borders indicate that blocks contain searchable operations. Dashed lines indicate that connections between nodes are searchable. The solid line is an example of a selected architecture.
  • Figure 3: The searchable part of the Stem block is depicted with dashed and dotted lines. Convolutional layers with different kernels and the presence of dropout are selected at each search step. A solid line is an example of a selected path.
  • Figure 4: There are two searchable pooling layers in the Head block: max pooling and average pooling. The type of pooling layer and the presence or absence of spatial dropout are determined by the search procedure. A solid line is an example of a selected path.
  • Figure 5: Encoder layer with searchable MHA, GRU, and conv operations. A combination of one, two, or three operations can be selected during each search step. Different combinations are selected on different layers. Incoming features are split among the selected operations. An example combination with MHA, GRU, and conv operations is depicted with solid lines, and an example combination with MHA and conv operations is depicted with dashed lines. The dotted border around MHA indicates that it has a searchable number of heads.
  • ...and 6 more figures
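
The encoder-layer choice described in the Figure 5 caption (any combination of one, two, or three operations, with incoming features split among them) can be sketched as follows. The operation labels, the function names, and the equal contiguous split of channels are illustrative assumptions; the caption does not say how the features are partitioned.

```python
from itertools import combinations

# Operation labels taken from the Figure 5 caption.
ENCODER_OPS = ("mha", "gru", "conv")


def encoder_op_choices():
    """All non-empty combinations of the three encoder operations."""
    return [combo
            for size in range(1, len(ENCODER_OPS) + 1)
            for combo in combinations(ENCODER_OPS, size)]


def split_features(hidden_dim, ops):
    """Assign each selected operation a contiguous slice of the channels.

    The near-equal split is an assumption made for this sketch.
    """
    base, extra = divmod(hidden_dim, len(ops))
    slices, start = {}, 0
    for index, op in enumerate(ops):
        # The first `extra` operations receive one additional channel.
        width = base + (1 if index < extra else 0)
        slices[op] = (start, start + width)
        start += width
    return slices
```

There are seven possible combinations per layer, and a different one can be chosen on each layer. For a hidden size of 128 and all three operations selected, `split_features(128, ("mha", "gru", "conv"))` assigns the channel ranges (0, 43), (43, 86), and (86, 128).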