MLEM: Generative and Contrastive Learning as Distinct Modalities for Event Sequences
Viktor Moskvoretskii, Dmitry Osin, Egor Shvetsov, Igor Udovichenko, Maxim Zhelnin, Andrey Dukhovny, Anna Zhimerikina, Evgeny Burnaev
TL;DR
Event sequences (EvS) pose unique challenges for self-supervised learning (SSL) because of irregular timing and mixed feature types. The authors compare generative, contrastive, and naive hybrid SSL methods and introduce MLEM, a multimodal approach that aligns generative and contrastive embeddings via a SigLIP-inspired alignment loss while also optimizing a generative reconstruction objective. Results show that neither pure generative nor pure contrastive pre-training consistently dominates. MLEM generally delivers superior downstream task performance and embedding quality, albeit with sensitivity to data sparsity and higher computational cost. The work demonstrates the value of treating different SSL signals as complementary modalities for EvS, offering a practical route to more robust self-supervised representations in irregular time-series domains.
Abstract
This study explores the application of self-supervised learning techniques to event sequences, a key modality in applications such as banking, e-commerce, and healthcare. However, research on self-supervised learning for event sequences is limited, and methods from other domains such as images, text, and speech may not transfer easily. To determine the most suitable approach, we conduct a detailed comparative analysis of previously identified best-performing methods. Our assessment covers event sequence classification, next-event prediction, and embedding quality. We find that neither the contrastive nor the generative method is superior, which highlights the potential benefit of combining the two. Given the lack of research on hybrid models in this domain, we first adapt a baseline model from another domain. After observing that it underperforms, we develop a novel method, the Multimodal-Learning Event Model (MLEM). MLEM treats contrastive learning and generative modeling as distinct yet complementary modalities and aligns their embeddings. Our results show that combining contrastive and generative approaches into one procedure with MLEM achieves superior performance across multiple metrics.
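To make the idea of aligning the two embeddings concrete, here is a minimal sketch of a SigLIP-style pairwise sigmoid loss, the family of loss the TL;DR says MLEM's alignment objective is inspired by. It is not the authors' implementation. The function name `sigmoid_alignment_loss`, the fixed `temperature` and `bias` values (taken from SigLIP's reported initialization, where they are learnable), and the pairing rule (the i-th generative and i-th contrastive embeddings come from the same sequence) are all assumptions made for illustration.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def sigmoid_alignment_loss(gen_emb, con_emb, temperature=10.0, bias=-10.0):
    """Pairwise sigmoid loss between two sets of embeddings.

    Assumption: gen_emb[i] and con_emb[i] describe the same sequence
    (positive pair, label +1); every other pairing is a negative (label -1).
    temperature and bias are fixed here for simplicity (hypothetical choice).
    """
    gen = [l2_normalize(v) for v in gen_emb]
    con = [l2_normalize(v) for v in con_emb]
    n = len(gen)
    total = 0.0
    for i in range(n):
        for j in range(n):
            sim = sum(a * b for a, b in zip(gen[i], con[j]))  # cosine similarity
            label = 1.0 if i == j else -1.0
            total += log_sigmoid(label * (temperature * sim + bias))
    return -total / n


# Toy check: matching embeddings should score a lower loss than swapped ones.
gen = [[1.0, 0.0], [0.0, 1.0]]
aligned = sigmoid_alignment_loss(gen, [[1.0, 0.0], [0.0, 1.0]])
swapped = sigmoid_alignment_loss(gen, [[0.0, 1.0], [1.0, 0.0]])
```

In this toy case the loss is lower when each generative embedding points the same way as its contrastive counterpart, which is the behavior an alignment term is meant to reward. Unlike a softmax-based contrastive loss, every pair is scored independently, so no normalization over the whole batch is needed.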
