ESQA: Event Sequences Question Answering
Irina Abdullaeva, Andrei Filatov, Mikhail Orlov, Ivan Karpukhin, Viacheslav Vasilev, Denis Dimitrov, Andrey Kuznetsov, Ivan Kireev, Andrey Savchenko
TL;DR
ESQA (Event Sequences Question Answering) is a multimodal architecture that pairs a frozen FLAN-T5 LLM with parameter-efficient fine-tuning to model irregular, time-stamped event sequences. It frames downstream tasks as natural language questions, encodes events with a trainable event-embedding encoder, and passes the resulting sequence representation to the LLM through a Q-Former-based connector. This enables accurate extractive and predictive reasoning over long sequences without extensive fine-tuning. Across five public datasets, ESQA is competitive with or superior to strong baselines, particularly on categorical and temporal next-event prediction, and shows notable zero-shot generalization to unseen tasks. Limitations include discretization-induced errors for numerical features and difficulty with highly unbalanced or regression-heavy zero-shot settings; future work targets improved temporal processing and unbalanced-class handling.
Abstract
Event sequences (ESs) arise in many practical domains, including finance, retail, social networks, and healthcare. In the context of machine learning, event sequences can be seen as a special type of tabular data with annotated timestamps. Despite the importance of ES modeling and analysis, little effort has been made to adapt large language models (LLMs) to the ES domain. In this paper, we highlight the common difficulties of ES processing and propose a novel solution capable of solving multiple downstream tasks with little or no fine-tuning. In particular, we address the problem of working with long sequences and improve the processing of time and numeric features. The resulting method, called ESQA, effectively utilizes the power of LLMs and, according to extensive experiments, achieves state-of-the-art results in the ES domain.
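
As a rough illustration of two ideas from the summary (numeric features are discretized, and downstream tasks are posed as natural language questions), the following minimal sketch may help. It is not the authors' implementation: the equal-width binning scheme, the bin count, the function names, and the question templates are all assumptions made for this example.

```python
from bisect import bisect_right
from typing import List, Sequence


def make_bin_edges(values: Sequence[float], n_bins: int) -> List[float]:
    """Return the n_bins - 1 interior edges of equal-width bins.

    Equal-width binning is an assumption of this sketch; the paper may
    use a different discretization scheme.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]


def discretize(value: float, edges: Sequence[float]) -> int:
    """Map a numeric value to a bin index in the range [0, len(edges)]."""
    return bisect_right(edges, value)


# Hypothetical question templates, invented for illustration only.
TEMPLATES = {
    "next": "What will be the {feature} of the next event?",
    "mean": "What is the average {feature} over the sequence?",
}


def build_question(task: str, feature: str) -> str:
    """Phrase a downstream task as a natural language question."""
    return TEMPLATES[task].format(feature=feature)


# Toy transaction amounts (made-up values, not from any dataset).
amounts = [0.0, 10.0, 20.0, 30.0, 40.0]
edges = make_bin_edges(amounts, n_bins=4)

# 35.0 and 40.0 fall into the same bin, so the distinction between them
# is lost after discretization: this is the kind of discretization-induced
# error noted as a limitation.
same_bin = discretize(35.0, edges) == discretize(40.0, edges)

question = build_question("next", "amount")
```

With coarse bins, distinct amounts collapse into a single token, which is one plausible reason regression-style questions are harder than categorical ones in this setup.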
