ESQA: Event Sequences Question Answering

Irina Abdullaeva, Andrei Filatov, Mikhail Orlov, Ivan Karpukhin, Viacheslav Vasilev, Denis Dimitrov, Andrey Kuznetsov, Ivan Kireev, Andrey Savchenko

TL;DR

ESQA introduces Event Sequences Question Answering, a multimodal architecture that leverages a frozen FLAN-T5 LLM with parameter-efficient fine-tuning to model irregular time-stamped event sequences. It frames downstream tasks as natural language questions, uses a trainable event-embedding encoder, and connects an event sequence representation to the LLM through a Q-Former–based connector, enabling accurate extractive and predictive reasoning over long sequences without extensive fine-tuning. Empirical results across five public datasets show ESQA is competitive with or superior to strong baselines, particularly on categorical and temporal next-event predictions, and demonstrates notable zero-shot generalization to unseen tasks. Limitations include discretization-induced errors for numerical features and challenges with highly unbalanced or regression-heavy zero-shot settings, with future work targeting improved temporal processing and unbalanced-class handling.
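The discretization error mentioned above can be illustrated with a minimal sketch. It assumes a numeric feature is mapped to one of a few quantile-style bins and later decoded back to the mean training value of its bin. The binning scheme and the helper names (`make_bin_edges`, `discretize`, `bin_representatives`) are hypothetical and not taken from the paper. The point is only that every value inside a bin collapses to one number, so wide bins produce large reconstruction errors.

```python
import bisect

def make_bin_edges(values, num_bins):
    """Hypothetical quantile-style edges: num_bins bins need num_bins - 1 interior edges."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // num_bins] for i in range(1, num_bins)]

def discretize(value, edges):
    """Map a numeric value to a bin index in [0, len(edges)]; values equal to an edge go to the upper bin."""
    return bisect.bisect_right(edges, value)

def bin_representatives(values, edges):
    """Mean of the training values in each bin, used to decode a bin index back to a number."""
    sums = [0.0] * (len(edges) + 1)
    counts = [0] * (len(edges) + 1)
    for v in values:
        b = discretize(v, edges)
        sums[b] += v
        counts[b] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]

amounts = [1.0, 2.0, 3.0, 10.0, 12.0, 50.0, 55.0, 100.0]
edges = make_bin_edges(amounts, num_bins=4)   # [3.0, 12.0, 55.0]
reps = bin_representatives(amounts, edges)    # [1.5, 6.5, 31.0, 77.5]
decoded = reps[discretize(50.0, edges)]       # 31.0, an error of 19.0 for the value 50.0
```

With only four bins, the largest decoding error on this toy data is 22.5. Finer bins reduce the error but enlarge the output vocabulary, which is the trade-off behind the limitation noted above.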

Abstract

Event sequences (ESs) arise in many practical domains, including finance, retail, social networks, and healthcare. In the context of machine learning, event sequences can be seen as a special type of tabular data with annotated timestamps. Despite the importance of ES modeling and analysis, little effort has been made to adapt large language models (LLMs) to the ES domain. In this paper, we highlight the common difficulties of ES processing and propose a novel solution capable of solving multiple downstream tasks with little or no fine-tuning. In particular, we address the problem of working with long sequences and improve the processing of time and numeric features. The resulting method, called ESQA, effectively utilizes the power of LLMs and, according to extensive experiments, achieves state-of-the-art results in the ES domain.
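Viewing an event sequence as timestamped tabular rows, a next-event prediction task can be posed as a natural-language question over the history. The sketch below is a minimal illustration under assumptions. The `Event` record, its field names, and the question template are hypothetical, and the paper's actual prompts are not reproduced here.

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    """One row of an event sequence: a timestamp plus categorical and numeric columns."""
    timestamp: float                                   # e.g. days since the first event
    categorical: dict = field(default_factory=dict)    # e.g. {"category": "grocery"}
    numerical: dict = field(default_factory=dict)      # e.g. {"amount": 12.5}

def next_event_example(events, target_key):
    """Split a sequence into (history, question, answer) for next-event prediction.

    The answer pairs the last event's categorical value with the time gap before it.
    Gaps between events are irregular, so the gap is a prediction target in its own right."""
    if len(events) < 2:
        raise ValueError("need at least two events")
    ordered = sorted(events, key=lambda e: e.timestamp)
    history, target = ordered[:-1], ordered[-1]
    gap = target.timestamp - history[-1].timestamp
    # Hypothetical question template, not taken from the paper.
    question = f"What is the {target_key} of the next event, and how many days until it occurs?"
    return history, question, (target.categorical[target_key], gap)

events = [
    Event(4.0, {"category": "travel"}, {"amount": 300.0}),
    Event(0.0, {"category": "grocery"}, {"amount": 12.5}),
    Event(1.5, {"category": "fuel"}, {"amount": 40.0}),
]
history, question, answer = next_event_example(events, "category")
# answer == ("travel", 2.5)
```

Sorting by timestamp before splitting keeps the example correct even when rows arrive out of order, as they often do in raw transaction logs.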

Paper Structure

This paper contains 24 sections, 5 equations, 3 figures, and 9 tables.

Figures (3)

  • Figure 1: Model architecture. The components of the approach that do not require training are colored in blue. Components whose weights are optimized during training are colored in orange. The trainable embeddings and associated tokens are colored in red.
  • Figure 2: a) Event sequence feature encoding. In this example, there are $N$ numerical and $C$ categorical features, which are concatenated into a tensor $e_i^{emb}$ of dimension $\dim(e_i^{emb})$. b) The event sequence encoder processes the concatenated feature embeddings $S_n^{emb}$ of all events in a sequence and produces an embedding $\tilde{S}_n^{emb}$ for the entire event sequence.
  • Figure 3: The Q-Former architecture is designed to extract the most relevant event sequence representations. It produces $q$ query embeddings for each event sequence, which are linearly projected to the embedding size of the language model and appended to the embedded question tokens. The joint sequence is then passed to the LLM.
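To make the data flow in Figures 2 and 3 concrete, here is a minimal NumPy sketch under stated assumptions. Categorical features are looked up in per-feature embedding tables and concatenated with the numeric features (Fig. 2a). The numeric features are passed through unchanged, which is an assumption because the paper may embed them too. A single `tanh` layer stands in for the event sequence encoder of Fig. 2b, which is a hypothetical simplification. The connector is reduced to one cross-attention step from $q$ learned queries followed by a linear projection, which is far simpler than a real Q-Former. All dimensions, weights, and function names (`encode_events`, `query_connector`) are illustrative and do not come from the paper.

```python
import numpy as np

def encode_events(cat_ids, num_feats, cat_tables):
    """Fig. 2a sketch: look up each categorical feature and concatenate with numeric features.

    cat_ids: (L, C) int array, num_feats: (L, N) float array,
    cat_tables: list of C embedding tables, table c of shape (vocab_c, d_c)."""
    cat_embs = [table[cat_ids[:, c]] for c, table in enumerate(cat_tables)]
    return np.concatenate(cat_embs + [num_feats], axis=-1)  # (L, sum(d_c) + N)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def query_connector(seq_states, queries, w_k, w_v, w_proj):
    """Fig. 3 sketch: q learned queries cross-attend to the encoded events (one head, no
    self-attention or feed-forward layers), then a linear map to the LLM embedding width."""
    keys = seq_states @ w_k                                         # (L, d_q)
    values = seq_states @ w_v                                       # (L, d_q)
    attn = softmax(queries @ keys.T / np.sqrt(queries.shape[-1]))   # (q, L), rows sum to 1
    return (attn @ values) @ w_proj                                 # (q, d_llm)

rng = np.random.default_rng(0)
L, d_model, q, d_llm = 5, 16, 4, 32

# Two categorical features (vocabularies of 10 and 4) and N = 3 numeric features.
cat_ids = np.stack([rng.integers(0, 10, L), rng.integers(0, 4, L)], axis=1)
cat_tables = [rng.normal(size=(10, 8)), rng.normal(size=(4, 4))]
num_feats = rng.normal(size=(L, 3))
events = encode_events(cat_ids, num_feats, cat_tables)             # (5, 15)

# Hypothetical stand-in for the sequence encoder of Fig. 2b: one tanh layer per event.
seq_states = np.tanh(events @ rng.normal(size=(events.shape[1], d_model)))

queries = rng.normal(size=(q, d_model))
event_tokens = query_connector(seq_states, queries,
                               rng.normal(size=(d_model, d_model)),
                               rng.normal(size=(d_model, d_model)),
                               rng.normal(size=(d_model, d_llm)))   # (4, 32)

# Random stand-ins for the embedded question tokens; following the caption,
# the projected queries are appended after them.
question_emb = rng.normal(size=(7, d_llm))
joint = np.concatenate([question_emb, event_tokens], axis=0)        # (11, 32), fed to the LLM
```

The fixed number of queries is what keeps the LLM input short: however many events the sequence contains, the connector hands the LLM exactly $q$ vectors.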