How to model Human Actions distribution with Event Sequence Data

Egor Surkov, Dmitry Osin, Evgeny Burnaev, Egor Shvetsov

TL;DR

Forecasting the distribution of human actions over a fixed horizon in event sequences (EvS) often does not require preserving exact temporal order. The paper introduces a KL-based staticity index and four distribution-focused objectives, showing that explicit distribution forecasting generally outperforms autoregressive and multi-token methods, and that mode collapse is linked to distributional imbalance. Across diverse datasets, order-invariant approaches such as GRU-Dist excel when event presence matters more than order, while local sequential structure remains critical in a few domains (e.g., Shakespeare, Zvuk). The work provides a practical framework, including dataset diagnostics and decoding strategies, to guide practitioners in building robust EvS forecasting systems.

Abstract

This paper studies forecasting of the future distribution of events in human action sequences, a task essential in domains like retail, finance, healthcare, and recommendation systems where the precise temporal order is often less critical than the set of outcomes. We challenge the dominant autoregressive paradigm and investigate whether explicitly modeling the future distribution or order-invariant multi-token approaches outperform order-preserving methods. We analyze local order invariance and introduce a KL-based metric to quantify temporal drift. We find that a simple explicit distribution forecasting objective consistently surpasses complex implicit baselines. We further demonstrate that mode collapse of predicted categories is primarily driven by distributional imbalance. This work provides a principled framework for selecting modeling strategies and offers practical guidance for building more accurate and robust forecasting systems.
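The two ideas named in the abstract, an order-invariant target distribution over a fixed horizon and a KL-based measure of temporal drift, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names `horizon_distribution`, `kl_divergence`, and `drift`, the additive smoothing constant, and the choice to compare the history window against the horizon window are hypothetical, and the paper's actual staticity index may be defined differently.

```python
import math
from collections import Counter


def horizon_distribution(events, num_categories, smoothing=1e-6):
    """Empirical category distribution of a window of events.

    The result ignores event order by construction. Additive smoothing
    (an assumption of this sketch) keeps every probability positive so
    that the KL divergence below stays finite.
    """
    counts = Counter(events)
    total = len(events) + smoothing * num_categories
    return [(counts.get(c, 0) + smoothing) / total for c in range(num_categories)]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same categories."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def drift(sequence, split, horizon, num_categories):
    """Hypothetical drift score: KL between the horizon and history distributions.

    A value near zero suggests the future category mix resembles the past;
    a large value suggests the distribution has shifted.
    """
    history = sequence[:split]
    future = sequence[split:split + horizon]
    p = horizon_distribution(future, num_categories)
    q = horizon_distribution(history, num_categories)
    return kl_divergence(p, q)
```

Because the target is a histogram, shuffling events inside the horizon leaves it unchanged, which is the sense in which such an objective is order-invariant. Under these assumptions, a sequence whose category mix is the same before and after the split scores close to zero, while one that switches categories at the split scores high.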

Paper Structure

This paper contains 47 sections, 10 equations, 10 figures, 4 tables.

Figures (10)

  • Figure 1: Distribution of categories in datasets. We present the normalized number of categories.
  • Figure 2: Next $N$ tokens forecasting. Perplexity results.
  • Figure 3: Effect of Local Event Shuffling on Model Performance. We report Matched-F1 score and Cardinality for four datasets. Results for other datasets and metrics can be found in Appendix \ref{appendix:all_results}.
  • Figure 4: Example of how the importance of order differs across types of data. Although the horizon distribution does not change in either case, the event sequence still makes sense after permuting events inside intervals.
  • Figure 5: Shape score drift for MBD dataset
  • ...and 5 more figures