Higher-Order DeepTrails: Unified Approach to *Trails
Tobias Koopmann, Jan Pfister, André Markus, Astrid Carolus, Carolin Wienrich, Andreas Hotho
TL;DR
The paper addresses a limitation of first-order Markov models: they cannot capture higher-order dependencies in sequential user behavior. It introduces a unified framework that trains a small autoregressive language model on observed sequences and uses the frozen model's position-wise cross-entropy loss $L_{\mathrm{CE}}$ to evaluate hypotheses and feature-driven subgroups across three settings: DeepHypTrails, DeepMixedTrails, and DeepSubTrails. On synthetic datasets and in a real-world voice-assistant case study, the approach models higher-order dependencies, supports per-position diagnostics, and handles homogeneous, heterogeneous, and subgroup analyses within a single methodology. The resulting insight into user behavior can inform UX design, personalized recommendations, and adaptive interfaces for complex sequential interactions.
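For orientation, a position-wise cross-entropy of this kind is commonly written as below. This is the standard autoregressive formulation, not necessarily the paper's exact notation; the symbols $x_t$ (state at position $t$), $T$ (sequence length), $p_\theta$ (the frozen model), and $\ell_t$ (per-position loss) are assumptions introduced here.

```latex
\ell_t(x) = -\log p_\theta\left(x_t \mid x_1, \dots, x_{t-1}\right),
\qquad
L_{\mathrm{CE}}(x) = \frac{1}{T} \sum_{t=1}^{T} \ell_t(x)
```

Under this reading, a large $\ell_t$ marks a position whose transition the frozen model considers unlikely, which is what would make per-position diagnostics possible.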
Abstract
Analyzing, understanding, and describing human behavior is advantageous in different settings, such as web browsing or traffic navigation. Understanding human behavior naturally helps to improve and optimize the underlying infrastructure or user interfaces. Typically, human navigation is represented by sequences of transitions between states. Previous work suggests using hypotheses, which represent different intuitions about the navigation, to analyze these transitions. To formalize this setting, first-order Markov chains are used to capture the behavior. This allows different kinds of graph comparisons to be applied, but it comes with the inherent drawback of losing information about higher-order dependencies within the sequences. To address this, we propose analyzing entire sequences using autoregressive language models, as they are traditionally used to model higher-order dependencies in sequences. We show that our approach can be easily adapted to the different settings introduced in previous work, namely HypTrails, MixedTrails, and even SubTrails, while bringing unique advantages: (1) modeling higher-order dependencies between state transitions, (2) identifying shortcomings in proposed hypotheses, and (3) naturally providing a unified approach to model all settings. To show the expressiveness of our approach, we evaluate it on different synthetic datasets and conclude with an exemplary analysis of a real-world dataset, examining the behavior of users who interact with voice assistants.
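The information loss of first-order Markov chains can be illustrated with a small sketch. It is not the authors' implementation: a smoothed order-k count model stands in for the autoregressive language model, and the names `fit_markov`, `positionwise_ce`, the smoothing constant `alpha`, and the toy "AABB" data are all hypothetical choices made for this example. In the toy data the next state is fully determined by the previous two states but looks like a coin flip given only the previous one.

```python
import math
from collections import Counter, defaultdict


def fit_markov(sequences, order, states, alpha=0.1):
    """Fit an order-`order` Markov model with additive smoothing.

    Stand-in for a trained autoregressive model (assumption of this sketch).
    Returns a function prob(context, next_state).
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for t in range(order, len(seq)):
            counts[tuple(seq[t - order:t])][seq[t]] += 1

    def prob(context, next_state):
        ctx_counts = counts.get(tuple(context), Counter())
        total = sum(ctx_counts.values())
        return (ctx_counts[next_state] + alpha) / (total + alpha * len(states))

    return prob


def positionwise_ce(prob, seq, order):
    """Per-position loss -log p(x_t | previous `order` states), for t >= order."""
    return [-math.log(prob(seq[t - order:t], seq[t]))
            for t in range(order, len(seq))]


states = ["A", "B"]
train = [list("AABB" * 25)]   # toy data with a second-order dependency
held_out = list("AABB" * 5)   # same pattern, evaluated with frozen counts

first = fit_markov(train, order=1, states=states)
second = fit_markov(train, order=2, states=states)

ce_first = positionwise_ce(first, held_out, order=1)
ce_second = positionwise_ce(second, held_out, order=2)

mean_ce_first = sum(ce_first) / len(ce_first)
mean_ce_second = sum(ce_second) / len(ce_second)

print(f"first-order mean CE:  {mean_ce_first:.3f}")
print(f"second-order mean CE: {mean_ce_second:.3f}")
```

The first-order model stays close to the loss of a fair coin flip at every position, while the second-order model's loss is near zero. The per-position lists `ce_first` and `ce_second` show where along a sequence each model struggles.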
