Behavioral Sequence Modeling with Ensemble Learning

Maxime Kawawa-Beaudan, Srijan Sood, Soham Palande, Ganapathy Mani, Tucker Balch, Manuela Veloso

TL;DR

This work presents a framework for sequence modeling using Ensembles of Hidden Markov Models, which are lightweight, interpretable, and efficient; the framework is applicable in both supervised and unsupervised learning settings.

Abstract

We investigate the use of sequence analysis for behavior modeling, emphasizing that sequential context often outweighs the value of aggregate features in understanding human behavior. We discuss framing common problems in fields like healthcare, finance, and e-commerce as sequence modeling tasks, and address challenges related to constructing coherent sequences from fragmented data and disentangling complex behavior patterns. We present a framework for sequence modeling using Ensembles of Hidden Markov Models, which are lightweight, interpretable, and efficient. Our ensemble-based scoring method enables robust comparison across sequences of different lengths and enhances performance in scenarios with imbalanced or scarce data. The framework scales in real-world scenarios, is compatible with downstream feature-based modeling, and is applicable in both supervised and unsupervised learning settings. We demonstrate the effectiveness of our method with results on a longitudinal human behavior dataset.
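The abstract and the Figure 3 caption describe the scoring only at a high level: ensemble members are trained on random subsets of the training data, and at inference their likelihoods are compared in pairwise matchups to give a composite score $s$. The Python sketch below illustrates one possible reading and is not the paper's formulation. It assumes two ensembles of discrete-emission HMMs (one per class, already trained, for example by Baum-Welch on subsets drawn by the hypothetical `random_subsets` helper), scores a sequence with the forward algorithm, and defines $s$ as the fraction of (class-A model, class-B model) pairs in which the class-A model assigns the higher log-likelihood. The names `HMM`, `log_likelihood`, `matchup_score`, and `random_subsets`, and the toy models, are illustrative assumptions.

```python
import math
import random
from itertools import product
from typing import List, Sequence, Tuple

# Hypothetical representation of a discrete-emission HMM:
# (start probabilities, transition matrix, emission matrix).
HMM = Tuple[List[float], List[List[float]], List[List[float]]]


def _log(p: float) -> float:
    # Map zero probabilities to -inf instead of raising.
    return math.log(p) if p > 0 else -math.inf


def _logsumexp(values: Sequence[float]) -> float:
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_likelihood(seq: Sequence[int], hmm: HMM) -> float:
    """log P(seq | hmm) via the forward algorithm, computed in log space."""
    start, trans, emit = hmm
    n = len(start)
    alpha = [_log(start[i]) + _log(emit[i][seq[0]]) for i in range(n)]
    for obs in seq[1:]:
        alpha = [
            _logsumexp([alpha[i] + _log(trans[i][j]) for i in range(n)])
            + _log(emit[j][obs])
            for j in range(n)
        ]
    return _logsumexp(alpha)


def random_subsets(n_items: int, n_models: int, subset_size: int,
                   seed: int = 0) -> List[List[int]]:
    """Hypothetical helper: one random subset of training indices per ensemble member."""
    rng = random.Random(seed)
    return [rng.sample(range(n_items), subset_size) for _ in range(n_models)]


def matchup_score(seq: Sequence[int], ensemble_a: List[HMM],
                  ensemble_b: List[HMM]) -> float:
    """Fraction of (a, b) model pairs where model a gives seq the higher
    log-likelihood; ties count as half a win."""
    lls_a = [log_likelihood(seq, m) for m in ensemble_a]
    lls_b = [log_likelihood(seq, m) for m in ensemble_b]
    wins = 0.0
    for la, lb in product(lls_a, lls_b):
        if la > lb:
            wins += 1.0
        elif la == lb:
            wins += 0.5
    return wins / (len(lls_a) * len(lls_b))


# Toy example: "A" models mostly emit symbol 0, "B" models mostly emit symbol 1.
hmm_a = ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.8, 0.2]])
hmm_b = ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.1, 0.9], [0.2, 0.8]])
print(matchup_score([0, 0, 1, 0], [hmm_a], [hmm_b]))  # prints 1.0
```

Because each matchup compares two models on the same sequence, the length-dependent scale of the raw log-likelihoods cancels out and $s$ always lies in $[0, 1]$, which is one way such a score could support comparison across sequences of different lengths.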

Paper Structure

This paper contains 14 sections, 3 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Flow diagram of our sequence construction approach, as detailed in the section on sequence construction. We disentangle the monolithic dataset $\mathcal{D}$ into data streams, then process these further into sets of observation sequences. Our subsequent HMM-e ensemble training approach is detailed in the section on HMM ensembles. While we adopt this approach using HMMs, the framework itself is model-agnostic. The training data is broken into random subsets, and a diverse ensemble of learners is trained on these subsets.
  • Figure 2: UMAP embeddings of features $f_i$ from a 500-model ensemble, as discussed in the section on unsupervised clustering. Colors correspond to clusters discovered via K-Means.
  • Figure 3: Flow diagram of our HMM-e ensemble training and inference approach, as detailed in the section on HMM ensembles. While we adopt this approach using HMMs, the framework itself is model-agnostic. The training data is broken into random subsets, and a diverse ensemble of learners is trained on these subsets. At inference time, the likelihoods given by the models are compared in pairwise matchups, yielding the composite score $s$.
  • Figure 4: Samples of the daily features we model for three anonymized participants from 2018. For visualization, features are lightly smoothed using an exponentially weighted moving average with a half-life of 4 days; for modeling, features are normalized.
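The Figure 4 caption specifies the display smoothing only by its half-life (4 days) and says only that features are "normalized" for modeling. The Python sketch below shows how a half-life $h$ maps to an EWMA smoothing factor $\alpha = 1 - 2^{-1/h}$, so that an observation's weight halves every $h$ days. It assumes the recursive, non-bias-adjusted form (equivalent to pandas `ewm(halflife=4, adjust=False).mean()`) and z-score normalization; neither choice is confirmed by the paper, and `ewma`, `zscore`, and the sample values are illustrative.

```python
import math
from typing import List, Sequence


def ewma(values: Sequence[float], half_life: float = 4.0) -> List[float]:
    """Recursive EWMA in which an observation's weight halves every `half_life` steps."""
    alpha = 1.0 - 0.5 ** (1.0 / half_life)  # smoothing factor derived from the half-life
    out: List[float] = []
    for x in values:
        # Seed with the first value, then blend each new value into the running average.
        out.append(x if not out else alpha * x + (1.0 - alpha) * out[-1])
    return out


def zscore(values: Sequence[float]) -> List[float]:
    """Assumed normalization: subtract the mean, divide by the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return [0.0] * n  # constant feature: nothing to scale
    return [(v - mean) / std for v in values]


feature = [4200.0, 8100.0, 3900.0, 12000.0, 7600.0, 5100.0]  # made-up daily values
smoothed = ewma(feature, half_life=4.0)  # for plotting only
normalized = zscore(feature)             # for modeling
```

With $h = 4$, a step from 0 to 1 reaches exactly half its final value after four days, which is what "half-life" means for the smoothed curve.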