Exploring temporal dynamics in digital trace data: mining user-sequences for communication research

Yangliu Fan, Jakob Ohme, Lion Wedel

TL;DR

This paper addresses the challenge of leveraging hyper-longitudinal digital trace data to study temporal dynamics in communication. It proposes a user-sequences framework that preserves fine-grained timestamps and applies six analytical approaches to a large data donation dataset spanning four platforms. The case study demonstrates how sequence analysis, event history analysis, hidden Markov models, network analysis, process mining, and language-based models reveal patterns such as platform switching, activity motifs, and latent states, while also highlighting methodological challenges such as data volume, alignment, and generalizability. The findings suggest that digital trace data offer unprecedented granularity for theory building and cross-platform research, with language-based embeddings showing particular promise given sufficient high-quality data.

Abstract

Communication is commonly considered a process that is dynamically situated in a temporal context. However, there remains a disconnection between such theoretical dynamicality and the non-dynamical character of communication scholars' preferred methodologies. In this paper, we argue for a new research framework that uses computational approaches to leverage the fine-grained timestamps recorded in digital trace data. In particular, we propose to maintain the hyper-longitudinal information in the trace data and analyze time-evolving 'user-sequences,' which provide rich information about user activity with high temporal resolution. To illustrate our proposed framework, we present a case study that applied six approaches (e.g., sequence analysis, process mining, and language-based models) to real-world user-sequences containing 1,262,775 timestamped traces from 309 unique users, gathered via data donations. Overall, our study suggests a conceptual reorientation towards a better understanding of the temporal dimension in communication processes, resting on the exploding supply of digital trace data and the technical advances in analytical approaches.

Paper Structure

This paper contains 25 sections, 8 figures, 2 tables.

Figures (8)

  • Figure 1: An example of individual-level data representation. Here, we organize the multi-platform digital trace data into a chronologically ordered user-sequence. Each block represents an activity, and the color of the block indicates the platform. Each activity is associated with a timestamp and other information.
  • Figure 2: Sequence index plot of platform and activity types. The plots use colored stacked bars to show how users move between different activities over time. Each horizontal bar represents a user, and the segments' colors indicate the platform and activity type.
  • Figure 3: Survival curves for users remaining on the same platform. For each subplot, the x-axis represents the duration t, and the y-axis represents the probability that a user is still on this platform after t minutes.
  • Figure 4: Transition/emission probabilities among the four learned hidden states. (a) The heatmap shows the probability of transitioning from one hidden state to another, with darker colors indicating a higher probability. (b) The bar plots show the emission probabilities of each hidden state for each activity type. Here, each bar represents the probability of observing a specific activity given the hidden state.
  • Figure 5: Networks of platforms/activities. Here, each node represents a platform or an activity. The links connect two nodes when they appear sequentially in user-sequences. The nodes are colored by platforms and sized by degree centralities.
  • ...and 3 more figures
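
As a rough illustration of the representation described in the Figure 1 and Figure 5 captions, the sketch below groups timestamped traces by user, orders them chronologically into user-sequences, and counts how often one platform/activity node directly follows another. It is a minimal sketch, not the authors' implementation: the record layout, the placeholder platform names (`platform_a`, etc.), the toy data, and the function names `build_user_sequences` and `count_transitions` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical toy traces: (user_id, platform, activity, ISO 8601 timestamp).
# Platform and activity names are placeholders, not taken from the paper's data.
TRACES = [
    ("u1", "platform_b", "watch", "2023-01-01T10:05:00"),
    ("u1", "platform_a", "like", "2023-01-01T10:00:00"),
    ("u1", "platform_a", "comment", "2023-01-01T10:07:30"),
    ("u2", "platform_c", "watch", "2023-01-01T09:00:00"),
    ("u2", "platform_c", "like", "2023-01-01T09:01:00"),
]


def build_user_sequences(traces):
    """Group traces by user and sort each user's traces by timestamp."""
    sequences = defaultdict(list)
    for user_id, platform, activity, ts in traces:
        sequences[user_id].append((datetime.fromisoformat(ts), platform, activity))
    for events in sequences.values():
        events.sort(key=lambda event: event[0])  # chronological order
    return dict(sequences)


def count_transitions(sequences):
    """Count how often one (platform, activity) node directly follows another."""
    counts = Counter()
    for events in sequences.values():
        nodes = [(platform, activity) for _, platform, activity in events]
        for src, dst in zip(nodes, nodes[1:]):
            counts[(src, dst)] += 1
    return counts


sequences = build_user_sequences(TRACES)
transitions = count_transitions(sequences)

for (src, dst), n in sorted(transitions.items()):
    print(f"{src} -> {dst}: {n}")
```

The transition counts can serve as weighted edges of a directed network over platform/activity nodes. Because the full timestamps are kept in each sequence, the same structure could also supply durations for the other approaches listed in the abstract.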