The Role of Network and Identity in the Diffusion of Hashtags
Aparna Ananthasubramaniam, Yufei 'Louise' Zhu, David Jurgens, Daniel Romero
TL;DR
This study investigates how network exposure and demographic identity jointly drive the diffusion of hashtags on Twitter, using an agent-based model that extends a linear-threshold framework with a usage-based update rule. The authors compile a dataset of 1,337 newly coined hashtags and compare simulated and empirical cascades with a ten-metric Cascade Match Index. They show that a Network+Identity model generally outperforms network-only and identity-only baselines, with the largest gains for hashtags that signal regional or racial identity or that discuss sports or news. The work highlights that diffusion mechanisms depend on context, shows that selecting the best-fitting model for each hashtag can further improve predictions, and discusses limitations such as the use of proxies for identity and the focus on a single platform. The data and framework are released to support future research on how network, identity, and other social factors interact in online cultural diffusion. Practically, the findings bear on understanding and predicting how culture spreads online, and on designing interventions that account for multiple coexisting social drivers of diffusion.
Abstract
The diffusion of culture online is theorized to be influenced by many interacting social factors (e.g., network and identity). However, most existing computational cascade models consider just a single factor (e.g., network or identity). This work offers a new framework for teasing apart the mechanisms underlying hashtag cascades. We curate a new dataset of 1,337 hashtags representing cultural innovation online, develop a 10-factor evaluation framework for comparing empirical and simulated cascades, and show that a combined network+identity model better simulates hashtag cascades than network- or identity-only counterfactuals. We also explore heterogeneity in performance: While a combined network+identity model best predicts the popularity of cascades, a network-only model best predicts cascade growth and an identity-only model best predicts adopter composition. The network+identity model has the highest comparative advantage among hashtags used for expressing racial or regional identity and talking about sports or news. In fact, we are able to predict what combination of network and/or identity best models each hashtag and use this to further improve performance. Our results show the utility of models incorporating the interactions of network, identity, and other social factors in the diffusion of hashtags in social media.
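To make the contrast between a network-only and a combined network+identity model concrete, the following is a minimal toy sketch of a threshold cascade. It is not the authors' model: the function `simulate_cascade`, the `identity_weight` parameter, the rule that down-weights adopters with a different identity, and the five-node example graph are all assumptions introduced here for illustration.

```python
from typing import Dict, List, Set, Tuple


def simulate_cascade(
    neighbors: Dict[str, List[str]],
    identity: Dict[str, str],
    seeds: Set[str],
    threshold: float = 0.5,
    identity_weight: float = 0.0,
    max_steps: int = 10,
) -> Tuple[Set[str], List[int]]:
    """Toy synchronous threshold cascade (illustrative assumption only).

    identity_weight = 0.0 gives a network-only model; larger values
    discount exposure from neighbors whose identity label differs.
    Returns the final adopter set and the adopter count after each step.
    """
    adopters = set(seeds)
    history = [len(adopters)]
    for _ in range(max_steps):
        newly_adopted = set()
        for node, nbrs in neighbors.items():
            if node in adopters or not nbrs:
                continue
            influence = 0.0
            for nbr in nbrs:
                if nbr in adopters:
                    same = identity[nbr] == identity[node]
                    # Hypothetical rule: cross-identity exposure counts less.
                    influence += 1.0 if same else 1.0 - identity_weight
            # Adopt when weighted exposure, as a share of all neighbors,
            # reaches the threshold.
            if influence / len(nbrs) >= threshold:
                newly_adopted.add(node)
        if not newly_adopted:
            break
        adopters |= newly_adopted
        history.append(len(adopters))
    return adopters, history


# Hypothetical example: a triangle of identity "X" linked to a chain of "Y".
neighbors = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
identity = {"a": "X", "b": "X", "c": "X", "d": "Y", "e": "Y"}

network_only, network_history = simulate_cascade(neighbors, identity, {"a"})
combined, combined_history = simulate_cascade(
    neighbors, identity, {"a"}, identity_weight=0.8
)
print(sorted(network_only), network_history)  # ['a', 'b', 'c', 'd', 'e'] [1, 2, 3, 4, 5]
print(sorted(combined), combined_history)     # ['a', 'b', 'c'] [1, 2, 3]
```

In this toy setting the network-only run spreads to every node, while the identity-weighted run stays within the seed's identity group. This illustrates why the two kinds of model can produce cascades that differ in both size and adopter composition.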
