
The Role of Network and Identity in the Diffusion of Hashtags

Aparna Ananthasubramaniam, Yufei 'Louise' Zhu, David Jurgens, Daniel Romero

TL;DR

This study investigates how network exposure and demographic identity jointly drive the diffusion of hashtags on Twitter, using an agent-based diffusion model that extends a linear-threshold framework with a usage-based update rule. Using a dataset of 1,337 newly coined hashtags and a ten-metric Cascade Match Index (CMI) to compare simulated and empirical cascades, the authors show that a Network+Identity model generally outperforms network-only and identity-only baselines, with the greatest gains for hashtags that signal regional or racial identity or that concern sports or news. The work highlights context-dependent mechanisms, shows that selecting a model per hashtag can improve diffusion predictions, and discusses limitations such as identity proxying and platform specificity. The authors release the data and framework to support future research on how network, identity, and other social factors interact in online cultural diffusion. The findings have practical implications for understanding and predicting how culture propagates online, and for designing interventions that account for multiple coexisting social drivers of diffusion.

Abstract

The diffusion of culture online is theorized to be influenced by many interacting social factors (e.g., network and identity). However, most existing computational cascade models consider just a single factor (e.g., network or identity). This work offers a new framework for teasing apart the mechanisms underlying hashtag cascades. We curate a new dataset of 1,337 hashtags representing cultural innovation online, develop a 10-factor evaluation framework for comparing empirical and simulated cascades, and show that a combined network+identity model better simulates hashtag cascades than network- or identity-only counterfactuals. We also explore heterogeneity in performance: While a combined network+identity model best predicts the popularity of cascades, a network-only model best predicts cascade growth and an identity-only model best predicts adopter composition. The network+identity model has the highest comparative advantage among hashtags used for expressing racial or regional identity and talking about sports or news. In fact, we are able to predict what combination of network and/or identity best models each hashtag and use this to further improve performance. Our results show the utility of models incorporating the interactions of network, identity, and other social factors in the diffusion of hashtags in social media.
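To make the network-versus-identity distinction concrete, here is a minimal sketch of a generic linear-threshold cascade in Python. It is not the authors' agent-based model and omits their usage-based update rule. The function name `linear_threshold_cascade`, the toy graph, the thresholds, and the `same_group_similarity` weighting are all hypothetical choices made for illustration. With no similarity function every neighbor counts equally (a network-only setting), while passing a similarity function down-weights exposure from neighbors with a different identity.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple


def linear_threshold_cascade(
    neighbors: Dict[str, List[str]],
    thresholds: Dict[str, float],
    seeds: Set[str],
    similarity: Optional[Callable[[str, str], float]] = None,
    max_steps: int = 100,
) -> Tuple[Set[str], List[int]]:
    """Generic linear-threshold cascade (illustrative; not the paper's exact model)."""
    adopted = set(seeds)
    history = [len(adopted)]  # cumulative number of adopters after each step
    for _ in range(max_steps):
        newly_adopted = set()
        for node, nbrs in neighbors.items():
            if node in adopted or not nbrs:
                continue
            # Weight each neighbor: 1.0 when no identity information is used,
            # otherwise the (hypothetical) identity similarity to this node.
            weights = {n: (similarity(node, n) if similarity else 1.0) for n in nbrs}
            total = sum(weights.values())
            if total == 0:
                continue
            # Weighted fraction of neighbors that have already adopted.
            exposure = sum(w for n, w in weights.items() if n in adopted) / total
            if exposure >= thresholds[node]:
                newly_adopted.add(node)
        if not newly_adopted:
            break  # the cascade has stopped spreading
        adopted |= newly_adopted  # synchronous update at the end of the step
        history.append(len(adopted))
    return adopted, history


# Toy undirected graph (hypothetical): a-b, a-c, b-c, c-d.
graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
thresholds = {node: 0.5 for node in graph}

# Hypothetical identity labels: a and b share one group, c and d another.
group = {"a": "x", "b": "x", "c": "y", "d": "y"}


def same_group_similarity(u: str, v: str) -> float:
    """Assumed weighting: full weight within a group, reduced weight across groups."""
    return 1.0 if group[u] == group[v] else 0.25


network_only = linear_threshold_cascade(graph, thresholds, {"a"})
network_identity = linear_threshold_cascade(
    graph, thresholds, {"a"}, similarity=same_group_similarity
)
print(network_only)
print(network_identity)
```

In this toy setting the unweighted cascade reaches all four nodes, whereas the identity-weighted cascade stops at the boundary between the two groups. This is only meant to illustrate why adding identity can change who adopts, not to reproduce any result from the paper.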

Paper Structure (54 sections, 7 equations, 7 figures, 1 table)

This paper contains 54 sections, 7 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: The Network+Identity model outperforms the Network-only and Identity-only baselines. Models are evaluated on the full CMI and on the subsets of indices corresponding to popularity, growth, and adopter characteristics. Higher CMI scores correspond to better performance.
  • Figure 2: The comparative advantage of modeling cascades using both network and identity is highest when a) initial adopters are located very close to each other; b) initial adopters have a high degree of racial similarity; c-f) initial adopters have a moderate degree of linguistic, socioeconomic, and political similarity and eigencentrality; g) the hashtag conveys a similar meaning to a moderate number of other hashtags; and h) the hashtag's meaning is not becoming increasingly popular over time. Effects are estimated with a regression that controls for other variables related to the hashtag's context.
  • Figure 3: Although the Network+Identity model never underperforms the others, the relative advantage of the Network+Identity model varies by the topic of the hashtag. Effects are estimated by running a regression, controlling for other variables related to the hashtag's context.
  • Figure 4: A combined model that selects among the three models does better than the Network+Identity model alone.
  • Figure S1: Performance of each model on the CMI, by the size of the empirical hashtag cascade (i.e., how many times the hashtag was used in the Twitter Decahose sample). Cascade sizes are binned by quintile.
  • ...and 2 more figures