Measuring Dimensions of Self-Presentation in Twitter Bios and their Links to Misinformation Sharing

Navid Madani, Rabiraj Bandyopadhyay, Briony Swire-Thompson, Michael Miller Yoder, Kenneth Joseph

TL;DR

This paper develops and validates three embedding-based methods to project Twitter bios onto meaningful social dimensions (e.g., age, partisanship, religiosity) without retraining per dimension, using a large in-domain bios dataset. The authors compare a Bio-only Word2Vec model, a fine-tuned BERT model, and a fine-tuned SBERT model, finding that the SBERT variant generalizes best to unseen identities and aligns most strongly with human judgments and external ideology measures. They apply the embeddings in a misinformation-sharing context, uncovering an age-by-partisanship interaction and a strong association between self-presented religiosity and greater low-quality news sharing, with replication across a second dataset. Overall, the work provides a practical, publicly available toolkit for computational social scientists to study bios and misinformation, and demonstrates links between self-presentation and online behavior that bear on misinformation mitigation.

Abstract

Social media platforms provide users with a profile description field, commonly known as a "bio," where they can present themselves to the world. A growing literature shows that text in these bios can improve our understanding of online self-presentation and behavior, but existing work relies exclusively on keyword-based approaches to do so. Here, we propose and evaluate a suite of simple, effective, and theoretically motivated approaches to embed bios in spaces that capture salient dimensions of social meaning, such as age and partisanship. We evaluate our methods on four tasks, showing that the strongest one outperforms several practical baselines. We then show the utility of our method in helping understand associations between self-presentation and the sharing of URLs from low-quality news sites on Twitter, with a particular focus on exploring the interactions between age and partisanship and the effects of self-presentations of religiosity. Our work provides new tools to help computational social scientists make use of information in bios, and provides new insights into how misinformation sharing may be perceived on Twitter.
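
To make the idea of embedding bios "in spaces that capture salient dimensions of social meaning" concrete, the following is a minimal sketch of one common way to score an embedding along a dimension: build an axis from the difference between the mean embeddings of two opposing sets of identities, then take the cosine similarity of each bio embedding with that axis. This is an illustrative assumption, not necessarily the authors' procedure. The function names (`dimension_axis`, `project`) and the toy 3-dimensional vectors are hypothetical; in practice the vectors would come from one of the bio embedding models.

```python
import numpy as np


def dimension_axis(pole_a, pole_b):
    """Unit vector pointing from the mean of pole_b toward the mean of pole_a."""
    axis = np.mean(pole_a, axis=0) - np.mean(pole_b, axis=0)
    norm = np.linalg.norm(axis)
    if norm == 0:
        raise ValueError("poles have identical means; the axis is undefined")
    return axis / norm


def project(embeddings, axis):
    """Cosine similarity between each embedding (row) and a unit-length axis.

    Assumes no embedding is the zero vector.
    """
    embeddings = np.atleast_2d(embeddings)
    norms = np.linalg.norm(embeddings, axis=1)
    return (embeddings @ axis) / norms


# Toy embeddings (hypothetical values standing in for real model output).
old_pole = np.array([[1.0, 0.0, 0.2], [0.9, 0.1, 0.0]])
young_pole = np.array([[-1.0, 0.0, 0.1], [-0.8, 0.2, 0.0]])

age_axis = dimension_axis(old_pole, young_pole)

# Two toy bio embeddings: the first leans toward the "old" pole,
# the second toward the "young" pole.
bios = np.array([[0.7, 0.3, 0.1], [-0.6, 0.4, 0.0]])
scores = project(bios, age_axis)
print(scores)
```

Because the axis is built from sets of example identities rather than learned by a classifier, a new dimension can be added by supplying two new pole sets, with no retraining of the underlying embedding model.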

Paper Structure (44 sections, 3 equations, 11 figures, 4 tables)

Figures (11)

  • Figure 1: Performance of each model (x-axis) on each of our two outcome metrics (separate plot rows) for the main test set and the generalizability test set (separate plot columns). Note that for rankings, lower is better.
  • Figure 2: An example of a survey question asked in the single identity projection evaluation for the identity "father of three" on the partisanship dimension.
  • Figure 3: The Spearman correlation (y-axis) between projections and human judgements on 250 social identities for each model (x-axis) and dimension (shape/color). Error bars are 95% bootstrapped confidence intervals.
  • Figure 4: Spearman correlations (x-axis) between projections onto four different social dimensions (separate subplots) for SBERT and Fine-tuned SBERT (y-axis) for two different dimension selection methods (color/shape). Error bars are 95% bootstrapped CIs.
  • Figure 5: Proportion of low-quality shares (y-axis) across bins (n=10) of projections onto the partisanship (x-axis) dimension, estimated using two different methods (color). Error bars, while small, are present in the figure, and represent 95% normal CIs.
  • ...and 6 more figures