Comparing Methods for Creating a National Random Sample of Twitter Users

Meysam Alizadeh; Darya Zare; Zeynab Samei; Mohammadamin Alizadeh; Mael Kubli; Mohammadhadi Aliahmadi; Sarvenaz Ebrahimi; Fabrizio Gilardi

TL;DR

This study systematically compares four common methods for constructing a national random sample of Twitter users in the US, evaluating tweet-, user-, and population-level representativeness. Using a month-long data collection and a debiasing framework based on inclusion probabilities, the authors show that the 1% Stream method yields the most population-representative samples, with Bounding Box as a viable fallback when streaming is not feasible. Across extensive robustness checks, the 1% Stream consistently achieves a lower population-inference error, measured as mean absolute percentage error (MAPE), than the other methods, even after accounting for demographic correlations. The work provides practical guidance for researchers conducting population-level Twitter analyses and highlights tradeoffs related to timeliness, engagement metrics, and regional biases. Its approach and findings can inform similar sampling and debiasing efforts on other social platforms and domains.
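
The TL;DR ranks the methods by population-inference error (MAPE). As a reminder of what that metric computes, here is a minimal Python sketch of mean absolute percentage error over per-state population estimates. The function name `mape` and all numbers are illustrative assumptions, not the paper's code or data; note also that some libraries return MAPE as a fraction rather than a percentage.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned in percent.

    actual, predicted: equal-length, non-empty sequences; every actual value must be nonzero.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, p in zip(actual, predicted):
        if a == 0:
            raise ValueError("MAPE is undefined when an actual value is zero")
        # Error relative to the true value, so each state is scaled by its own size.
        total += abs(a - p) / abs(a)
    return 100.0 * total / len(actual)


# Hypothetical true vs. estimated state populations, for illustration only.
true_pop = [1_000_000, 2_000_000, 4_000_000]
est_pop = [1_100_000, 1_800_000, 4_000_000]
print(f"MAPE = {mape(true_pop, est_pop):.1f}%")
```

Because each error is divided by the true value, a miss of 100,000 people weighs more heavily in a small state than in a large one.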

Abstract

Twitter data has been widely used by researchers across the social and computer sciences. A common aim when working with Twitter data is to construct a random sample of users from a given country. However, while several methods have been proposed in the literature, their comparative performance is mostly unexplored. In this paper, we implement four common methods to collect a random sample of Twitter users in the US: 1% Stream, Bounding Box, Location Query, and Language Query. We then compare the methods on tweet- and user-level metrics and on their accuracy in estimating the US population, with and without inclusion probabilities for various demographics. Our results show that the 1% Stream method differs from the others on tweet- and user-level metrics and performs best for constructing a population-representative sample. We discuss the conditions under which the 1% Stream method may not be suitable and suggest the Bounding Box method as the second-best option.
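
The abstract mentions estimating population with inclusion probabilities of demographic groups. One common way to use such probabilities is a Horvitz-Thompson-style inverse-probability estimate, sketched below in Python. This is an assumption about the general idea, not the paper's exact debiasing model (its figures refer to regression-based models). The helper names, the definition of each group's inclusion probability as national sample count divided by census count, and all numbers are hypothetical.

```python
def inclusion_probabilities(sample_counts, census_counts):
    """Hypothetical helper: per-group probability that a resident appears in the sample.

    p_g = (sampled users in group g) / (census population in group g)
    """
    return {g: sample_counts.get(g, 0) / census_counts[g] for g in census_counts}


def inverse_probability_estimate(region_counts, probs):
    """Horvitz-Thompson-style estimate of a region's population.

    Each sampled user in group g stands for 1 / p_g residents. Groups with
    p_g == 0 (or missing from probs) carry no information and are skipped.
    """
    return sum(n / probs[g] for g, n in region_counts.items() if probs.get(g, 0) > 0)


# Hypothetical national census counts and national sample counts by age group.
census_national = {"18-29": 50_000_000, "30-49": 80_000_000, "50+": 120_000_000}
sample_national = {"18-29": 5_000, "30-49": 4_000, "50+": 2_000}
# Hypothetical sample counts for one state whose users skew young.
state_sample = {"18-29": 200, "30-49": 40, "50+": 20}

probs = inclusion_probabilities(sample_national, census_national)
weighted = inverse_probability_estimate(state_sample, probs)
# Unweighted baseline: scale the state's share of the sample up to the national total.
naive = sum(state_sample.values()) / sum(sample_national.values()) * sum(census_national.values())
print(f"inclusion-weighted: {weighted:,.0f}   unweighted: {naive:,.0f}")
```

Because young users are overrepresented in this made-up sample, the unweighted estimate is inflated; dividing each group's count by its own inclusion probability corrects for that, under the further assumption that inclusion rates are the same in every state.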

Paper Structure (24 sections, 1 equation, 11 figures, 9 tables)

Introduction
Related Work
Methodology
Sampling Methods
Data
User Pre-Processing
Inferring Users' Demographics
Creating Representative Population Estimates
Evaluation Metrics
Results
Tweet-Level and User-Level Metrics
Population-Level Metrics
Robustness Tests
Discussion
Acknowledgment
...and 9 more sections

Figures (11)

Figure 1: Distributions of (A) number of tweets; (B) average number of tweets per day; (C) number of likes; (D) account creation date; (E) number of followers; and (F) number of friends for different groups. Distribution of users with respect to (G) gender and (H) age across the four Twitter sampling methods.
Figure 2: Heatmap of p-values for pairwise t-tests of (A) number of tweets; (B) average number of tweets per day; (C) number of likes; (D) account creation date; (E) number of followers; and (F) number of friends across the four Twitter sampling methods.
Figure 3: Map of the number of users located in US states. All four sampling methods produced at least one user in all 50 US states.
Figure 4: Performance on leave-one-state-out population inference across different debiasing models, where rows with all-zero values were removed from the regression. Bars show MAPE($N$) with robust standard errors clustered on states.
Figure S1: Performance on leave-one-state-out population inference across different debiasing models, where the District of Columbia is included and rows with all-zero values were removed from the regression. Bars show MAPE($N$) with robust standard errors clustered on states.
...and 6 more figures
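
Figure 4 reports leave-one-state-out population inference: each state is held out in turn, a model is fit on the remaining states, and the held-out state's population is predicted. The Python sketch below illustrates only that evaluation loop under strong simplifying assumptions: a single predictor (sampled user count) in a least-squares fit through the origin, with no demographic covariates and no clustered standard errors. Reading "rows with all zero value" as states with no sampled users is also an assumption. All function names and data are hypothetical.

```python
def fit_through_origin(xs, ys):
    """Least-squares slope for y = beta * x (no intercept)."""
    sxx = sum(x * x for x in xs)
    if sxx == 0:
        raise ValueError("need at least one nonzero predictor value")
    return sum(x * y for x, y in zip(xs, ys)) / sxx


def leave_one_state_out_mape(user_counts, populations):
    """Hold out each state, fit on the rest, predict it; return MAPE in percent.

    user_counts, populations: dicts keyed by state. States with zero sampled
    users are dropped first (assumed meaning of the "all-zero rows").
    """
    states = [s for s in populations if user_counts.get(s, 0) > 0]
    if len(states) < 2:
        raise ValueError("need at least two states with sampled users")
    errors = []
    for held_out in states:
        train = [s for s in states if s != held_out]
        beta = fit_through_origin([user_counts[s] for s in train],
                                  [populations[s] for s in train])
        pred = beta * user_counts[held_out]
        errors.append(abs(populations[held_out] - pred) / populations[held_out])
    return 100.0 * sum(errors) / len(errors)


# Hypothetical data: state D has no sampled users and is excluded.
populations = {"A": 1_000_000, "B": 2_000_000, "C": 3_000_000, "D": 500_000}
user_counts = {"A": 120, "B": 190, "C": 310, "D": 0}
print(f"leave-one-state-out MAPE = {leave_one_state_out_mape(user_counts, populations):.1f}%")
```

Holding out whole states, rather than random users, tests whether a model calibrated on some states transfers to a state it has never seen, which is the situation a population-inference method faces in practice.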
