
WEIRD ICWSM: How Western, Educated, Industrialized, Rich, and Democratic is Social Computing Research?

Ali Akbar Septiandri, Marios Constantinides, Daniele Quercia

TL;DR

This study investigates WEIRD bias in social computing research by analyzing 420 ICWSM papers (2018–2022, after filtering) and adapting WEIRD metrics to social media data. Using crowdsourced annotation and Kendall rank correlations, the authors quantify country-level representation in the papers' datasets and examine its links to authorship diversity. They find that 37% of ICWSM papers rely exclusively on Western datasets, a lower share than at CHI or FAccT, yet the literature remains skewed toward Educated, Industrialized, and Rich countries, and Democratic representation is uneven. The paper further shows that dataset papers and posters have lower Educated and Democratic scores than full papers, and that cross-country collaborations are associated with samples from less democratic countries. To mitigate WEIRD bias, the authors propose expanding paper checklists, adding responsible AI statements, and promoting author diversity and shadow mentoring across regions.
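The TL;DR mentions Kendall rank correlations between authorship diversity and WEIRD scores. The sketch below shows one way such a correlation could be computed with the standard library alone, as a tie-aware Kendall's tau-b. The per-paper inputs (`n_author_countries`, `democratic_score`) are hypothetical illustrative values, not data from the paper; in practice one would more likely call `scipy.stats.kendalltau`, which also reports a p-value.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length sequences, with a correction for ties.

    Pairwise O(n^2) version; adequate for a few hundred papers.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i, j in combinations(range(len(x)), 2):
        dx = x[i] - x[j]
        dy = y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both variables: excluded from every count
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    # Denominator: sqrt(pairs not tied in x * pairs not tied in y)
    denom = sqrt((concordant + discordant + tied_y_only)
                 * (concordant + discordant + tied_x_only))
    return (concordant - discordant) / denom if denom else float("nan")


# Hypothetical per-paper values, for illustration only.
n_author_countries = [1, 1, 2, 3, 2, 4]
democratic_score = [0.9, 0.8, 0.7, 0.5, 0.75, 0.4]
tau = kendall_tau_b(n_author_countries, democratic_score)
print(f"Kendall tau-b = {tau:.3f}")
```

A negative tau in this toy input means that papers with more author countries tend to have lower Democratic scores, which is the direction of association the TL;DR reports for cross-country collaborations.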

Abstract

Much of the research in social computing analyzes data from social media platforms, which may inherently carry biases. An overlooked source of such bias is the over-representation of WEIRD (Western, Educated, Industrialized, Rich, and Democratic) populations, which might not accurately mirror global demographic diversity. We evaluated the dependence on WEIRD populations in research presented at the AAAI ICWSM conference, the only venue whose proceedings are fully dedicated to social computing research. We did so by analyzing 494 papers published from 2018 to 2022, including full research papers, dataset papers, and posters. After filtering out papers that analyze synthetic datasets or lack a clear country of origin, we were left with 420 papers. From these, 188 participants in a crowdsourcing study, with full manual validation, extracted the data needed to compute WEIRD scores. We then used this data to adapt existing WEIRD metrics so that they apply to social media data. We found that 37% of these papers focused solely on data from Western countries. This percentage is significantly lower than those observed in research from the CHI (76%) and FAccT (84%) conferences, suggesting greater diversity of dataset origins within ICWSM. However, ICWSM studies still predominantly examine populations from countries that are more Educated, Industrialized, and Rich than those in FAccT, with a special note on the 'Democratic' variable, which reflects political freedoms and rights. This highlights the utility of social media data in shedding light on findings from countries with restricted political freedoms. Based on these insights, we recommend extending current "paper checklists" to include considerations about WEIRD bias and call on the community to broaden research inclusivity by encouraging the use of diverse datasets from underrepresented regions.
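The headline figure above (37% of papers using only Western data) is a simple proportion over the filtered papers. A minimal sketch, assuming each paper is annotated with a list of ISO 3166-1 alpha-2 codes for its dataset countries, and using a hypothetical, deliberately incomplete set of Western countries (the paper's exact definition of "Western" is not reproduced here):

```python
# Hypothetical, partial set of Western countries (ISO 3166-1 alpha-2 codes);
# NOT the paper's actual definition.
WESTERN = {"US", "CA", "GB", "IE", "FR", "DE", "IT", "ES", "NL", "AU", "NZ"}


def western_only_share(paper_countries, western=WESTERN):
    """Fraction of papers whose dataset countries are all in `western`.

    Papers with no identifiable country are skipped, mirroring the
    filtering step described in the abstract.
    """
    eligible = [set(cs) for cs in paper_countries if cs]
    if not eligible:
        return 0.0  # no eligible papers
    western_only = sum(1 for cs in eligible if cs <= western)  # subset test
    return western_only / len(eligible)


# Toy annotations for five hypothetical papers (the last has no country).
papers = [["US"], ["US", "GB"], ["IN"], ["BR", "US"], []]
print(f"{western_only_share(papers):.0%}")  # 2 of 4 eligible papers -> 50%
```

A paper that mixes Western and non-Western data (such as `["BR", "US"]`) does not count as Western-only, which matches the "solely" wording in the abstract.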

Paper Structure

This paper contains 24 sections, 2 figures, and 7 tables.

Figures (2)

  • Figure 1: Crowdsourcing setup. a) We developed a web app that showed the set of ICWSM papers; b) each crowdworker, having been granted access to the web app through Prolific, annotated five papers, with three example papers provided to familiarize them with the task; and c) based on the obtained data, we computed the WEIRD scores.
  • Figure 2: Paper distribution ratio, $\psi_{c}$, reflects the extent of over-representation ($\psi_{c} > 1$) and under-representation ($\psi_{c} < 1$) of countries in ICWSM papers from 2018 to 2022. This ratio is calculated by summing the fractional contributions of each country $c$ across all papers and normalizing it by the population of $c$. Countries not included in the ICWSM papers during the period under study are shown in light gray ($\psi_{c} = 0$), whereas darker shades of blue and red depict countries that are under-represented and over-represented, respectively.
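The Figure 2 caption describes $\psi_c$ only in words. One reading consistent with the threshold $\psi_c = 1$ is a country's share of fractional paper contributions divided by its share of world population, $\psi_c = \frac{\frac{1}{N}\sum_p w_{p,c}}{P_c / \sum_{c'} P_{c'}}$, where $N$ is the number of papers, $P_c$ is the population of $c$, and $w_{p,c} = 1/|C_p|$ if paper $p$ analyzes country $c$ among its set of countries $C_p$ (else 0). This exact formula and the equal split of each paper across its countries are assumptions, not the authors' stated definition; the function name below is hypothetical.

```python
from collections import defaultdict


def paper_distribution_ratio(paper_countries, population):
    """Hypothetical psi_c: share of fractional paper contributions / share of population.

    paper_countries: one list of country codes per paper.
    population: country code -> population (positive numbers).
    Countries that appear in papers but not in `population` are ignored.
    """
    papers = [set(cs) for cs in paper_countries if cs]  # skip papers with no country
    if not papers:
        raise ValueError("no papers with an identifiable country")
    contrib = defaultdict(float)
    for countries in papers:
        for c in countries:
            contrib[c] += 1.0 / len(countries)  # assumed equal split within a paper
    world = sum(population.values())
    return {
        # Countries never studied get psi_c = 0 (light gray in Figure 2).
        c: (contrib[c] / len(papers)) / (pop / world)
        for c, pop in population.items()
    }


# Toy example with three hypothetical countries A, B, C.
population = {"A": 100, "B": 300, "C": 600}
papers = [["A"], ["A", "B"], ["B"]]
psi = paper_distribution_ratio(papers, population)
print({c: round(v, 2) for c, v in psi.items()})  # {'A': 5.0, 'B': 1.67, 'C': 0.0}
```

Under this reading, A holds half of the paper contributions with a tenth of the population (over-represented, $\psi = 5$), while C, never studied, gets $\psi = 0$.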