WhatsApp Explorer: A Data Donation Tool To Facilitate Research on WhatsApp

Kiran Garimella, Simon Chauchard

TL;DR

This paper tackles the challenge of acquiring large-scale WhatsApp data for social science research, particularly in the Global South, where WhatsApp dominates messaging. It proposes WhatsApp Explorer, a privacy-conscious donation tool and protocol that relies on gateway-user consent, automated anonymization, and in-person data collection to enable large-scale yet ethical data collection from group chats. The authors detail a technical stack (whatsapp-web.js, NodeJS), data visualization pipelines, and robust anonymization processes (including a Systematic Anonymization Audit) to protect privacy while enabling analysis of content spread and group dynamics. They also discuss sampling strategies, highlighting the impracticality of pure probability sampling and presenting decentralized quota sampling as a feasible alternative, supported by pilot studies in India and Brazil. Overall, the work aims to establish practical, privacy-first standards for WhatsApp data research and provoke ongoing discussion about ethical data collection in high-privacy environments.

Abstract

In recent years, reports and anecdotal evidence have emerged pointing to the role of WhatsApp in a variety of events, ranging from elections to collective violence. While academic research should examine the validity of these claims, obtaining WhatsApp data for research is notably challenging. This contrasts with the relative abundance of data from platforms like Facebook and Twitter, where user "information diets" have been extensively studied. This lack of data is particularly problematic because misinformation and hate speech are major concerns in the Global South countries where WhatsApp dominates the market for messaging. To make research on these questions, and on WhatsApp more generally, possible, this paper introduces WhatsApp Explorer, a tool designed to enable WhatsApp data collection on a large scale. We discuss protocols for data collection, including potential sampling approaches, and explain why our tool (and its accompanying protocol) arguably allows researchers to collect WhatsApp data in an ethical and legal manner, at scale.

Paper Structure (17 sections, 9 figures)

Figures (9)

  • Figure 1: A screenshot of the "Forwarded" tab in our visualization dashboard. This tab shows text, videos, and images that were 'forwarded many times' on WhatsApp. The tab also lets users filter messages by date and search for text in images and videos.
  • Figure 2: Clicking a piece of content opens this view, which shows the groups in which the content was shared along with their timestamps, giving a quick overview of how the content spread.
  • Figure 6: Group metrics
  • Figure 7: Size of Donated Groups in India and Brazil
  • Figure 8: Number of donated groups by age category
  • ...and 4 more figures