WhatsApp Explorer: A Data Donation Tool To Facilitate Research on WhatsApp
Kiran Garimella, Simon Chauchard
TL;DR
This paper tackles the challenge of acquiring large-scale WhatsApp data for social science research, particularly in the Global South, where WhatsApp dominates messaging. It proposes WhatsApp Explorer, a privacy-conscious data donation tool and protocol that combines gateway-user consent, automated anonymization, and in-person data collection to gather group-chat data ethically and at scale. The authors detail a technical stack (whatsapp-web.js, Node.js), data visualization pipelines, and robust anonymization processes (including a Systematic Anonymization Audit) that protect privacy while enabling analysis of content spread and group dynamics. They also discuss sampling strategies, explaining why pure probability sampling is impractical and presenting decentralized quota sampling as a feasible alternative, supported by pilot studies in India and Brazil. Overall, the work aims to establish practical, privacy-first standards for WhatsApp data research and to prompt ongoing discussion about ethical data collection in high-privacy environments.
Abstract
In recent years, reports and anecdotal evidence have pointed to the role of WhatsApp in a variety of events, ranging from elections to collective violence. While academic research should examine the validity of these claims, obtaining WhatsApp data for research is notably challenging, in contrast to the relative abundance of data from platforms like Facebook and Twitter, where user "information diets" have been extensively studied. This lack of data is particularly problematic because misinformation and hate speech are major concerns in the Global South countries where WhatsApp dominates the messaging market. To make research on these questions, and on WhatsApp more generally, possible, this paper introduces WhatsApp Explorer, a tool designed to enable WhatsApp data collection on a large scale. We discuss protocols for data collection, including potential sampling approaches, and explain why our tool and its accompanying protocol arguably allow researchers to collect WhatsApp data ethically and legally, at scale.
