Designing and Testing a Mobile Application for Collecting WhatsApp Chat Data While Preserving Privacy
Brennan Schaffner, Archie Brohn, Jason Chee, K. J. Feng, Marshini Chetty
TL;DR
The paper tackles privacy concerns in WhatsApp data research by proposing User-Centered Data Sharing (UCDS), a set of principles that limit data collection to necessary metadata, perform local extraction, involve users in oversight, and ensure transparency. It implements a concrete instance (URL-EXTRACTOR-APP) to assess feasibility with 10 participants and conducts a large user survey (n=334) to gauge perceptions, finding that UCDS can yield useful insights while better aligning with user privacy expectations. Key contributions include a concrete privacy-preserving data collection methodology, empirical evidence that users prefer metadata- and URL-based extraction with local processing, and recommendations for improving group-consent mechanisms in data-sharing research. The work advances usable privacy practices in social-media data collection and suggests broader adoption of two-sided data sharing and governance models in research contexts.
Abstract
It is common practice for researchers to join public WhatsApp chats and scrape their contents for analysis. However, research shows that collecting data this way contradicts user expectations and preferences, even when the data is effectively public. To overcome these issues, we outline design considerations for collecting WhatsApp chat data with improved user privacy: heightening user control and oversight of data collection, and minimizing the data that researchers collect and process off a user's device. We refer to these design principles as User-Centered Data Sharing (UCDS). We evaluated our UCDS principles in two ways. First, we implemented a mobile application representing one possible instance of these improved data collection techniques and evaluated the viability of using the app to collect WhatsApp chat data. Second, we surveyed WhatsApp users to gather their perceptions of common existing WhatsApp data collection methods as well as UCDS methods. Our results show that our prototype app, built on UCDS principles, yielded insights into WhatsApp chats comparable to those obtained with common, less privacy-preserving methods. Our survey showed that users prefer methods following the UCDS principles because these methods offer them more control over the data collection process. Future user studies could expand upon UCDS principles to overcome the complications of researcher-to-group communication in research on WhatsApp chats and evaluate these principles in other data sharing contexts.
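To make the idea of on-device extraction with user oversight more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a hypothetical exported-chat line layout of "<timestamp> - <sender>: <message>" (real exports vary by platform and locale), and the function names, the fields kept, and the approval callback are all illustrative assumptions. The sketch keeps only the timestamp and URL of each link, drops sender names and message text, and lets the user filter records before anything would leave the device.

```python
import re
from urllib.parse import urlsplit

# Hypothetical line layout: "<timestamp> - <sender>: <message>".
# This pattern is an assumption for illustration, not a documented format.
LINE_RE = re.compile(r"^(?P<ts>.+?) - (?P<sender>[^:]+): (?P<text>.*)$")
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def extract_url_records(chat_text):
    """Return one record per URL, keeping only timestamp, URL, and domain.

    Sender names and message text are deliberately discarded so that
    only the minimal data needed for URL analysis is retained.
    """
    records = []
    for line in chat_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip system notices and lines that do not fit the layout
        for raw_url in URL_RE.findall(match.group("text")):
            url = raw_url.rstrip(".,;:!?)")  # trim trailing sentence punctuation
            host = urlsplit(url).hostname
            if host:
                records.append(
                    {"timestamp": match.group("ts").strip(), "url": url, "domain": host}
                )
    return records


def keep_approved(records, approve):
    """Keep only the records the user approves (approve is a callable)."""
    return [record for record in records if approve(record)]
```

In this sketch, only the output of `keep_approved` would be shared with researchers; the `approve` callable stands in for whatever review screen a real app would present to the user.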
