
Designing and Testing a Mobile Application for Collecting WhatsApp Chat Data While Preserving Privacy

Brennan Schaffner, Archie Brohn, Jason Chee, K. J. Feng, Marshini Chetty

TL;DR

The paper tackles privacy concerns in WhatsApp data research by proposing User-Centered Data Sharing (UCDS), a set of principles that limit data collection to necessary metadata, perform local extraction, involve users in oversight, and ensure transparency. It implements a concrete instance (URL-EXTRACTOR-APP) to assess feasibility with 10 participants and conducts a large user survey (n=334) to gauge perceptions, finding that UCDS can yield useful insights while better aligning with user privacy expectations. Key contributions include a concrete privacy-preserving data collection methodology, empirical evidence that users prefer metadata- and URL-based extraction with local processing, and recommendations for improving group-consent mechanisms in data-sharing research. The work advances usable privacy practices in social-media data collection and suggests broader adoption of two-sided data sharing and governance models in research contexts.

Abstract

It is common practice for researchers to join public WhatsApp chats and scrape their contents for analysis. However, research shows that collecting data this way contradicts user expectations and preferences, even if the data is effectively public. To overcome these issues, we outline design considerations for collecting WhatsApp chat data with improved user privacy by heightening user control and oversight of data collection and taking care to minimize the data researchers collect and process off a user's device. We refer to these design principles as User-Centered Data Sharing (UCDS). To evaluate our UCDS principles, we first implemented a mobile application representing one possible instance of these improved data collection techniques and evaluated the viability of using the app to collect WhatsApp chat data. Second, we surveyed WhatsApp users to gather their perceptions of common existing WhatsApp data collection methods as well as UCDS methods. Our results show that, using UCDS principles in our prototype app, we were able to glean insights into WhatsApp chats that were similarly informative to those obtained with common, less privacy-preserving methods. Our survey showed that users preferred methods following the UCDS principles because they offered more control over the data collection process. Future user studies could further expand upon UCDS principles to overcome complications of researcher-to-group communication in research on WhatsApp chats and evaluate these principles in other data sharing contexts.
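
As a rough illustration of what local extraction can look like, the sketch below reads the text of an exported chat and keeps only URLs and coarse counts, discarding message text and sender names before anything would leave the device. It is not the authors' implementation of URL-EXTRACTOR-APP. The function name `extract_urls`, the returned fields, and the assumed line shape (`<date>, <time> - <sender>: <message>`) are hypothetical; real exports differ by platform and locale, and timestamps that contain hyphens would not match this pattern.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumed (hypothetical) line shape: "<date>, <time> - <sender>: <message>".
# Real WhatsApp exports vary by platform and locale.
LINE_RE = re.compile(r"^(?P<stamp>[^-]+?) - (?P<sender>[^:]+): (?P<text>.*)$")
URL_RE = re.compile(r"https?://[^\s<>]+")


def extract_urls(chat_text):
    """Return URLs and coarse metadata only; message text and senders are dropped."""
    urls = []
    message_count = 0
    for line in chat_text.splitlines():
        match = LINE_RE.match(line)
        if match:
            message_count += 1
            text = match.group("text")
        else:
            # Lines without a header are treated as continuations of a message.
            text = line
        for url in URL_RE.findall(text):
            # Trim punctuation that commonly trails a URL in running text.
            urls.append(url.rstrip(".,;:!?)"))
    domains = Counter(urlsplit(u).hostname for u in urls)
    return {
        "message_count": message_count,
        "urls": urls,
        "domains": dict(domains),
    }
```

A user-facing review step, such as letting the participant delete individual URLs before sending, would sit between this extraction and any transmission to researchers.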

Paper Structure (101 sections, 14 figures, 5 tables)

Figures (14)

  • Figure 1: The process for exporting the text file logs from a WhatsApp chat and an example of the resulting text file. WhatsApp's homepage is shown in (a), where users can select one of their chats. Then in (b) the user can select to export the chat data from the chat's options menu. Finally, the user chooses how to export the chat in (c), linking to other apps that accept text files. The resulting exported chat file is shown in (d).
  • Figure 2: Screenshots of URL-EXTRACTOR-APP's navigation screens. A user could either add an exported WhatsApp chat text file or share the file directly to the app (a). Once the exported chats were shared to or imported into URL-EXTRACTOR-APP, the app displayed a list of all shared chats, shown as 'Chat 1' in this example (b). The example metadata for a chat could be verified (c). The URLs for the selected chat were displayed as a list, each with an 'X' next to it (d). Clicking an 'X' deleted that URL from the chat's metadata. When a user clicked 'Add to send', they could enter the email addresses to which the extracted chat should be sent, along with the option to automatically send it to the research team (e).
  • Figure 3: Illustration provided to participants explaining the standard methods for collecting WhatsApp chat data where researchers join WhatsApp chats via invite links found online.
  • Figure 4: Illustration provided to participants explaining the scenario representing UCDS. Specifically, researchers do not join the chats but externally interface with a chat member (e.g., via an application like URL-EXTRACTOR-APP).
  • Figure 5: Participant comfort level sharing different types of WhatsApp chat data, grouped by Metadata, Message Contents, and PII.
  • ...and 9 more figures