How to Drill Into Silos: Creating a Free-to-Use Dataset of Data Subject Access Packages
Nicola Leschke, Daniela Pöhn, Frank Pallas
TL;DR
The paper tackles the lack of publicly accessible, multi-provider subject access request packages (SARPs), a gap that holds back research on ex-post transparency under the GDPR. It proposes a six-step method for generating SARPs from research-only, pseudonymous accounts, followed by careful de-identification and publication under open licenses. An initial minimal dataset spanning five controllers (the GAFA companies Google, Apple, Facebook, and Amazon, plus LinkedIn) and two data subjects is created, described, and preliminarily analyzed to demonstrate its potential for cross-controller and cross-subject research. The work lays a foundation for scalable, repeatable SARP dataset generation and discusses future expansion to more providers, additional languages, and long-term studies, with practical implications for research on privacy dashboards and data portability.
Abstract
The European Union's General Data Protection Regulation (GDPR) strengthened several rights for individuals (data subjects). One of these is the data subjects' right to access the personal data that services (data controllers) collect about them, complemented by a new right to data portability. Based on these rights, data controllers are obliged to provide the respective data and to allow data subjects to use them at their own discretion. However, the subjects' possibilities for actually using and harnessing said data have so far been severely limited. Among other reasons, this can be attributed to a lack of research dedicated to the actual use of controller-provided subject access request packages (SARPs). To open up and facilitate such research, we outline a general, high-level method for generating, pre-processing, publishing, and finally using SARPs of different providers. Furthermore, we establish a realistic dataset comprising two users' SARPs from five services. This dataset is publicly available and is intended to serve as a starting and reference point for researching and comparing novel approaches to the practically viable use of SARPs.
