An Exploratory Mixed-Methods Study on General Data Protection Regulation (GDPR) Compliance in Open-Source Software

Lucas Franke, Huayu Liang, Sahar Farzanehpour, Aaron Brantly, James C. Davis, Chris Brown

TL;DR

This study investigates how the General Data Protection Regulation (GDPR) affects open-source software (OSS) development by combining an online survey (N=56) with a large-scale analysis of GitHub pull requests (N=31,462 GDPR-related PRs and matched non-GDPR PRs). It finds that GDPR-related pull requests show substantially more development activity (more comments, longer review cycles, and larger code changes) and that developers' perceptions of GDPR are predominantly negative, owing to data-management, time, and cost burdens, though some acknowledge privacy benefits. The authors discuss challenges in implementing and verifying GDPR compliance, such as vague requirements and limited engagement with legal experts, and argue for policy resources and automated tooling to support OSS teams. The work offers guidance for policymakers and software practitioners on implementing data-privacy regulation and integrating it into OSS development processes.

Abstract

Background: Governments worldwide are considering data privacy regulations. These laws, e.g., the European Union's General Data Protection Regulation (GDPR), require software developers to meet privacy-related requirements when interacting with users' data. Prior research describes the impact of such laws on software development, but only for commercial software. Open-source software is commonly integrated into regulated software, and thus must be engineered or adapted for compliance. We do not know how such laws impact open-source software development.

Aims: To understand how data privacy laws affect open-source software development. We studied the European Union's GDPR, the most prominent such law. We investigated how GDPR compliance activities influence OSS developer activity (RQ1), how OSS developers perceive fulfilling GDPR requirements (RQ2), the most challenging GDPR requirements to implement (RQ3), and how OSS developers assess GDPR compliance (RQ4).

Method: We distributed an online survey to explore perceptions of GDPR implementations from open-source developers (N=56). We further conducted a repository mining study to analyze development metrics on pull requests (N=31,462) submitted to open-source GitHub repositories.

Results: GDPR policies complicate open-source development processes and introduce challenges for developers, primarily regarding the management of users' data, implementation costs and time, and assessments of compliance. Moreover, we observed negative perceptions of GDPR from open-source developers and significant increases in development activity, in particular metrics related to coding and reviewing activity, on GitHub pull requests related to GDPR compliance.

Conclusions: Our findings motivate policy-related resources and automated tools to support data privacy regulation implementation and compliance efforts in open-source software.

Paper Structure (41 sections, 1 figure, 5 tables)

Figures (1)

  • Figure 1: Longitudinal GDPR (G) and Non-GDPR (non-G) Sentiment Analysis Data. We grouped GDPR and non-GDPR data into 3-month segments and used 3 sentiment models. For each model, GDPR data is plotted in a color with a filled marker, and non-GDPR data in the same color but with a hollow marker. The general trend is that sentiment for GDPR data is moderately positive, and more positive than for non-GDPR data.