The Good, the Bad, and the (Un)Usable: A Rapid Literature Review on Privacy as Code

Nicolás E. Díaz Ferreyra, Sirine Khelifi, Nalin Arachchilage, Riccardo Scandariato

TL;DR

The paper addresses the gap between privacy requirements and code-level implementations by conducting a rapid literature review of Privacy as Code (PaC) methods and tools. It maps the current landscape, revealing that PaC research is nascent, dominated by code-analysis approaches, and hampered by limited evaluation datasets and limited cross-language coverage. The study identifies two PaC code-generation techniques and various technical foundations (e.g., abstract syntax trees (ASTs), code property graphs (CPGs), and taint analysis), while highlighting challenges such as scalability, false positives, and ad-hoc evaluations. It argues for ground-truth benchmark datasets, open-source collaboration, practitioner-focused empirical studies, and cautious exploration of generative AI to advance PaC responsibly.

Abstract

Privacy and security are central to the design of information systems endowed with sound data protection and cyber resilience capabilities. Still, developers often struggle to incorporate these properties into software projects, as they either lack proper cybersecurity training or do not consider them a priority. Prior work has tried to support privacy and security engineering activities through threat modeling methods for scrutinizing flaws in system architectures. Moreover, several techniques for the automatic identification of vulnerabilities and the generation of secure code implementations have been proposed in the current literature. However, such "as-code" approaches seem under-investigated in the privacy domain, with little work elaborating on (i) the automatic detection of privacy properties in source code or (ii) the generation of privacy-friendly code. In this work, we seek to characterize the current research landscape of Privacy as Code (PaC) methods and tools by conducting a rapid literature review. Our results suggest that PaC research is in its infancy, especially regarding the performance evaluation and usability assessment of the existing approaches. Based on these findings, we outline and discuss prospective research directions concerning empirical studies with software practitioners, the curation of benchmark datasets, and the role of generative AI technologies.

Paper Structure

This paper contains 13 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Applied Methodology: Rapid Literature Review.
  • Figure 2: Final search query.
  • Figure 3: Evaluation strategies of PaC approaches.
  • Figure :