A Solution toward Transparent and Practical AI Regulation: Privacy Nutrition Labels for Open-source Generative AI-based Applications

Meixue Si, Shidong Pan, Dianshu Liao, Xiaoyu Sun, Zhen Tao, Wenchang Shi, Zhenchang Xing

TL;DR

The paper tackles the lack of privacy transparency in open-source GAI apps by proposing regulation-driven GAI privacy labels and the Repo2Label framework, which auto-generates those labels from code repositories. It designs a four-section GAI privacy label aligned with GDPR, CCPA, PIPL, and GAI-specific regulations, and evaluates the label design through a user study that finds strong user endorsement. It demonstrates Repo2Label's effectiveness with GPT-4o and a verification step, achieving an F1-score of 0.84 on a benchmark and outperforming developers' self-declared privacy notices. The work highlights regulatory-compliance gaps in open-source GAI apps and argues that code-based labels can improve transparency, accountability, and user trust across stakeholders.

Abstract

The rapid development and widespread adoption of Generative Artificial Intelligence-based (GAI) applications have greatly enriched our daily lives, benefiting people by enhancing creativity, personalizing experiences, improving accessibility, and fostering innovation and efficiency across various domains. However, along with the development of GAI applications, concerns have been raised about transparency in their privacy practices. Traditional privacy policies often fail to communicate essential privacy information effectively because of their complexity and length, and open-source community developers often neglect privacy practices even more: only 12.2% of the examined open-source GAI apps provide a privacy policy. To address this, we propose a regulation-driven GAI Privacy Label and introduce Repo2Label, a novel framework for automatically generating these labels from code repositories. Our user study indicates broad endorsement of the proposed GAI privacy label format. Additionally, Repo2Label achieves a precision of 0.81, a recall of 0.88, and an F1-score of 0.84 on the benchmark dataset, significantly outperforming developers' self-declared privacy notices. We also discuss the common regulatory (in)compliance of open-source GAI apps, a comparison with other privacy notices, and broader impacts on different stakeholders. Our findings suggest that Repo2Label could serve as a significant tool for bolstering the privacy transparency of GAI apps and for making them more practical and responsible.
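As a quick consistency check on the reported metrics, the F1-score is the harmonic mean of precision and recall. The short sketch below recomputes it from the abstract's precision (0.81) and recall (0.88); the function name `f1_score` is illustrative and not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Avoid division by zero when both metrics are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract.
reported_precision = 0.81
reported_recall = 0.88

# 2 * 0.81 * 0.88 / (0.81 + 0.88) = 1.4256 / 1.69, which rounds to 0.84.
print(round(f1_score(reported_precision, reported_recall), 2))
```

The result agrees with the F1-score of 0.84 stated in the abstract.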

Paper Structure (32 sections, 4 figures, 7 tables)

This paper contains 32 sections, 4 figures, 7 tables.

Figures (4)

  • Figure 1: An overview of Repo2Label and an example GAI privacy label for https://github.com/CompVis/stable-diffusion. Given a repository (A), Repo2Label extracts all code files (B) and semi-structured textual documents (e.g., C) from the repository. Answers and references are then generated for each label field in our proposed regulation-driven GAI privacy nutrition labels (D).
  • Figure 2: (a) The position of privacy policies of mobile apps in the Google Play app store. (b) The position of the privacy policy of AutoGPT, one of the GAI apps.
  • Figure 3: An overview of the Repo2Label framework.
  • Figure 4: Frequency of the GAI Privacy Label fields marked as "Yes" in the manually annotated dataset.
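
The first step described in the Figure 1 caption, separating a repository into code files and semi-structured textual documents, might look roughly like the sketch below. This is only an illustration under stated assumptions: the extension sets and the function name `partition_repository` are hypothetical and are not taken from the paper, whose actual file-selection rules are not given in this summary.

```python
from pathlib import Path

# Hypothetical extension sets, chosen for illustration only.
CODE_EXTENSIONS = {".py", ".js", ".ts", ".java", ".go", ".rs", ".c", ".cpp"}
DOC_EXTENSIONS = {".md", ".rst", ".txt", ".yaml", ".yml", ".json", ".toml"}


def partition_repository(root):
    """Split the files under `root` into (code_files, doc_files) by extension."""
    root = Path(root)
    code_files, doc_files = [], []
    for path in sorted(root.rglob("*")):
        # Skip directories and anything inside version-control metadata.
        if not path.is_file() or ".git" in path.relative_to(root).parts:
            continue
        suffix = path.suffix.lower()
        if suffix in CODE_EXTENSIONS:
            code_files.append(path)
        elif suffix in DOC_EXTENSIONS:
            doc_files.append(path)
    return code_files, doc_files
```

Calling `partition_repository("path/to/clone")` on a local clone returns two lists of `Path` objects. In a pipeline like the one the caption describes, these would then feed the step that generates an answer and references for each label field.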