Word-level Annotation of GDPR Transparency Compliance in Privacy Policies using Large Language Models

Thomas Cory, Wolf Rieder, Julia Krämer, Philip Raschke, Patrick Herbke, Axel Küpper

TL;DR

This work tackles the challenge of automating fine-grained GDPR transparency annotation in privacy policies by proposing a modular LLM-based pipeline that combines passage-level classification, retrieval-augmented generation (RAG), and a self-correction layer to produce word- and phrase-level annotations aligned with Articles 13 and 14. It defines a 21-item GDPR transparency annotation scheme, builds a large-scale privacy policy corpus (703,791 English policies) and a GDPR-aligned ground-truth set (GDPR-Transparency-200), and conducts a two-tier evaluation across seven LLMs on GDPR-Transparency-200 and OPP-115. Results show that decomposing the task and augmenting prompts with targeted retrieval significantly improve both passage- and span-level accuracy, with GPT-4.1 delivering the best performance on the GDPR-centric dataset. However, challenges persist with long or ambiguous spans and with ground-truth inconsistencies, indicating that expert human oversight remains essential. The study provides empirical resources and a principled framework for scalable, automated, policy-level compliance analysis and outlines practical directions for improving reliability, generalizability, and annotation standards.

Abstract

Ensuring transparency of data practices related to personal information is a core requirement of the General Data Protection Regulation (GDPR). However, large-scale compliance assessment remains challenging due to the complexity and diversity of privacy policy language. Manual audits are labour-intensive and inconsistent, while current automated methods often lack the granularity required to capture nuanced transparency disclosures. In this paper, we present a modular large language model (LLM)-based pipeline for fine-grained word-level annotation of privacy policies with respect to GDPR transparency requirements. Our approach integrates LLM-driven annotation with passage-level classification, retrieval-augmented generation, and a self-correction mechanism to deliver scalable, context-aware annotations across 21 GDPR-derived transparency requirements. To support empirical evaluation, we compile a corpus of 703,791 English-language privacy policies and generate a ground-truth sample of 200 manually annotated policies based on a comprehensive, GDPR-aligned annotation scheme. We propose a two-tiered evaluation methodology capturing both passage-level classification and span-level annotation quality and conduct a comparative analysis of seven state-of-the-art LLMs on two annotation schemes, including the widely used OPP-115 dataset. The results of our evaluation show that decomposing the annotation task and integrating targeted retrieval and classification components significantly improve annotation accuracy, particularly for well-structured requirements. Our work provides new empirical resources and methodological foundations for advancing automated transparency compliance assessment at scale.
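The modular structure described in the abstract (passage-level classification, retrieval augmentation, LLM-based annotation, and self-correction) can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name below (`Annotation`, `annotate_passage`, the `toy_*` functions, and the `retention` and `dpo_contact` labels) is hypothetical, and the LLM-backed components are replaced by simple keyword stubs so that the control flow is runnable on its own.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Annotation:
    requirement: str  # label of a transparency requirement (hypothetical names)
    start: int        # index of the first annotated word (inclusive)
    end: int          # index one past the last annotated word (exclusive)


def annotate_passage(
    passage: str,
    classify: Callable[[str], List[str]],
    retrieve: Callable[[str, List[str]], str],
    annotate: Callable[[str, List[str], str], List[Annotation]],
    review: Callable[[str, List[Annotation], str], List[Annotation]],
) -> List[Annotation]:
    """Run one passage through classifier -> retrieval -> annotator -> reviewer."""
    labels = classify(passage)            # passage-level classification
    if not labels:
        return []                         # nothing relevant: skip the costly steps
    context = retrieve(passage, labels)   # retrieval-augmented context
    draft = annotate(passage, labels, context)
    return review(passage, draft, context)  # self-correction layer


# --- Toy stand-ins for the LLM-backed components (assumptions, not the paper's) ---
TOY_KEYWORDS = {"retention": {"retain", "stored"}, "dpo_contact": {"officer"}}


def _words(passage: str) -> List[str]:
    return [w.strip(".,;:!?").lower() for w in passage.split()]


def toy_classify(passage: str) -> List[str]:
    present = set(_words(passage))
    return [label for label, keys in TOY_KEYWORDS.items() if keys & present]


def toy_retrieve(passage: str, labels: List[str]) -> str:
    return "; ".join(f"example for {label}" for label in labels)


def toy_annotate(passage: str, labels: List[str], context: str) -> List[Annotation]:
    return [
        Annotation(label, i, i + 1)
        for i, word in enumerate(_words(passage))
        for label in labels
        if word in TOY_KEYWORDS[label]
    ]


def toy_review(passage: str, draft: List[Annotation], context: str) -> List[Annotation]:
    # Keep only spans that lie inside the passage and are non-empty.
    n = len(passage.split())
    return [a for a in draft if 0 <= a.start < a.end <= n]


result = annotate_passage(
    "We retain your data for two years.",
    toy_classify, toy_retrieve, toy_annotate, toy_review,
)
```

Passing the components as callables mirrors the modularity the abstract emphasises: each stage could in principle be swapped (for example, a different LLM as annotator) without touching the surrounding control flow.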

Paper Structure

This paper contains 37 sections, 15 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Preprocessing pipeline that parses raw HTML documents into lists of annotation-ready passages.
  • Figure 2: Annotation pipeline comprising an LLM-based annotation layer and a self-correction layer. The annotation layer combines its core LLM-based annotator with an upstream passage-level classifier to produce annotated passages, which are then refined by the self-correction layer's LLM-based reviewer. All LLM-based components are supported by a dedicated RAG injector, which dynamically augments their inputs with suitable examples and legal background.
  • Figure 3: Corpus compilation process, showing the output of each step leading to the corpus of privacy policies used as the population from which we draw our evaluation sample.
  • Figure 4: Manual review process.
  • Figure 5: Cumulative annotation span similarity distribution, showing the proportion of annotations with span similarity score $span\theta(t_o, t_{gt})$ below given discrimination thresholds $\tau$ across all models and configurations for each dataset.
  • ...and 2 more figures
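
The cumulative distribution described in the Figure 5 caption (the proportion of annotations whose span similarity falls below each threshold $\tau$) can be sketched in Python as follows. The paper's actual span similarity measure is not reproduced in this summary, so the sketch assumes a word-index Jaccard overlap purely for illustration; `span_similarity` and `cumulative_below` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # (start, end) word indices, end exclusive


def span_similarity(pred: Span, gold: Span) -> float:
    """Jaccard overlap of the word indices covered by two spans.

    ASSUMPTION: stands in for the paper's similarity score, which is not
    defined in this summary.
    """
    p, g = set(range(*pred)), set(range(*gold))
    union = p | g
    return len(p & g) / len(union) if union else 1.0


def cumulative_below(scores: Sequence[float], thresholds: Sequence[float]) -> List[float]:
    """Proportion of scores strictly below each threshold tau."""
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    return [sum(s < tau for s in scores) / n for tau in thresholds]


# Toy data: predicted spans paired with ground-truth spans.
pairs = [((0, 4), (2, 6)), ((0, 3), (0, 3)), ((5, 7), (0, 2)), ((1, 5), (1, 5))]
scores = [span_similarity(p, g) for p, g in pairs]
curve = cumulative_below(scores, [0.25, 0.5, 1.0])
```

Reading the curve at a given $\tau$ gives the share of annotations that would be rejected if $\tau$ were used as the minimum similarity for counting a predicted span as a match.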