Word-level Annotation of GDPR Transparency Compliance in Privacy Policies using Large Language Models
Thomas Cory, Wolf Rieder, Julia Krämer, Philip Raschke, Patrick Herbke, Axel Küpper
TL;DR
This work tackles the challenge of automating fine-grained GDPR transparency annotation of privacy policies. It proposes a modular LLM-based pipeline that combines passage-level classification, retrieval-augmented generation (RAG), and a self-correction layer to produce word- and phrase-level annotations aligned with Articles 13 and 14. The work defines a 21-item GDPR transparency annotation scheme, builds a large-scale privacy policy corpus (703,791 English-language policies) and a GDPR-aligned ground-truth set (GDPR-Transparency-200), and conducts a two-tier evaluation of seven LLMs on GDPR-Transparency-200 and OPP-115. Results show that decomposing the task and augmenting prompts with targeted retrieval significantly improve both passage- and span-level accuracy, with GPT-4.1 delivering the best performance on the GDPR-centric dataset. However, challenges persist with long or ambiguous spans and with inconsistencies in the ground truth, indicating that expert human oversight remains essential. The study provides empirical resources and a principled framework for scalable, automated, policy-level compliance analysis, and outlines practical directions for improving reliability, generalizability, and annotation standards.
Abstract
Ensuring transparency of data practices involving personal information is a core requirement of the General Data Protection Regulation (GDPR). However, large-scale compliance assessment remains challenging due to the complexity and diversity of privacy policy language. Manual audits are labour-intensive and inconsistent, while current automated methods often lack the granularity required to capture nuanced transparency disclosures. In this paper, we present a modular large language model (LLM)-based pipeline for fine-grained, word-level annotation of privacy policies with respect to GDPR transparency requirements. Our approach integrates LLM-driven annotation with passage-level classification, retrieval-augmented generation, and a self-correction mechanism to deliver scalable, context-aware annotations across 21 GDPR-derived transparency requirements. To support empirical evaluation, we compile a corpus of 703,791 English-language privacy policies and create a ground-truth sample of 200 policies, manually annotated according to a comprehensive, GDPR-aligned annotation scheme. We propose a two-tiered evaluation methodology that captures both passage-level classification quality and span-level annotation quality, and we conduct a comparative analysis of seven state-of-the-art LLMs on two annotation schemes: our GDPR-aligned scheme and that of the widely used OPP-115 dataset. The results of our evaluation show that decomposing the annotation task and integrating targeted retrieval and classification components significantly improve annotation accuracy, particularly for well-structured requirements. Our work provides new empirical resources and methodological foundations for advancing automated transparency compliance assessment at scale.
