Automating Governing Knowledge Commons and Contextual Integrity (GKC-CI) Privacy Policy Annotations with Large Language Models

Jake Chanenson; Madison Pickering; Noah Apthorpe

TL;DR

This work demonstrates that high-accuracy GKC-CI parameter annotations of privacy policies can be automated with fine-tuned large language models, achieving $90.65\%$ exact-match accuracy on a ground-truth set and enabling large-scale longitudinal and cross-industry analyses. Using LoRA-based PEFT across 50 models and a carefully designed sentence-level prompting regime, the authors show that open-source models struggle without fine-tuning, while a GPT-3.5 Turbo variant trained for 25 epochs attains strong performance and cost efficiency. The approach yields scalable insights into policy evolution, parameter-type variance, and density, and is complemented by a visualization tool and freely available data, code, and annotations to support future GKC-CI research. The work also highlights practical considerations, such as model alignment, library defaults, context-window limitations, and the potential for extending to non-policy documents, setting a path for normative privacy analysis at scale.

Abstract

Identifying contextual integrity (CI) and governing knowledge commons (GKC) parameters in privacy policy texts can facilitate normative privacy analysis. However, GKC-CI annotation has so far required manual or crowdsourced effort. This paper demonstrates that high-accuracy GKC-CI parameter annotation of privacy policies can be performed automatically using large language models. We fine-tune 50 open-source and proprietary models on 21,588 ground-truth GKC-CI annotations from 16 privacy policies. Our best-performing model has an accuracy of 90.65%, which is comparable to the accuracy of experts on the same task. We apply this model to 456 privacy policies from a variety of online services, demonstrating the effectiveness of scaling GKC-CI annotation for privacy policy exploration and analysis. We publicly release our model training code, training and testing data, an annotation visualizer, and all annotated policies for future GKC-CI research.
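To make the exact-match criterion concrete, here is a minimal sketch of how such an accuracy could be computed. It assumes each sentence's annotations are represented as a set of (parameter label, text span) pairs and that a prediction counts as correct only when its set equals the ground-truth set. The representation, the function name `exact_match_accuracy`, and the example sentences are illustrative assumptions, not the authors' released evaluation code.

```python
from typing import List, Set, Tuple

# Assumed representation: one annotation = (parameter label, annotated text span).
Annotation = Tuple[str, str]


def exact_match_accuracy(
    predicted: List[Set[Annotation]],
    gold: List[Set[Annotation]],
) -> float:
    """Fraction of sentences whose predicted annotation set equals the gold set."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same number of sentences")
    if not gold:
        return 0.0
    # A sentence is correct only on a perfect match; partial overlap earns nothing.
    matches = sum(1 for p, g in zip(predicted, gold) if p == g)
    return matches / len(gold)


# Hypothetical example: two sentences, one annotated perfectly, one missing a span.
gold = [
    {("sender", "we"), ("attribute", "your email address")},
    {("recipient", "advertisers"), ("attribute", "location data")},
]
predicted = [
    {("sender", "we"), ("attribute", "your email address")},
    {("recipient", "advertisers")},
]
accuracy = exact_match_accuracy(predicted, gold)
```

Under this strict criterion the second sentence counts as wrong even though one of its two annotations is correct, which is why exact-match scores are a conservative measure of annotation quality.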

Paper Structure (49 sections, 1 equation, 17 figures, 11 tables)

Introduction
Related Work
The Usable Privacy Project
Manual CI Annotation
Privacy Policy Analysis With Machine Learning
GKC-CI Theory
Methods
Training and Testing Data
Formatting Examples
OpenAI Models
Model Selection
Baseline Models
Fine-Tuned Models
Model Training
Model Performance
...and 34 more sections

Figures (17)

Figure 1: Test set performance of the top-performing model variants, including the RNN, with $\leq 10$ epochs of training. GPT3.5_TPE refers to the prompt-engineered version of GPT-3.5 Turbo, GPT3.5_TG refers to the generic GPT-3.5 Turbo model, and GPT3.5_t2s refers to the joint performance of the GPT-3.5 Turbo 2-Step models. Expanded model names are given in the appendix.
Figure 2: Performance per GKC-CI parameter for our best performing model, GPT 3.5TPE_25ep.
Figure 3: GPT 3.5TPE's performance on the test set at 1, 5, 10, 25, and 50 epochs. Only Perfect Matches were considered to be "correct."
Figure 5: Breakdown, by parent code, of the types of errors found in our qualitative analysis.
Figure 6: The 15 privacy policies with the highest variance in the percentage of individual parameter types across all parameters annotated in the policy.
...and 12 more figures
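The per-policy statistic named in the Figure 6 caption can be sketched as follows. The sketch assumes that a policy's annotations are summarized as a count per parameter type, that each count is converted to a percentage of all annotated parameters in that policy, and that the population variance of those percentages is then taken. The choice of population variance, the function name `parameter_type_variance`, and the example counts are assumptions made for illustration; the paper's exact computation may differ.

```python
import statistics
from typing import Dict


def parameter_type_variance(counts: Dict[str, int]) -> float:
    """Variance of the percentage share of each parameter type within one policy."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("policy has no annotated parameters")
    # Share of each parameter type among all parameters annotated in the policy.
    percentages = [100.0 * n / total for n in counts.values()]
    # Assumption: population variance over the parameter types present in `counts`.
    return statistics.pvariance(percentages)


# Hypothetical counts for two policies.
balanced_policy = {"sender": 2, "attribute": 2}
skewed_policy = {"sender": 3, "attribute": 1}

balanced_variance = parameter_type_variance(balanced_policy)
skewed_variance = parameter_type_variance(skewed_policy)
```

A policy whose annotations are spread evenly across parameter types yields a variance near zero, while one dominated by a single type yields a high variance, so ranking policies by this value surfaces the most lopsided ones.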
