
Automating Governing Knowledge Commons and Contextual Integrity (GKC-CI) Privacy Policy Annotations with Large Language Models

Jake Chanenson, Madison Pickering, Noah Apthorpe

TL;DR

This work demonstrates that high-accuracy GKC-CI parameter annotations of privacy policies can be automated with fine-tuned large language models, achieving 90.65% exact-match accuracy on a ground-truth set and enabling large-scale longitudinal and cross-industry analyses. Using LoRA-based PEFT across 50 models and a carefully designed sentence-level prompting regime, the authors show that open-source models struggle without fine-tuning, while a GPT-3.5 Turbo variant trained for 25 epochs attains strong performance and cost efficiency. The approach yields scalable insights into policy evolution, parameter-type variance, and density, and is complemented by a visualization tool and freely available data, code, and annotations to support future GKC-CI research. The work also highlights practical considerations, such as model alignment, library defaults, context-window limitations, and the potential for extending to non-policy documents, setting a path for normative privacy analysis at scale.
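The TL;DR mentions a sentence-level prompting regime. A minimal sketch of that idea follows, assuming a naive regex sentence splitter and a hypothetical instruction string; `INSTRUCTION`, `split_sentences`, and `build_prompts` are illustrative names, not the authors' code or prompt. Each sentence of a policy becomes its own prompt, which keeps every model input short relative to a context window.

```python
import re
from typing import List

# Hypothetical instruction text; the paper's actual prompt wording is not reproduced here.
INSTRUCTION = (
    "Annotate every GKC-CI parameter in the sentence below. "
    "Return each annotated span with its parameter label."
)


def split_sentences(policy_text: str) -> List[str]:
    """Naive splitter: break after '.', '!', or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", policy_text.strip())
    # Drop empty strings (e.g., when the input is blank).
    return [p for p in parts if p]


def build_prompts(policy_text: str) -> List[str]:
    """Build one prompt per sentence, so each model call sees a single sentence."""
    return [f"{INSTRUCTION}\n\nSentence: {s}" for s in split_sentences(policy_text)]
```

A real pipeline would likely need a more robust sentence segmenter, since this regex also splits after abbreviations such as "e.g.".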

Abstract

Identifying contextual integrity (CI) and governing knowledge commons (GKC) parameters in privacy policy texts can facilitate normative privacy analysis. However, GKC-CI annotation has heretofore required manual or crowdsourced effort. This paper demonstrates that high-accuracy GKC-CI parameter annotation of privacy policies can be performed automatically using large language models. We fine-tune 50 open-source and proprietary models on 21,588 ground truth GKC-CI annotations from 16 privacy policies. Our best performing model has an accuracy of 90.65%, which is comparable to the accuracy of experts on the same task. We apply our best performing model to 456 privacy policies from a variety of online services, demonstrating the effectiveness of scaling GKC-CI annotation for privacy policy exploration and analysis. We publicly release our model training code, training and testing data, an annotation visualizer, and all annotated policies for future GKC-CI research.
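The 90.65% figure is an exact-match accuracy: only perfect matches between model output and ground truth count as correct. As a minimal sketch, assuming each annotation is compared as a whole string with no normalization, the metric reduces to the fraction of identical prediction/gold pairs. The function name and any bracketed annotation format used with it are hypothetical.

```python
from typing import Sequence


def exact_match_accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of items whose predicted annotation exactly equals the gold one.

    Only perfect matches count as correct; a partially overlapping
    annotation scores zero, just like a completely wrong one.
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)
```

For example, if one of two outputs matches its gold annotation exactly and the other differs by a single span, the score is 0.5, even though the second output is "almost" right.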

Paper Structure

This paper contains 49 sections, 1 equation, 17 figures, 11 tables.

Figures (17)

  • Figure 1: Test set performance of the top-performing model variants, including the RNN, with ≤ 10 epochs of training. GPT3.5_TPE refers to the prompt-engineered version of GPT-3.5 Turbo, GPT3.5_TG refers to the generic GPT-3.5 Turbo model, and GPT3.5_t2s refers to the joint performance of the GPT-3.5 Turbo 2-Step models. Expanded model names are given in the appendix.
  • Figure 2: Performance per GKC-CI parameter for our best performing model, GPT 3.5TPE_25ep.
  • Figure 3: GPT 3.5TPE's performance on the test set at 1, 5, 10, 25, and 50 epochs. Only Perfect Matches were considered to be "correct."
  • Figure 5: Breakdown by parent code of the various types of errors found in our qualitative analysis.
  • Figure 6: The 15 privacy policies with the highest variance in the percentage of individual parameter types across all parameters annotated in the policy.
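Figure 6 ranks policies by the variance in the percentage of individual parameter types. One plausible reading of that caption, sketched here under that assumption: for each policy, compute what percentage of its annotated parameters fall into each type, then take the population variance of those percentages across types, so a policy dominated by a few types scores high. The helper names and label set are hypothetical, not the authors' analysis code.

```python
from collections import Counter
from statistics import pvariance
from typing import Dict, List, Sequence


def type_percentages(labels: Sequence[str], all_types: Sequence[str]) -> Dict[str, float]:
    """Percentage of a policy's annotated parameters belonging to each type."""
    counts = Counter(labels)
    total = len(labels)
    return {t: (100.0 * counts[t] / total if total else 0.0) for t in all_types}


def top_variance_policies(
    policies: Dict[str, Sequence[str]], all_types: Sequence[str], k: int = 15
) -> List[str]:
    """Return the k policies with the highest variance of per-type percentages."""
    scores = {
        name: pvariance(list(type_percentages(labels, all_types).values()))
        for name, labels in policies.items()
    }
    # Highest variance first; sorted() stays stable for ties even with reverse=True.
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Under this reading, a policy whose annotations are spread evenly across all types has variance 0, while one whose annotations all share a single type has the maximum variance for a given number of types.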