PolicyLR: A Logic Representation For Privacy Policies

Ashish Hooda, Rishabh Khandelwal, Prasad Chalasani, Kassem Fawaz, Somesh Jha

TL;DR

PolicyLR is a logic-based, machine-readable representation of privacy policies built from valuations over atomic formulae. It provides a unified framework for compliance, consistency, and privacy comparison tasks. A two-stage compiler translates unstructured policy text into a truth table over atomic formulae using open-source LLMs and NLI-grounded entailment, with retrieval grounding to handle long documents. On the ToS;DR entailment dataset the compiler reaches precision 0.91 and recall 0.88, and it shows strong results on compliance (average F1 ≈ 0.90) and on consistency analyses across policy versions and apps. The framework offers interpretable, context-aware privacy policy analysis and supports practical applications such as privacy-aware shopping and regulatory auditing. The paper also outlines limitations and avenues for improving the taxonomy and the underlying models.

Abstract

Privacy policies are crucial in the online ecosystem, defining how services handle user data and adhere to regulations such as GDPR and CCPA. However, their complexity and frequent updates often make them difficult for stakeholders to understand and analyze. Current automated analysis methods, which utilize natural language processing, have limitations. They typically focus on individual tasks and fail to capture the full context of the policies. We propose PolicyLR, a new paradigm that offers a comprehensive machine-readable representation of privacy policies, serving as an all-in-one solution for multiple downstream tasks. PolicyLR converts privacy policies into a machine-readable format using valuations of atomic formulae, allowing for formal definitions of tasks like compliance and consistency. We have developed a compiler that transforms unstructured policy text into this format using off-the-shelf Large Language Models (LLMs). This compiler breaks down the transformation task into a two-stage translation and entailment procedure. This procedure considers the full context of the privacy policy to infer a complex formula, where each formula consists of simpler atomic formulae. The advantage of this model is that PolicyLR is interpretable by design and grounded in segments of the privacy policy. We evaluated the compiler using ToS;DR, a community-annotated privacy policy entailment dataset. Utilizing open-source LLMs, our compiler achieves precision and recall values of 0.91 and 0.88, respectively. Finally, we demonstrate the utility of PolicyLR in three privacy tasks: Policy Compliance, Inconsistency Detection, and Privacy Comparison Shopping.
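The idea of defining compliance formally over valuations of atomic formulae can be illustrated with a minimal sketch. The atom names, the `complies` helper, and the toy rule below are all hypothetical and chosen for illustration only; they are not taken from the paper or from any regulation.

```python
from typing import Callable, Dict

# A valuation assigns a truth value to each atomic formula.
# The atom names here are hypothetical, not the paper's taxonomy.
Valuation = Dict[str, bool]

policy: Valuation = {
    "collects_location": True,
    "shares_with_third_parties": True,
    "offers_opt_out": False,
}


def opt_out_rule(v: Valuation) -> bool:
    """Toy rule (an assumption): third-party sharing implies an opt-out."""
    return (not v["shares_with_third_parties"]) or v["offers_opt_out"]


def complies(v: Valuation, rule: Callable[[Valuation], bool]) -> bool:
    """A policy complies with a rule if the rule evaluates to true under v."""
    return rule(v)


# The same policy with an opt-out added, to contrast the two outcomes.
revised = dict(policy, offers_opt_out=True)

print(complies(policy, opt_out_rule))
print(complies(revised, opt_out_rule))
```

Because a rule is just a formula evaluated against the valuation, the same truth table can be reused for any number of rules without re-reading the policy text.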

Paper Structure (33 sections, 10 equations, 2 figures, 5 tables)

Figures (2)

  • Figure 1: End-to-end pipeline for PolicyLR. We first instantiate PolicyLR's atomic formulae using the OPP-115 taxonomy. Each combination of attribute-value pairings becomes an atomic formula. The translation module then transforms each of these into a natural language statement. The entailment module then compares each statement against the privacy policy text to generate PolicyLR's truth table.
  • Figure 2: Number of atomic formulae whose valuation differs between the historical and current versions of the privacy policies of 31 Google Play apps. The large number of changes is due to the introduction of the GDPR. PolicyLR provides a way to perform a fine-grained analysis of how privacy policies evolve in response to new regulations.
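The two captions above can be read as a small pipeline: enumerate atomic formulae, translate each to a statement, test entailment against the policy, then compare truth tables across versions. The following is a rough sketch under strong assumptions. The two-attribute `TAXONOMY` is a toy stand-in for OPP-115, the sentence template in `translate` is invented, and `naive_entails` is a keyword check standing in for the LLM-based entailment module the paper describes.

```python
from itertools import product

# Toy taxonomy (an assumption): attribute -> possible values.
TAXONOMY = {
    "data_type": ["location", "contact"],
    "purpose": ["advertising", "analytics"],
}


def atomic_formulae(taxonomy):
    """One atom per combination of attribute-value pairs."""
    attrs = list(taxonomy)
    combos = product(*(taxonomy[a] for a in attrs))
    return [tuple(zip(attrs, combo)) for combo in combos]


def translate(atom):
    """Render an atom as a natural language statement (invented template)."""
    d = dict(atom)
    return f"The service uses {d['data_type']} data for {d['purpose']}."


def naive_entails(policy_text, statement):
    """Placeholder for an entailment model: true if a single policy
    sentence mentions every taxonomy value named in the statement."""
    keywords = [v for vals in TAXONOMY.values() for v in vals if v in statement]
    sentences = policy_text.lower().split(".")
    return any(all(k in s for k in keywords) for s in sentences)


def truth_table(policy_text, entails=naive_entails):
    """Map each atom to the entailment verdict for its statement."""
    return {a: entails(policy_text, translate(a)) for a in atomic_formulae(TAXONOMY)}


def count_changes(old_table, new_table):
    """Number of atoms whose valuation differs between two versions."""
    return sum(old_table[a] != new_table[a] for a in old_table)


old = truth_table("We use your location for advertising. "
                  "We collect contact details for analytics.")
new = truth_table("We use your location for analytics.")
print(count_changes(old, new))
```

Passing the entailment function as a parameter keeps the table-building logic independent of the model, so the keyword stub could be swapped for a real NLI or LLM call without touching the rest of the sketch.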