PolicyLR: A Logic Representation For Privacy Policies
Ashish Hooda, Rishabh Khandelwal, Prasad Chalasani, Kassem Fawaz, Somesh Jha
TL;DR
PolicyLR is a logic-based, machine-readable representation of privacy policies built from valuations over atomic formulae, giving a unified framework for compliance, consistency, and privacy comparison tasks. A two-stage compiler translates unstructured policy text into a truth table over atomic formulae using open-source LLMs and NLI-grounded entailment, with retrieval grounding to handle long documents. On the ToS;DR entailment dataset the compiler reaches precision 0.91 and recall 0.88, and it shows strong results on compliance (average F1 ≈ 0.90) and on consistency analyses across policy versions and apps. The framework offers interpretable, context-aware privacy policy analysis and enables practical applications such as privacy-aware shopping and regulatory auditing; the paper also outlines limitations and avenues for taxonomy and model improvements.
Abstract
Privacy policies are crucial in the online ecosystem, defining how services handle user data and adhere to regulations such as GDPR and CCPA. However, their complexity and frequent updates often make them difficult for stakeholders to understand and analyze. Current automated analysis methods, which rely on natural language processing, typically focus on individual tasks and fail to capture the full context of a policy. We propose PolicyLR, a new paradigm that offers a comprehensive machine-readable representation of privacy policies, serving as an all-in-one solution for multiple downstream tasks. PolicyLR converts privacy policies into a machine-readable format using valuations of atomic formulae, allowing for formal definitions of tasks like compliance and consistency. We have developed a compiler that transforms unstructured policy text into this format using off-the-shelf Large Language Models (LLMs). The compiler breaks the transformation into a two-stage translation and entailment procedure, which considers the full context of the privacy policy to infer complex formulae, each composed of simpler atomic formulae. An advantage of this approach is that PolicyLR is interpretable by design and grounded in segments of the privacy policy. We evaluated the compiler using ToS;DR, a community-annotated privacy policy entailment dataset. Using open-source LLMs, our compiler achieves precision and recall values of 0.91 and 0.88, respectively. Finally, we demonstrate the utility of PolicyLR in three privacy tasks: Policy Compliance, Inconsistency Detection, and Privacy Comparison Shopping.
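To make the idea of a valuation over atomic formulae concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the atom names, the three-valued encoding (True, False, or None when the policy is silent), and the helper functions `is_compliant` and `inconsistencies` are all hypothetical. A requirement is simplified here to a set of atoms with required truth values; the paper's formal definitions may be richer.

```python
from typing import Dict, List, Optional

# Hypothetical encoding: each atomic formula maps to True, False,
# or None when the policy text does not settle it.
Valuation = Dict[str, Optional[bool]]


def is_compliant(valuation: Valuation, required: Dict[str, bool]) -> bool:
    """Return True only if every required atom has exactly the required value.

    An atom that is missing or None counts as non-compliant (an assumption
    made for this sketch).
    """
    return all(valuation.get(atom) is value for atom, value in required.items())


def inconsistencies(old: Valuation, new: Valuation) -> List[str]:
    """List atoms to which both valuations assign definite, differing values."""
    shared = old.keys() & new.keys()
    return sorted(
        atom
        for atom in shared
        if old[atom] is not None and new[atom] is not None and old[atom] != new[atom]
    )


# Illustrative valuations for two versions of the same (made-up) policy.
policy_v1: Valuation = {
    "shares_data_with_third_parties": False,
    "allows_account_deletion": True,
    "sells_personal_data": None,  # policy is silent on this point
}
policy_v2: Valuation = {
    "shares_data_with_third_parties": True,
    "allows_account_deletion": True,
    "sells_personal_data": False,
}

# Illustrative requirement, not drawn from any actual regulation text.
required = {"allows_account_deletion": True, "sells_personal_data": False}

print(is_compliant(policy_v1, required))
print(is_compliant(policy_v2, required))
print(inconsistencies(policy_v1, policy_v2))
```

Because both tasks read from the same valuation, a single compiled representation can serve compliance checks and version-to-version comparison without re-analyzing the policy text.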
