Using LLMs for Automated Privacy Policy Analysis: Prompt Engineering, Fine-Tuning and Explainability
Yuxin Chen, Peng Tang, Weidong Qiu, Shujun Li
TL;DR
This paper investigates automated privacy policy analysis using large language models (LLMs) by combining prompt engineering and LoRA fine-tuning across four corpora with hierarchical taxonomies. It demonstrates that the hybrid approach achieves state-of-the-art performance for privacy policy concept classification and provides high-quality explainability, measured by completeness, logicality, and comprehensibility (each scoring above $91.1\%$ in human evaluations). The work offers a practical pathway to accurate, explainable privacy policy analysis and lays groundwork for downstream tasks such as reader-friendly summaries and regulatory compliance checks. Limitations include prompt-based performance gaps, resource constraints, and the need to further explore continual pre-training and larger models, which the authors intend to address in future work.
Abstract
Privacy policies are widely used by digital services and often required for legal purposes. Many machine-learning-based classifiers have been developed to automate the detection of different concepts in a given privacy policy, which can facilitate other automated tasks such as producing a more reader-friendly summary and detecting legal compliance issues. Despite the successful application of large language models (LLMs) to many NLP tasks in various domains, there is very little work studying the use of LLMs for automated privacy policy analysis. Whether and how LLMs can help automate privacy policy analysis therefore remains under-explored. To fill this research gap, we conducted a comprehensive evaluation of LLM-based privacy policy concept classifiers, employing both prompt engineering and LoRA (low-rank adaptation) fine-tuning, on four state-of-the-art (SOTA) privacy policy corpora and taxonomies. Our experimental results demonstrated that combining prompt engineering and fine-tuning can make LLM-based classifiers outperform other SOTA methods, \emph{significantly} and \emph{consistently} across privacy policy corpora/taxonomies and concepts. Furthermore, we evaluated the explainability of the LLM-based classifiers using three metrics: completeness, logicality, and comprehensibility. For all three metrics, a score exceeding 91.1\% was observed in our evaluation, indicating that LLMs are useful not only for improving classification performance but also for enhancing the explainability of detection results.
