A Longitudinal Measurement of Privacy Policy Evolution for Large Language Models
Zhen Tao, Shidong Pan, Zhenchang Xing, Emily Black, Talia Gillis, Chunyang Chen
TL;DR
This study conducts the first longitudinal, cross-provider analysis of privacy policies for mainstream LLM services, assembling 74 historical policy versions and 115 supplemental documents from 11 providers across five countries up to Aug 2025. It introduces an LLM-specific privacy policy taxonomy, measures readability and vagueness, and extracts 3,463 sentence-level edits to quantify content evolution. The findings show LLM policies are substantially longer, harder to read, and remain highly vague, with growth driven by new data types, model-training disclosures, and cross-border data flows, and with updates often aligned to product launches and regulatory actions. The work reveals regional and provider differences, signs of copy-paste-like drafting, and emphasizes the need for governance mechanisms that integrate AI-specific requirements into privacy notices while mitigating cognitive burden for users.
Abstract
Large language model (LLM) services have been rapidly integrated into people's daily lives as chatbots and agentic systems. They are nourished by collecting rich streams of data, raising privacy concerns around excessive collection of sensitive personal information. Privacy policies are the fundamental mechanism for informing users about data practices in the modern information privacy paradigm. Although traditional web and mobile policies are well studied, the privacy policies of LLM providers, their LLM-specific content, and their evolution over time remain largely underexplored. In this paper, we present the first longitudinal empirical study of privacy policies for mainstream LLM providers worldwide. We curate a chronological dataset of 74 historical privacy policies and 115 supplemental privacy documents from 11 LLM providers across 5 countries up to August 2025, and extract over 3,000 sentence-level edits between consecutive policy versions. We compare LLM privacy policies to those of other software formats, propose a taxonomy tailored to LLM privacy policies, annotate policy edits, and align them with a timeline of key LLM ecosystem events. Results show that LLM privacy policies are substantially longer, demand college-level reading ability, and remain highly vague. Our taxonomy analysis reveals patterns in how providers disclose LLM-specific practices and highlights regional disparities in coverage. We find that policy edits are concentrated in first-party data collection and international/specific-audience sections, and that product releases and regulatory actions are the primary drivers. Together, these findings shed light on the status quo and the evolution of LLM privacy policies.
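As an illustration of what a sentence-level edit between consecutive policy versions could look like, the following minimal Python sketch diffs two policy texts sentence by sentence. It is not the authors' extraction pipeline, which the abstract does not describe; the regex-based sentence splitter, the use of the standard-library difflib module, the function names split_sentences and sentence_edits, and the two toy policy strings are all assumptions made for this example.

```python
import difflib
import re


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def sentence_edits(old_text, new_text):
    """Return (operation, old_sentences, new_sentences) for every changed span."""
    old = split_sentences(old_text)
    new = split_sentences(new_text)
    # Compare the two versions as sequences of whole sentences.
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep only insert / delete / replace operations
            edits.append((tag, old[i1:i2], new[j1:j2]))
    return edits


# Toy policy versions, invented for illustration only.
old_policy = "We collect your email. We do not sell data."
new_policy = (
    "We collect your email. "
    "We may use chats to train models. "
    "We do not sell data."
)

edits = sentence_edits(old_policy, new_policy)
for operation, removed, added in edits:
    print(operation, removed, added)
```

Real policies would need a more robust sentence segmenter (abbreviations, list items, and headings all defeat the regex above), and replaced sentences would likely need a similarity threshold to separate rewordings from unrelated additions and deletions.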
