Clio: Privacy-Preserving Insights into Real-World AI Use

Alex Tamkin; Miles McCain; Kunal Handa; Esin Durmus; Liane Lovitt; Ankur Rathi; Saffron Huang; Alfred Mountfield; Jerry Hong; Stuart Ritchie; Michael Stern; Brian Clarke; Landon Goldberg; Theodore R. Sumers; Jared Mueller; William McEachen; Wes Mitchell; Shan Carter; Jack Clark; Jared Kaplan; Deep Ganguli

TL;DR

Clio addresses the lack of public data on real-world AI usage, a gap caused by privacy concerns and the practical difficulty of analyzing conversations at scale. It uses a privacy-preserving pipeline in which AI assistants themselves surface aggregated usage patterns from millions of conversations, without human reviewers reading raw conversations. The paper demonstrates Clio’s ability to reveal dominant use cases (coding, writing, and research), differences in usage across languages, and coordinated abuse, and shows how these insights can strengthen safety classifiers and support monitoring during high-stakes periods such as capability launches or major world events. It also discusses the approach's limitations, risks, and ethical concerns. Overall, Clio offers a scalable, privacy-preserving platform for empirically grounded AI safety and governance.

Abstract

How are AI assistants being used in the real world? While model providers in theory have a window into this impact via their users' data, both privacy concerns and practical challenges have made analyzing this data difficult. To address these issues, we present Clio (Claude insights and observations), a privacy-preserving platform that uses AI assistants themselves to analyze and surface aggregated usage patterns across millions of conversations, without the need for human reviewers to read raw conversations. We validate this can be done with a high degree of accuracy and privacy by conducting extensive evaluations. We demonstrate Clio's usefulness in two broad ways. First, we share insights about how models are being used in the real world from one million Claude.ai Free and Pro conversations, ranging from providing advice on hairstyles to providing guidance on Git operations and concepts. We also identify the most common high-level use cases on Claude.ai (coding, writing, and research tasks) as well as patterns that differ across languages (e.g., conversations in Japanese discuss elder care and aging populations at higher-than-typical rates). Second, we use Clio to make our systems safer by identifying coordinated attempts to abuse our systems, monitoring for unknown unknowns during critical periods like launches of new capabilities or major world events, and improving our existing monitoring systems. We also discuss the limitations of our approach, as well as risks and ethical concerns. By enabling analysis of real-world AI usage, Clio provides a scalable platform for empirically grounded AI safety and governance.
