Expert-Generated Privacy Q&A Dataset for Conversational AI and User Study Insights
Anna Leschanowsky, Farnaz Salamatjoo, Zahra Kolagar, Birgit Popp
TL;DR
This work addresses the lack of transparency about data practices in conversational AI (CAI) by constructing a privacy Q&A corpus through an expert-in-the-loop process that pairs legal rigor with user-friendly language. It compares three answer sources (Alexa responses, privacy policy excerpts, and expert-designed answers) using linguistic metrics and a Best-Worst Scaling user study. Results show that the expert-designed answers offer higher usability and clarity than Alexa responses and policy excerpts while preserving legal precision. The dataset and methodology provide a scalable path toward accessible, legally sound privacy communication in CAI systems, with implications for GDPR compliance and broader NLP/NLG privacy applications.
Abstract
Conversational assistants process personal data and must comply with data protection regulations that require providers to be transparent with users about how their data is handled. Transparency, in a legal sense, demands preciseness, comprehensibility and accessibility, yet existing solutions fail to meet these requirements. To address this, we introduce a new human-expert-generated dataset for Privacy Question-Answering (Q&A), developed through an iterative process involving legal professionals and conversational designers. We evaluate this dataset through linguistic analysis and a user study, comparing it to privacy policy excerpts and state-of-the-art responses from Amazon Alexa. Our findings show that the proposed answers improve usability and clarity compared to existing solutions while achieving legal preciseness, thereby enhancing the accessibility of data processing information for Conversational AI and Natural Language Processing applications.
