"It's a Fair Game", or Is It? Examining How Users Navigate Disclosure Risks and Benefits When Using LLM-Based Conversational Agents

Zhiping Zhang; Michelle Jia; Hao-Ping Lee; Bingsheng Yao; Sauvik Das; Ada Lerner; Dakuo Wang; Tianshi Li

TL;DR

The paper addresses privacy risks in LLM-based conversational agents by combining a real-world disclosure analysis of the ShareGPT52K dataset with semi-structured interviews of 19 CA users. It reveals that users routinely trade privacy for utility and convenience, yet hold flawed mental models and encounter dark patterns that undermine awareness and control of privacy risks. Through empirical evidence on memorization risks, interdependent privacy, and human-like nudges, the study advances practical design guidelines and calls for paradigm shifts in technology, policy, and society. The findings emphasize the need for user-centered privacy controls, transparent model operation, and local-model options to meaningfully improve privacy protections in LLM-based CAs.

Abstract

The widespread use of Large Language Model (LLM)-based conversational agents (CAs), especially in high-stakes domains, raises many privacy concerns. Building ethical LLM-based CAs that respect user privacy requires an in-depth understanding of the privacy risks that concern users the most. However, existing research, primarily model-centered, does not provide insight into users' perspectives. To bridge this gap, we analyzed sensitive disclosures in real-world ChatGPT conversations and conducted semi-structured interviews with 19 LLM-based CA users. We found that users are constantly faced with trade-offs between privacy, utility, and convenience when using LLM-based CAs. However, users' erroneous mental models and the dark patterns in system design limited their awareness and comprehension of the privacy risks. Additionally, the human-like interactions encouraged more sensitive disclosures, which complicated users' ability to navigate the trade-offs. We discuss practical design guidelines and the need for paradigm shifts to protect the privacy of LLM-based CA users.

Paper Structure (99 sections, 5 figures, 7 tables)

Introduction
Background and Related Work
Emerging Privacy Challenges in LLM-based CAs
Memorization and Extraction Risks in (Large) Language Models
Overreliance and More Disclosure with Human-like CAs
Existing Privacy-Preserving Methods Related to LLMs
Privacy Research on Online Disclosure
Users' Mental Models on Machine Learning and Privacy
Mental Models in ML
Mental Models in Privacy
Dataset Analysis
Methodology
The ShareGPT52K Dataset
Ethical considerations
Sampling methods
...and 84 more sections

Figures (5)

Figure 1: A fictional example of sensitive disclosure to ChatGPT inspired by real-world use cases: a user shared their doctor's email and ICD-10-CM diagnosis results with ChatGPT upon its request, and ChatGPT then interpreted the codes, indicating that the user had multiple diseases. The example demonstrates three issues: 1. the user disclosed PII (name) and non-identifiable but sensitive information (diagnosis results); 2. the user disclosed another person's information (the doctor's information); 3. ChatGPT actively requested detailed information from the user, which encouraged the user's disclosure behavior.
Figure 2: Screenshot of P8's drawing representing mental model A: ChatGPT is magic.
Figure 3: Screenshot of P4's drawing representing mental model B: ChatGPT is a super searcher.
Figure 4: Screenshot of P14's drawing representing mental model C: ChatGPT is a stochastic parrot. P14 verbally explained how the end-to-end machine learning model generates a response in technical detail.
Figure 5: Dark patterns in ChatGPT: ChatGPT offers two ways for a user to opt out of having their data used for model training. The control in the user settings is easier to discover (all but P15 found it), but the training and chat history opt-out controls are bundled together, so a user who wants to opt out of model training has to turn off the chat history feature as well. Users could also submit a form to opt out of training while keeping their history, but the form is linked from an FAQ article (https://help.openai.com/en/articles/7730893-data-controls-faq) that is harder to discover (none of our participants found it). These issues were observed during our studies in August 2023. As of November 2023, the form link was still in the FAQ article, but the form had been disabled and instead directed users to the OpenAI Privacy Request Portal to submit privacy requests. As of February 2024, the form link has been replaced with a link to the privacy portal.