
Malicious LLM-Based Conversational AI Makes Users Reveal Personal Information

Xiao Zhan, Juan Carlos Carrillo, William Seymour, Jose Such

Abstract

LLM-based Conversational AIs (CAIs), also known as GenAI chatbots, like ChatGPT, are increasingly used across various domains, but they pose privacy risks, as users may disclose personal information during their conversations with CAIs. Recent research has demonstrated that LLM-based CAIs could be used for malicious purposes. However, a novel and particularly concerning type of malicious LLM application remains unexplored: an LLM-based CAI that is deliberately designed to extract personal information from users. In this paper, we report on malicious LLM-based CAIs that we created using system prompts with different strategies for encouraging users to disclose personal information. We systematically investigate CAIs' ability to extract personal information from users during conversations by conducting a randomized controlled trial with 502 participants. We assess the effectiveness of different malicious and benign CAIs in extracting personal information from participants, and we analyze participants' perceptions after their interactions with the CAIs. Our findings reveal that malicious CAIs extract significantly more personal information than benign CAIs, with strategies based on the social nature of privacy being the most effective while minimizing perceived risks. This study underscores the privacy threats posed by this novel type of malicious LLM-based CAI and provides actionable recommendations to guide future research and practice.

Paper Structure

This paper contains 52 sections and 6 figures.

Figures (6)

  • Figure 1: Threat Model and CAIs developed: ① represents a Benign CAI with no modifications to the system prompt. ② represents malicious CAIs, using system prompts designed with three strategies: Direct, User-benefits, and Reciprocal. The sample prompt in ② corresponds to the Direct CAI.
  • Figure 2: Amount of personal information disclosed by group, with Dunn's post-hoc significance: *** $p < 0.001$.
  • Figure 3: Top 30 sub-categories of personal information disclosed during interactions with different CAI treatment groups.
  • Figure 4: Visualization of participants’ perceptions and significant results. Each metric is measured on a Likert scale from 1 (strongly disagree) to 5 (strongly agree). Kruskal-Wallis (K-W) tests were significant for all metrics; detailed post-hoc results are marked in pink, with significance levels *** $p < 0.001$, ** $p < 0.01$, * $p < 0.05$.
  • Figure 5: Frequency of CAI requests for personal data in the qualitative coding, by group.
  • ...and 1 more figure
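The Figure 2 and Figure 4 captions report Kruskal-Wallis (K-W) tests followed by Dunn's post-hoc comparisons. The sketch below shows how these statistics are typically computed, written in plain Python so each step is visible. It is not the authors' analysis code. The function names, the Bonferroni adjustment (the captions do not state which correction was used), and the per-participant counts in the example are all hypothetical, chosen only for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def rank_with_ties(values):
    """1-based ranks of `values`; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-square(df), integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # odd df: two-sided normal tail plus a finite correction series
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def kruskal_dunn(groups):
    """Tie-corrected Kruskal-Wallis H, its p-value, and Bonferroni-adjusted Dunn tests.

    `groups` maps a group name to a list of per-participant scores.
    Assumes at least two groups and that not every value is tied.
    """
    names = list(groups)
    pooled = [v for name in names for v in groups[name]]
    n = len(pooled)
    ranks = rank_with_ties(pooled)

    # mean rank and size of each group, read back from the pooled ranking
    mean_rank, size, start = {}, {}, 0
    for name in names:
        size[name] = len(groups[name])
        mean_rank[name] = sum(ranks[start:start + size[name]]) / size[name]
        start += size[name]

    # sum of (t^3 - t) over tie groups, used by both tie corrections
    tie_sum = sum(t ** 3 - t for t in Counter(pooled).values())

    h = 12 / (n * (n + 1)) * sum(size[g] * mean_rank[g] ** 2 for g in names) - 3 * (n + 1)
    h /= 1 - tie_sum / (n ** 3 - n)
    p_h = chi2_sf(h, len(names) - 1)

    # Dunn's z for each pair of groups, with tie-adjusted variance
    pairs = list(combinations(names, 2))
    var_base = n * (n + 1) / 12 - tie_sum / (12 * (n - 1))
    dunn = {}
    for a, b in pairs:
        se = math.sqrt(var_base * (1 / size[a] + 1 / size[b]))
        z = (mean_rank[a] - mean_rank[b]) / se
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
        dunn[(a, b)] = (z, min(1.0, p * len(pairs)))  # Bonferroni adjustment
    return h, p_h, dunn


# Hypothetical toy data: items disclosed per participant (NOT the study's data).
toy = {
    "Benign": [1, 2, 0, 3, 1, 2],
    "Direct": [4, 5, 3, 6, 4, 5],
    "User-benefits": [3, 4, 2, 5, 3, 4],
    "Reciprocal": [6, 7, 5, 8, 6, 7],
}
h, p_h, dunn = kruskal_dunn(toy)
print(f"H = {h:.2f}, p = {p_h:.4g}")
for (a, b), (z, p_adj) in dunn.items():
    print(f"{a} vs {b}: z = {z:.2f}, adjusted p = {p_adj:.4g}")
```

For real analyses, `scipy.stats.kruskal` returns the same tie-corrected H statistic, and the third-party scikit-posthocs package provides a Dunn's test implementation with several p-value adjustment options. The hand-rolled version above is only meant to make the rank-based logic explicit.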