"Power of Words": Stealthy and Adaptive Private Information Elicitation via LLM Communication Strategies

Shuning Zhang, Jiaqi Bai, Linzhi Wang, Shixuan Li, Xin Yi, Hewu Li

TL;DR

This paper identifies a privacy vulnerability in human–LLM interactions by introducing an adaptive, stealthy framework to elicit targeted private information through structured communication strategies. It combines real-time user-state profiling with adaptive strategy selection and prompt-based stealth optimization, forming a closed-loop attack that remains hard to detect. In a large user study (N=84) across three LLMs and three task scenarios, targeted elicitation rose by approximately 205% over stealthy baselines, with high generalizability across models and contexts, though efficacy varied with task context. The findings reveal that users often perceived attacking chatbots as empathetic and trustworthy, underscoring the need for layered mitigations: adaptive alerts, user literacy, and regulatory guidelines distinguishing beneficial persuasion from coercive data extraction. Overall, the work highlights a critical, scalable security threat and provides a blueprint for defenses that combine technical and human-centered strategies to safeguard privacy in LLM-enabled environments.

Abstract

While the communication strategies of Large Language Models (LLMs) are crucial for human-LLM interactions, they can also be weaponized to elicit private information, yet such stealthy attacks remain under-explored. This paper introduces the first adaptive attack framework for stealthy and targeted private information elicitation via communication strategies. Our framework operates in a dynamic closed loop: it first performs real-time psychological profiling of the user's state, then adaptively selects an optimized communication strategy, and finally maintains stealthiness through prompt-based rewriting. We validated this framework through a user study (N=84), demonstrating its generalizability across 3 distinct LLMs and 3 scenarios. The targeted attacks achieved a 205.4% increase in eliciting specific targeted information compared to stealthy interactions without strategies. Even stealthy interactions without specific strategies successfully elicited private information in 54.8% of cases. Notably, users not only failed to detect the manipulation but paradoxically rated the attacking chatbot as more empathetic and trustworthy. Finally, we advocate for mitigations, encouraging developers to integrate adaptive, just-in-time alerts, users to build literacy against specific manipulative tactics, and regulators to define clear ethical boundaries distinguishing benign persuasion from coercion.
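
To make the headline number concrete, the following minimal sketch shows how a relative increase such as the reported 205.4% maps to a multiplier over the baseline. The function names are hypothetical, and the baseline value of 1.0 is a normalized placeholder, not a figure from the paper; the exact metric the authors compare is not specified in this summary.

```python
def relative_increase_pct(baseline: float, treatment: float) -> float:
    """Percentage increase of `treatment` over `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (treatment - baseline) / baseline * 100.0


def multiplier_from_increase(increase_pct: float) -> float:
    """Convert a percentage increase into a multiplier on the baseline."""
    return 1.0 + increase_pct / 100.0


# A 205.4% increase means the targeted condition elicits roughly
# three times as much targeted information as the baseline condition.
multiplier = multiplier_from_increase(205.4)

# Normalized placeholder baseline (assumption, not a value from the paper).
baseline = 1.0
targeted = baseline * multiplier
recovered_pct = relative_increase_pct(baseline, targeted)
```

Read this way, an increase of 205.4% is not "about double" but roughly a threefold level relative to stealthy interactions without strategies.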

Paper Structure

This paper contains 39 sections, 5 figures, 6 tables, 2 algorithms.

Figures (5)

  • Figure 1: The threat model of this paper. The attacker selects a strategy to elicit sensitive information $S_c$ of a specific category $c$, while maintaining stealth by calculating detectability and optimizing responses.
  • Figure 2: Success rate by probability for targeted attacks, across intended disclosure classes, compared with untargeted attacks.
  • Figure 3: Overall number of disclosures per dialogue for targeted and untargeted manipulation. Error bars indicate one standard error.
  • Figure 4: Number of disclosures per user for targeted and untargeted attacks across disclosure classes. Error bars indicate one standard deviation.
  • Figure 5: Subjective ratings of (a) the targeted attack and (b) the untargeted attack (1: most negative, 7: most positive). Error bars indicate one standard deviation.