VortexPIA: Indirect Prompt Injection Attack against LLMs for Efficient Extraction of User Privacy

Yu Cui, Sicheng Pan, Yifei Liu, Haibin Zhang, Cong Zuo

TL;DR

VortexPIA reveals a practical privacy threat in LLM-integrated applications by inducing models to proactively request user PII under black-box conditions without modifying system prompts. It introduces an indirect prompt injection approach that uses a fake privacy memory PS to trigger batch privacy requests while avoiding Chain-of-Thought prompts to reduce costs. Across six LLMs and four datasets, VortexPIA achieves state-of-the-art attack performance, lowers token usage, and demonstrates robustness against defenses, with real-world open-source applications validating practicality. The findings highlight a significant privacy risk in deployment scenarios and motivate the development of stronger prevention and detection strategies for LLM-based CAIs.

Abstract

Large language models (LLMs) have been widely deployed in Conversational AIs (CAIs), a deployment that also exposes privacy and security threats. Recent research shows that LLM-based CAIs can be manipulated to extract private information from human users, posing serious security threats. However, the methods proposed in that study rely on a white-box setting in which adversaries can directly modify the system prompt. This condition is unlikely to hold in real-world deployments. The limitation raises a critical question: can unprivileged attackers still induce such privacy risks in practical LLM-integrated applications? To address this question, we propose VortexPIA, a novel indirect prompt injection attack that induces privacy extraction in LLM-integrated applications under black-box settings. By injecting token-efficient data containing false memories, VortexPIA misleads LLMs into actively requesting private information in batches. Unlike prior methods, VortexPIA allows attackers to flexibly define multiple categories of sensitive data. We evaluate VortexPIA on six LLMs, covering both traditional and reasoning LLMs, across four benchmark datasets. The results show that VortexPIA significantly outperforms baselines and achieves state-of-the-art (SOTA) performance. It also demonstrates efficient privacy requests, reduced token consumption, and enhanced robustness against defense mechanisms. We further validate VortexPIA on multiple realistic open-source LLM-integrated applications, demonstrating its practical effectiveness.

Paper Structure

This paper contains 22 sections, 2 equations, 6 figures, 2 tables, 1 algorithm.

Figures (6)

  • Figure 1: Prompt injection attack for privacy extraction poses user privacy risks during interactions with LLM-integrated applications under black-box settings.
  • Figure 2: Comparison of attack effectiveness under defenses between our approach and baseline methods. The positive rate indicates the degree of unsafe exposure under detection. Lower values correspond to stronger robustness of the attack strategy. ASR measures the sensitivity and frequency of privacy requests initiated by LLMs.
  • Figure 3: ASR on six LLMs across four datasets. Our evaluation compares the proposed VortexPIA method with three existing attack methods (Baselines 1-3).
  • Figure 4: Comprehensive comparative results of average ASR across multiple LLMs.
  • Figure 5: Evaluation of MR across $N_p$ values on multiple benchmark datasets.
  • ...and 1 more figure