AttackEval: A Systematic Empirical Study of Prompt Injection Attack Effectiveness Against Large Language Models

Jackson Wang

Abstract

Prompt injection has emerged as a critical vulnerability in large language model (LLM) deployments, yet existing research is heavily weighted toward defenses. The attack side -- specifically, which injection strategies are most effective and why -- remains insufficiently studied. We address this gap with AttackEval, a systematic empirical study of prompt injection attack effectiveness. We construct a taxonomy of ten attack categories organized into three parent groups (Syntactic, Contextual, and Semantic/Social), populate each category with 25 carefully crafted prompts (250 total), and evaluate them against a simulated production victim system under four progressively stronger defense tiers. Experiments reveal several non-obvious findings: (1) Obfuscation (OBF) achieves the highest single-attack success rate (ASR = 0.76) even against intent-aware defenses, because it defeats both keyword matching and semantic similarity checks simultaneously; (2) Semantic/Social attacks, namely Emotional Manipulation (EM) and Reward Framing (RF), maintain high ASR (0.44-0.48) against intent-aware defenses because their natural-language surface evades structural anomaly detection; (3) composite attacks combining two complementary strategies dramatically boost ASR, with the OBF + EM pair reaching 97.6%; (4) stealth correlates positively with residual ASR against semantic defenses (r = 0.71), implying that future defenses must jointly optimize for both structural and behavioral signals. Our findings identify concrete blind spots in current defenses and provide actionable guidance for designing more robust LLM safety systems.

Paper Structure

This paper contains 28 sections, 1 equation, 6 figures, and 2 tables.

Figures (6)

  • Figure 1: Radar chart of ASR for all ten attack categories across the four defense tiers. Behavioral attacks (EM, RF, NT) maintain a larger "footprint" under strong defenses, while structural attacks (DO, RI) collapse rapidly as defense strength increases.
  • Figure 2: Attack Success Rate (ASR) grouped by attack category and defense level, with 95% bootstrap confidence intervals. L1 = keyword filter, L2 = semantic filter, L3 = intent-aware defense. Categories are abbreviated per the taxonomy table.
  • Figure 3: Heatmap of ASR across all category-defense pairs. Red=high ASR (attacker advantage), green=low ASR (defender advantage). OBF and SS-group attacks retain high ASR even under L3 defense (top row).
  • Figure 4: Left: single vs. best composite ASR at L3 defense per category. Right: ASR boost ($\Delta$ASR) from combining attacks. Combining OBF with behavioral attacks (EM, RF) yields the highest boosts.
  • Figure 5: Stealth score vs. ASR at L1 (left) and L3 (right) defenses. Pearson $r$ values indicate a much stronger positive correlation at L3, confirming that stealthier attacks better evade intent-aware defenses.
  • ...and 1 more figure
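The metrics underpinning these figures can be sketched compactly. As a minimal illustration (not the paper's released code), the following assumes ASR is the fraction of a category's 25 prompts that succeed, that the confidence intervals in Figure 2 are percentile bootstrap intervals over per-prompt binary outcomes, and that the stealth correlation in Figure 5 is an ordinary Pearson r; the variable names are our own.

```python
import random
import statistics

def asr(outcomes):
    """Attack Success Rate: fraction of attack prompts that succeeded (1) vs. failed (0)."""
    return sum(outcomes) / len(outcomes)

def bootstrap_ci(outcomes, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap (1 - alpha) CI for ASR, resampling prompts with replacement."""
    rng = random.Random(seed)
    stats = sorted(
        asr([rng.choice(outcomes) for _ in outcomes]) for _ in range(n_boot)
    )
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

def pearson_r(xs, ys):
    """Pearson correlation, e.g. per-category stealth score vs. residual ASR at L3."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Toy example: 25 binary outcomes for one category at one defense tier.
outcomes = [1] * 19 + [0] * 6          # 19/25 successes
print(round(asr(outcomes), 2))          # 0.76
```

With per-prompt outcomes in hand, the same three functions reproduce the quantities shown in Figures 2 and 5: a point estimate per category-tier cell, an uncertainty band around it, and a single correlation coefficient across categories.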