From Insight to Exploit: Leveraging LLM Collaboration for Adaptive Adversarial Text Generation
Najrin Sultana, Md Rafi Ur Rashid, Kang Gu, Shagufta Mehnaz
TL;DR
This work introduces two automated, LLM-driven attack pipelines, Dynamic Deceptor (DyDec) and Static Deceptor (StaDec), that generate dynamic and adaptive adversarial text which preserves semantics while deceiving zero-shot LLM classifiers. The authors report strong transferability to unseen models and evaluate the attacks on four sensitive tasks using GPT-4o and Llama-3-70B, comparing against prompt-injection and related attacks. They also assess three defenses, finding that perplexity-based and naive LLM-based defenses have limited effectiveness, while paraphrasing offers partial mitigation. The study highlights significant robustness gaps in current LLMs and argues for stronger safety alignment and adversarial training to improve resilience in real-world deployments.
Abstract
LLMs can deliver strong zero-shot performance on diverse tasks from a simple task prompt, eliminating the need for training or fine-tuning. However, before these models are applied to sensitive tasks, their robustness against adversarial inputs must be thoroughly assessed. In this work, we introduce Static Deceptor (StaDec) and Dynamic Deceptor (DyDec), two attack frameworks that systematically generate dynamic and adaptive adversarial examples by leveraging the LLMs' own understanding of the task. We produce subtle, natural-looking adversarial inputs that preserve semantic similarity to the original text while effectively deceiving the target LLM. Because the pipeline is automated and LLM-driven, it does not depend on external heuristics. Our attacks evolve as LLMs advance and demonstrate strong transferability to models unknown to the attacker. Overall, this work provides a systematic approach for the self-assessment of an LLM's robustness. We release our code and data at https://github.com/Shukti042/AdversarialExample.
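To make the idea of an automated, LLM-driven attack loop concrete, the following is a minimal sketch under stated assumptions, not the DyDec or StaDec implementation. The function `adaptive_attack`, its three callables (`rewrite`, `classify`, `similarity`), the feedback strings, the similarity threshold, and the round budget are all hypothetical names and values chosen for illustration. In practice the callables would wrap LLM calls; here they are toy stubs so the example runs on its own.

```python
from typing import Callable, Optional


def adaptive_attack(
    text: str,
    true_label: str,
    rewrite: Callable[[str, str], str],       # hypothetical attacker: (original, feedback) -> candidate
    classify: Callable[[str], str],           # hypothetical target zero-shot classifier
    similarity: Callable[[str, str], float],  # hypothetical semantic-similarity score in [0, 1]
    min_similarity: float = 0.8,              # assumed threshold, not taken from the paper
    max_rounds: int = 5,                      # assumed query budget, not taken from the paper
) -> Optional[str]:
    """Return a rewrite that flips the classifier's label, or None if the budget runs out."""
    feedback = ""
    for _ in range(max_rounds):
        candidate = rewrite(text, feedback)
        # Reject candidates that drift too far from the original meaning.
        if similarity(text, candidate) < min_similarity:
            feedback = "Meaning drifted too far; stay closer to the original."
            continue
        # Success: the target no longer predicts the original label.
        if classify(candidate) != true_label:
            return candidate
        feedback = "The classifier still predicted the original label; try another rewrite."
    return None


# Toy stand-ins so the sketch is runnable without any model access.
def toy_rewrite(original: str, feedback: str) -> str:
    # Only changes the text once it has received feedback from a failed round.
    return original.replace("free", "complimentary") if feedback else original


def toy_classify(t: str) -> str:
    return "spam" if "free" in t else "ham"


def toy_similarity(a: str, b: str) -> float:
    return 1.0  # pretend every rewrite preserves meaning


result = adaptive_attack("free prize inside", "spam", toy_rewrite, toy_classify, toy_similarity)
print(result)
```

The point of the sketch is the feedback loop: each failed round tells the attacker why the candidate was rejected, so the next rewrite can adapt rather than being drawn from a fixed set of perturbations.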
