SpearBot: Leveraging Large Language Models in a Generative-Critique Framework for Spear-Phishing Email Generation
Qinglin Qi, Yun Luo, Yijia Xu, Wenbo Guo, Yong Fang
TL;DR
This work presents SpearBot, an adversarial framework that uses jailbreak prompts and multiple LLM critics to generate highly personalized spear-phishing emails. By combining data-driven personal information, a taxonomy of 10 phishing strategies, and critique-based optimization, SpearBot produces emails that bypass a range of machine-based and pre-trained language model (PLM) detectors, while human evaluators confirm their high readability and deceptiveness. The study evaluates SpearBot across six public phishing datasets, multiple defenders, and human subjects, revealing gaps in current defenses and the pivotal role of critics in improving adversarial quality. The findings underscore the significant security risks posed by advanced LLMs and call for stronger, multi-faceted defenses and ethical safeguards in AI deployment.
Abstract
Large Language Models (LLMs) are increasingly capable, aiding in tasks such as content generation, yet they also pose risks, particularly in generating harmful spear-phishing emails. These emails, crafted to entice recipients to click on malicious URLs, threaten personal information security. This paper proposes an adversarial framework, SpearBot, which utilizes LLMs to generate spear-phishing emails with various phishing strategies. Through specifically crafted jailbreak prompts, SpearBot circumvents the LLMs' security policies and introduces other LLM instances as critics. When a critic identifies a generated email as phishing, SpearBot refines the email based on the critique feedback until it is no longer recognized as phishing, thereby enhancing its deceptive quality. To evaluate the effectiveness of SpearBot, we implement various machine-based defenders and assess how well the generated phishing emails deceive them. Results show that these emails largely evade detection, underscoring their deceptive quality. Additionally, human evaluations of the emails' readability and deceptiveness, conducted through questionnaires, confirm their convincing nature and the significant potential harm of the generated phishing emails.
