Lateral Phishing With Large Language Models: A Large Organization Comparative Study
Mazal Bethany, Athanasios Galiopoulos, Emet Bethany, Mohammad Bahrami Karkevandi, Nicole Beebe, Nishant Vishwamitra, Peyman Najafirad
TL;DR
The paper investigates the evolving threat of LLM-driven lateral phishing through a large-scale, real-world comparison of LLM-generated and human-crafted phishing emails within a university. Using a comprehensive phishing simulation infrastructure and post-experiment questionnaires, the study shows that LLM-generated emails can be as effective as, and in some cases more effective than, human-written content, particularly in timely-context scenarios. It provides detailed breakdowns by department and job role, identifies high-risk groups such as student workers and supervisors, and reveals the factors that drive engagement, including sender identity and perceived relevance. The work highlights practical defense implications, such as tagging emails that contain LLM-generated content and enhancing security training, and emphasizes the need for organizational readiness against AI-powered phishing threats.
Abstract
The emergence of Large Language Models (LLMs) has heightened the threat of phishing by enabling highly targeted, personalized, and automated attacks. Traditionally, many phishing emails have been marked by typos, grammatical errors, and poor language. LLMs can eliminate these errors, potentially lowering the barrier to entry for attackers. Despite this, there is a lack of large-scale studies comparing the effectiveness of LLM-generated lateral phishing emails with that of human-crafted ones, particularly in a real-world organizational setting where the ability of LLMs to produce more convincing, error-free content could matter most. To address this gap, we conducted a pioneering study within a large university, targeting its workforce of approximately 9,000 individuals, including faculty, staff, administrators, and student workers. Our results indicate that LLM-generated lateral phishing emails are as effective as those written by communications professionals, underscoring the serious threat LLMs pose when used to run phishing campaigns. We break down the results of the overall phishing experiment, comparing vulnerability across departments and job roles. Furthermore, to gather qualitative data, we administered a detailed questionnaire that reveals the reasons and motivations behind vulnerable employees' actions. This study contributes to the understanding of cybersecurity threats in educational institutions and provides a comprehensive comparison of the effectiveness of LLM-generated and human-generated phishing emails. The findings highlight the need for enhanced user education and system defenses to mitigate the growing threat of AI-powered phishing attacks.
