Jailbreaking Generative AI: Multivector Phishing Threats and Transformer based Defenses

Rina Mishra, Gaurav Varshney

Abstract

The rise of Generative AI (GenAI) has reshaped the cybersecurity landscape by enabling new attack vectors and lowering the barrier for executing advanced social engineering campaigns. This study conducts an empirical analysis of jailbreaking vulnerabilities in ChatGPT-4o-Mini, showing that novices can bypass safeguards to generate complete multivector phishing attacks across email, web, SMS, and voice channels. Controlled experiments reveal that role-based jailbreaks produce fully operational attack paths capable of credential harvesting. User studies further demonstrate the disruptive potential of GenAI: novice participants exhibited a 240% increase in perceived phishing competence, a 400% improvement in task completion rates, and a 57% reduction in implementation time when assisted by GenAI compared to traditional internet resources. To address these risks, a transformer-based detection framework was developed, achieving an F1-score of 0.9864 (XLNet) for identifying malicious prompts. The work underscores the urgency of strengthening LLM guardrails and provides an annotated dataset to support future defenses.

Paper Structure

This paper contains 40 sections, 13 equations, 10 figures, 6 tables.

Figures (10)

  • Figure 1: Phishing email generated by GPT-based language model and delivered to target user's inbox. The message employs typical phishing indicators including urgency tactics, spoofed sender address (amazonverificationservice@gmail.com), and credential harvesting through a fraudulent verification link.
  • Figure 2: Credential harvesting page generated by ChatGPT mimicking Amazon's authentic sign-in interface. This landing page is linked from the phishing email (Figure 1) to capture victim credentials through form submission.
  • Figure 3: Campaign tracking dashboard for simulated phishing attack. Twelve emails were successfully delivered using GPT-generated content (Figures 1 and 2). The interface provides real-time monitoring of victim interactions including email opens, link clicks, credential submissions, and security awareness reports.
  • Figure 4: Phishing attack kill chain visualization for individual target. The event log documents progression from email delivery through credential compromise, including browser fingerprinting data (Windows OS Version 10, Chrome 132.0.0.0) and harvested credentials. This granular tracking demonstrates the operational intelligence gathered through automated phishing campaign infrastructure. Note: The image has been modified to mask the user's actual credentials (shown as *) for privacy reasons.
  • Figure 5: Multi-channel phishing capabilities of ChatGPT-guided attacks. (a) SMS-based phishing message mimicking Twilio security notifications with embedded malicious hyperlink leading to credential harvesting infrastructure. (b) Automated voice call interface utilizing Twilio API for real-time vishing attacks. These examples demonstrate scalability of LLM-generated social engineering beyond traditional email phishing (Figure 1), encompassing SMS and voice communication channels.
  • ...and 5 more figures