
Jailbreaking Generative AI: Empowering Novices to Conduct Phishing Attacks

Rina Mishra, Gaurav Varshney, Shreya Singh

TL;DR

Problem: AI-generated phishing threats arise when jailbreaking enables novices to bypass safeguards. Approach: a controlled study that uses jailbreaking prompts (DAN and SWITCH) to guide a novice through crafting an Amazon-themed phishing email and a credential-harvesting landing page, with the campaign automated via GoPhish. Contributions: demonstrates that non-experts can carry out phishing end to end, quantifies the attacker workflow with key performance indicators (KPIs), and highlights gaps in current AI defenses and phishing detection. Significance: motivates stronger AI safety measures, robust authentication, and targeted user education to mitigate AI-assisted social engineering threats.

Abstract

Rapid advancements in generative AI models such as ChatGPT have introduced both significant benefits and new risks to the cybersecurity landscape. This paper investigates the potential misuse of the latest AI model, ChatGPT-4o Mini, to facilitate social engineering attacks, with a particular focus on phishing, one of the most pressing cybersecurity threats today. Existing literature primarily addresses technical aspects such as jailbreaking techniques, but no prior work has fully explored how novice users can execute a complete phishing campaign with ChatGPT-4o Mini at no cost and with little effort. In this study, we examine the vulnerabilities of AI-driven chatbot services in 2025, specifically how methods like jailbreaking and reverse psychology can bypass ethical safeguards, allowing ChatGPT to generate phishing content, suggest hacking tools, and assist in carrying out phishing attacks. Our findings underscore the alarming ease with which even inexperienced users can execute sophisticated phishing campaigns and emphasize the urgent need for stronger cybersecurity measures and heightened user awareness in the age of AI.

Paper Structure

This paper contains 3 sections and 1 figure.

Figures (1)

  • Figure 1: Prompts Given to ChatGPT for Launching a Successful Phishing Attack.