Improving Phishing Resilience with AI-Generated Training: Evidence on Prompting, Personalization, and Duration

Francesco Greco, Giuseppe Desolda, Cesare Tucci, Andrea Esposito, Antonio Curci, Antonio Piccinno

TL;DR

The paper investigates whether large language models can autonomously generate scalable phishing-awareness training, and how prompting style, personalization, and training duration affect learning. Across two controlled studies (N=80 and N=400), AI-generated content produced robust pre-post improvements in phishing detection, with simple direct-profile prompts performing best descriptively. Personalization via psychometric profiling did not yield measurable performance gains, while longer training offered modest accuracy benefits. The results support deploying LLM-based training at scale and suggest prioritizing content depth over static personalization, with implications for rapid experimentation in security education.

Abstract

Phishing remains a persistent cybersecurity threat; however, developing scalable and effective user training is labor-intensive and challenging to maintain. Generative Artificial Intelligence offers an interesting opportunity, but empirical evidence on its instructional efficacy remains scarce. This paper provides an experimental validation of Large Language Models (LLMs) as autonomous engines for generating phishing resilience training. Across two controlled studies (N=480), we demonstrate that AI-generated content yields significant pre-post learning gains regardless of the specific prompting strategy employed. Study 1 (N=80) compares four prompting techniques, finding that even a straightforward "direct-profile" strategy--simply embedding user traits into the prompt--produces effective training material. Study 2 (N=400) investigates the scalability of this approach by testing personalization and training duration. Results show that complex psychometric personalization offers no measurable advantage over well-designed generic content, while longer training duration provides a modest boost in accuracy. These findings suggest that organizations can leverage LLMs to generate high-quality, effective training at scale without the need for complex user profiling, relying instead on the inherent capabilities of the model.

Paper Structure

This paper contains 49 sections, 11 figures, and 10 tables.

Figures (11)

  • Figure 1: Example of generated training "introduction" module.
  • Figure 2: Example of generated training "exercises" module.
  • Figure 3: Complete structure of the baseline prompt including context, format, and style constraints.
  • Figure 4: Examples of email samples used in the study: (a) legitimate email and (b) phishing email.
  • Figure 5: Study procedure for the evaluation of LLM-generated training.
  • ...and 6 more figures