Anti-Phishing Training (Still) Does Not Work: A Large-Scale Reproduction of Phishing Training Inefficacy Grounded in the NIST Phish Scale

Andrew T. Rozema, James C. Davis

TL;DR

This large-scale real-world study (N=12,511) investigates the effectiveness of anti-phishing training and finds no significant main effects on click or report rates from either lecture-based or interactive modalities. The NIST Phish Scale robustly predicted user behavior, with click rates rising from $7.0\%$ for Easy to $15.0\%$ for Hard lures ($F(2,12086)=41.415$, $p<0.001$, $\eta^2=0.007$), validating this standardized measure. A novel Organizational Inoculation Index reveals substantial organizational resilience (36–55% inoculation patterns) that does not depend on individual training efficacy, suggesting collective security behaviors and feedback loops contribute to threat mitigation. The findings argue for a layered defense strategy that combines technical controls with awareness training, and they establish a methodological framework for future work, including refining the Organizational Inoculation Index and accounting for AI-crafted phishing threats. The work provides policy-relevant evidence that training alone yields limited risk reduction and that real-world cybersecurity requires integrated, adaptive defenses.
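The reported effect size can be sanity-checked from the F statistic and its degrees of freedom. The sketch below uses the standard conversion $\eta^2 = (F \cdot df_1) / (F \cdot df_1 + df_2)$, which yields partial eta squared in a factorial design. The function name is hypothetical, and whether the authors computed their value this way is an assumption; only the three input numbers come from the summary above.

```python
def eta_squared_from_f(f_stat: float, df_effect: int, df_error: int) -> float:
    """Convert an F statistic and its degrees of freedom to (partial) eta squared."""
    explained = f_stat * df_effect
    return explained / (explained + df_error)


# Values reported above: F(2, 12086) = 41.415
eta_sq = eta_squared_from_f(41.415, 2, 12086)
print(round(eta_sq, 3))  # 0.007

# Relative increase in click rate from Easy (7.0%) to Hard (15.0%) lures
print(round(15.0 / 7.0, 2))  # 2.14
```

The result agrees with the reported $\eta^2=0.007$: lure difficulty is a highly significant predictor, yet it accounts for under 1% of the variance in individual click behavior, even though Hard lures draw roughly twice the click rate of Easy ones.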

Abstract

Social engineering attacks delivered via email, commonly known as phishing, represent a persistent cybersecurity threat leading to significant organizational incidents and data breaches. Although many organizations train employees on phishing, often mandated by compliance requirements, the real-world effectiveness of this training remains debated. To contribute to evidence-based cybersecurity policy, we conducted a large-scale reproduction study (N = 12,511) at a US-based financial technology firm. Our experimental design refined prior work by comparing training modalities in operational environments, validating NIST's standardized phishing difficulty measurement, and introducing novel organizational-level temporal resilience metrics. Echoing prior work, training interventions showed no significant main effects on click rates (p=0.450) or reporting rates (p=0.417), with negligible effect sizes. However, we found that the NIST Phish Scale predicted user behavior, with click rates increasing from 7.0% for easy lures to 15.0% for hard lures. Our organizational-level resilience result was mixed: 36-55% of campaigns achieved "inoculation" patterns where reports preceded clicks, but training did not significantly improve organizational-level temporal protection. In summary, our results confirm the ineffectiveness of current phishing training approaches while offering a refined study design for future work.
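To make the "inoculation" idea concrete, here is a minimal sketch of how a campaign-level share might be computed from timestamped events. The abstract only states that inoculation means reports preceded clicks, so everything else is an assumption for illustration: the event representation, the function names, the treatment of campaigns with reports but no clicks as inoculated, and the toy data. It is not the authors' definition of the Organizational Inoculation Index.

```python
from datetime import datetime
from typing import List, Optional, Tuple

# Hypothetical event record: (timestamp, kind), where kind is "click" or "report".
Event = Tuple[datetime, str]


def first_time(events: List[Event], kind: str) -> Optional[datetime]:
    """Return the earliest timestamp of the given event kind, or None if absent."""
    times = [t for t, k in events if k == kind]
    return min(times) if times else None


def is_inoculated(events: List[Event]) -> bool:
    """Assumed rule: the first report arrives strictly before the first click."""
    first_report = first_time(events, "report")
    first_click = first_time(events, "click")
    if first_report is None:
        return False  # nobody reported, so no early warning existed
    if first_click is None:
        return True   # assumption: reports with zero clicks count as inoculated
    return first_report < first_click


def inoculation_share(campaigns: List[List[Event]]) -> float:
    """Fraction of campaigns that show the assumed inoculation pattern."""
    if not campaigns:
        return 0.0
    return sum(is_inoculated(c) for c in campaigns) / len(campaigns)


# Toy data, invented for this example only.
t = lambda minute: datetime(2024, 1, 1, 9, minute)
campaigns = [
    [(t(5), "report"), (t(12), "click")],   # report first: inoculated
    [(t(3), "click"), (t(20), "report")],   # click first: not inoculated
    [(t(8), "report")],                     # report only: inoculated (assumed)
    [(t(2), "click")],                      # click only: not inoculated
]
print(inoculation_share(campaigns))  # 0.5
```

Under this reading, the metric describes the organization rather than any individual: a single early reporter can give the security team a chance to act before anyone clicks, which is how the share can be substantial even when training shows no effect on individual click or report rates.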

Paper Structure

This paper contains 29 sections, 3 figures, 6 tables.

Figures (3)

  • Figure 1: Illustration of our methodology. Our two-factor design randomly assigns different trainings to different subjects, and then randomly sends phishing lures of varying complexity to the subjects. We compared between-subject performance using a mix of standard and novel metrics to test our five hypotheses.
  • Figure 2: Comparison of effect sizes of phishing susceptibility. Bars show the variance explained by phishing difficulty (per the NIST Phish Scale) for each training intervention.
  • Figure 3: Phishing defense architecture, mapping the email pathway from authentication through post-delivery response. Technical controls create multiple interception points prior to human interaction, and human reports provide a feedback opportunity for organizational-level threat mitigation.