Anti-Phishing Training (Still) Does Not Work: A Large-Scale Reproduction of Phishing Training Inefficacy Grounded in the NIST Phish Scale

Andrew T. Rozema; James C. Davis

TL;DR

This large-scale real-world study (N = 12,511) investigates the effectiveness of anti-phishing training and finds no significant main effects of either lecture-based or interactive training on click or report rates. The NIST Phish Scale significantly predicted user behavior, with click rates rising from 7.0% for Easy lures to 15.0% for Hard lures (F(2, 12086) = 41.415, p < 0.001, η² = 0.007), validating this standardized measure. A novel Organizational Inoculation Index reveals substantial organizational resilience: in 36–55% of campaigns, reports preceded clicks. This resilience does not depend on individual training efficacy, which suggests that collective security behaviors and feedback loops contribute to threat mitigation. The findings argue for a layered defense strategy that combines technical controls with awareness training, and they establish a methodological framework for future work, including refining the Organizational Inoculation Index and accounting for AI-crafted phishing threats. The work provides policy-relevant evidence that training alone yields limited risk reduction and that real-world cybersecurity requires integrated, adaptive defenses.
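As a consistency check on the reported statistics, η² can be recovered from the F statistic and its degrees of freedom. This assumes the test is a one-way ANOVA across the three difficulty levels, which the degrees of freedom (2 between groups) suggest:

$$
\eta^2 = \frac{SS_{\text{between}}}{SS_{\text{between}} + SS_{\text{within}}}
       = \frac{F \cdot df_{\text{between}}}{F \cdot df_{\text{between}} + df_{\text{within}}}
       = \frac{2 \times 41.415}{2 \times 41.415 + 12086}
       = \frac{82.83}{12168.83} \approx 0.0068
$$

This rounds to the reported 0.007. An η² this small means lure difficulty explains well under 1% of the variance in individual click outcomes, even though the effect is highly significant at this sample size.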

Abstract

Social engineering attacks delivered via email, commonly known as phishing, are a persistent cybersecurity threat that leads to significant organizational incidents and data breaches. Although many organizations train employees about phishing, often because compliance requirements mandate it, the real-world effectiveness of this training remains debated. To contribute to evidence-based cybersecurity policy, we conducted a large-scale reproduction study (N = 12,511) at a US-based financial technology firm. Our experimental design refined prior work by comparing training modalities in an operational environment, validating NIST's standardized phishing difficulty measure (the NIST Phish Scale), and introducing novel organizational-level temporal resilience metrics. Consistent with prior work, training interventions showed no significant main effects on click rates (p = 0.450) or reporting rates (p = 0.417), with negligible effect sizes. However, the NIST Phish Scale did predict user behavior: click rates increased from 7.0% for easy lures to 15.0% for hard lures. Our organizational-level resilience results were mixed: 36–55% of campaigns achieved "inoculation" patterns in which reports preceded clicks, but training did not significantly improve organizational-level temporal protection. In summary, our results confirm the ineffectiveness of current phishing training approaches while offering a refined study design for future work.
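The abstract describes an "inoculation" pattern as a campaign in which reports preceded clicks. The sketch below shows one way such a per-campaign check, and the resulting share of inoculated campaigns, could be computed from an event log. The `Event` schema, the functions `is_inoculated` and `inoculation_rate`, and the edge-case rules (a campaign with reports but no clicks counts as inoculated; a campaign with no reports, or a tie between first report and first click, does not) are hypothetical assumptions for illustration, not the paper's published definition of the Organizational Inoculation Index.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Event:
    """One user interaction with a simulated phishing email (hypothetical schema)."""
    campaign_id: str
    kind: str         # "click" or "report"
    timestamp: float  # seconds since the campaign was launched


def is_inoculated(events: List[Event]) -> bool:
    """Assumed rule: a campaign is 'inoculated' if its first report
    arrives strictly before its first click, or if it has reports but no clicks."""
    reports = [e.timestamp for e in events if e.kind == "report"]
    clicks = [e.timestamp for e in events if e.kind == "click"]
    if not reports:
        return False  # nobody reported, so no early warning existed
    if not clicks:
        return True   # at least one report and no clicks at all
    return min(reports) < min(clicks)  # ties count as not inoculated


def inoculation_rate(events: Iterable[Event]) -> float:
    """Fraction of campaigns seen in the log that show the assumed inoculation pattern."""
    by_campaign: Dict[str, List[Event]] = {}
    for e in events:
        by_campaign.setdefault(e.campaign_id, []).append(e)
    if not by_campaign:
        return 0.0
    inoculated = sum(is_inoculated(evs) for evs in by_campaign.values())
    return inoculated / len(by_campaign)
```

For example, with campaign A (report at 30 s, click at 120 s), campaign B (click at 10 s, report at 45 s), and campaign C (a report and no clicks), `inoculation_rate` returns 2/3. Campaigns with no recorded events never appear in the log, so a real analysis would need the full list of campaigns as the denominator.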

