Randomized Controlled Trials for Phishing Triage Agent

James Bono

TL;DR

Security operations centers (SOCs) face high-volume phishing triage work that demands fast, accurate decisions. This study runs a field randomized controlled trial to evaluate a domain-specific AI agent, the Microsoft Security Copilot Phishing Triage Agent, on productivity, protection, and analyst behavior. Agent-augmented analysts achieve up to 6.5 times as many true positives per analyst minute and a 77% uplift in F1 under corpus ground truth, with the gains driven largely by queue prioritization and the resolve-benign protocol. Analysts also reallocate attention to malicious emails rather than rubber-stamping agent verdicts. The results support adopting AI-assisted triage in SOCs to reallocate scarce human resources effectively, and they highlight design choices around queue management and exposure of agent rationale. The reported gains are expected to be conservative, with further improvement anticipated as the technology matures.

Abstract

Security operations centers (SOCs) face a persistent challenge: efficiently triaging a high volume of user-reported phishing emails while maintaining robust protection against threats. This paper presents the first randomized controlled trial (RCT) evaluating the impact of a domain-specific AI agent - the Microsoft Security Copilot Phishing Triage Agent - on analyst productivity and accuracy. Our results demonstrate that agent-augmented analysts achieved up to 6.5 times as many true positives per analyst minute and a 77% improvement in verdict accuracy compared to a control group. The agent's queue prioritization and verdict explanations were both significant drivers of efficiency. Behavioral analysis revealed that agent-augmented analysts reallocated their attention, spending 53% more time on malicious emails, and were not prone to rubber-stamping the agent's malicious verdicts. These findings offer actionable insights for SOC leaders considering AI adoption, including the potential for agents to fundamentally change the optimal allocation of SOC resources.
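
The headline metrics can be made concrete with a small sketch. The Python below is illustrative only: the `Verdict` record, the function names, and the sample numbers are assumptions made for this example, not the study's data or analysis code. It computes true positives per analyst minute and F1 from verdicts scored against ground-truth labels, plus the relative uplift of a treated group over a control group.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    """One triaged email (hypothetical record layout)."""
    predicted_malicious: bool  # the analyst's verdict
    actually_malicious: bool   # the ground-truth label


def true_positives(verdicts):
    """Count emails that were flagged malicious and really were malicious."""
    return sum(v.predicted_malicious and v.actually_malicious for v in verdicts)


def tp_per_minute(verdicts, analyst_minutes):
    """True positives found per minute of analyst time."""
    if analyst_minutes <= 0:
        raise ValueError("analyst_minutes must be positive")
    return true_positives(verdicts) / analyst_minutes


def f1_score(verdicts):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    tp = true_positives(verdicts)
    fp = sum(v.predicted_malicious and not v.actually_malicious for v in verdicts)
    fn = sum(not v.predicted_malicious and v.actually_malicious for v in verdicts)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def relative_uplift(treated, control):
    """Fractional improvement of treated over control (0.77 means a 77% uplift)."""
    if control == 0:
        raise ValueError("control value must be non-zero")
    return treated / control - 1.0


# Made-up sample: 3 true positives, 1 false positive, 1 false negative, 5 true negatives.
sample = (
    [Verdict(True, True)] * 3
    + [Verdict(True, False)]
    + [Verdict(False, True)]
    + [Verdict(False, False)] * 5
)
```

Under these definitions, a "6.5 times" result corresponds to a ratio of `tp_per_minute` values between groups, while a "77% uplift" corresponds to `relative_uplift` returning 0.77. How the study itself aggregates across analysts and sessions is not specified in this section.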

Paper Structure

This paper contains 15 sections, 7 equations, 5 figures, and 6 tables.

Figures (5)

  • Figure 1: Factor Increase in TPs per unit of Analyst Time
  • Figure 2: The task page for the Blind and Control groups.
  • Figure 3: The task page for the Aware group, including the agent verdict.
  • Figure 4: The base directory of the OneDrive sample repository.
  • Figure 5: The contents of the sample folder, including artifacts and, only for the Aware group, agent output.