A Framework for Evaluating Emerging Cyberattack Capabilities of AI

Mikel Rodriguez, Raluca Ada Popa, Four Flynn, Lihao Liang, Allan Dafoe, Anna Wang

TL;DR

This work presents a structured framework for evaluating frontier AI's cyberattack capabilities across the full attack chain, grounding the analysis in real-world AI misuse data. It combines a bottleneck-driven uplift assessment with targeted evaluations, proceeding stage by stage from a curated basket of representative attacks to cost-differential scoring that guides defenses. Key contributions include a representative attack basket, a bottleneck methodology, targeted cybersecurity model evaluations, and a defense-centric benchmark that informs red teaming and mitigation prioritization. The framework aims to keep defensive insights current as AI capabilities evolve toward AGI, emphasizing the practical translation of capability scores into concrete defensive actions.

Abstract

As frontier AI models become more capable, evaluating their potential to enable cyberattacks is crucial for ensuring the safe development of Artificial General Intelligence (AGI). Current cyber evaluation efforts are often ad-hoc, lacking systematic analysis of attack phases and guidance on targeted defenses. This work introduces a novel evaluation framework that addresses these limitations by: (1) examining the end-to-end attack chain, (2) identifying gaps in AI threat evaluation, and (3) helping defenders prioritize targeted mitigations and conduct AI-enabled adversary emulation for red teaming. Our approach adapts existing cyberattack chain frameworks for AI systems. We analyzed over 12,000 real-world instances of AI involvement in cyber incidents, catalogued by Google's Threat Intelligence Group, to curate seven representative attack chain archetypes. Through a bottleneck analysis on these archetypes, we pinpointed phases most susceptible to AI-driven disruption. We then identified and utilized externally developed cybersecurity model evaluations focused on these critical phases. We report on AI's potential to amplify offensive capabilities across specific attack stages, and offer recommendations for prioritizing defenses. We believe this represents the most comprehensive AI cyber risk evaluation framework published to date.

Paper Structure

This paper contains 32 sections and 15 figures.

Figures (15)

  • Figure 1: The Cyberattack Chain framework outlines typical cyberattack stages, offering a structured approach to analyze threats, prioritize actions, and develop defenses.
  • Figure 2: Mapping potential AI-enabled cost reductions to specific attack phases provides decision-relevant insights for defenders.
  • Figure 3: Frontier AI safety evaluations reveal cyber capabilities, but translating these findings into practical defense strategies remains challenging.
  • Figure 4: Overview of our proposed evaluation framework approach.
  • Figure 5: Observed instances of AI use across various attack chain phases.
  • ...and 10 more figures