
CAI: An Open, Bug Bounty-Ready Cybersecurity AI

Víctor Mayoral-Vilches, Luis Javier Navarrete-Lozano, María Sanz-Gómez, Lidia Salas Espejo, Martiño Crespo-Álvarez, Francisco Oca-Gonzalez, Francesco Balassone, Alfonso Glera-Picón, Unai Ayucar-Carbajo, Jon Ander Ruiz-Alcalde, Stefan Rass, Martin Pinzger, Endika Gil-Uriarte

TL;DR

CAI formalizes autonomy levels in cybersecurity and delivers an open-source framework for rapid, bug-bounty-ready security testing via specialized AI agents. Through extensive experiments on CTF benchmarks, Hack The Box (HTB), live competitions, and bug bounty programs, CAI demonstrates significant time and cost advantages over humans while exposing gaps in current LLM capabilities for complex exploitation. The work also critiques LLM vendors' claims about their models' security capabilities and explores CAI's applicability to robotic cybersecurity, advocating transparent benchmarking and broader accessibility. Overall, CAI offers a practical, scalable path to democratize offensive security testing and augment human researchers in organizations of all sizes.

Abstract

By 2028, most cybersecurity actions will be autonomous, with humans teleoperating. We present the first classification of autonomy levels in cybersecurity and introduce Cybersecurity AI (CAI), an open-source framework that democratizes advanced security testing through specialized AI agents. Through rigorous empirical evaluation, we demonstrate that CAI consistently outperforms state-of-the-art results in CTF benchmarks, solving challenges across diverse categories with significantly greater efficiency: up to 3,600x faster than humans in specific tasks and 11x faster on average. CAI achieved first place among AI teams and secured a top-20 position worldwide in the live "AI vs Human" CTF Challenge, earning a monetary reward of $750. Based on our results, we argue against LLM vendors' claims that their models have limited security capabilities. Beyond cybersecurity competitions, CAI demonstrates real-world effectiveness, reaching the top 30 in Spain and the top 500 worldwide on Hack The Box within a week, while reducing security testing costs by an average of 156x. Our framework goes beyond theoretical benchmarks by enabling non-professionals to discover significant security bugs (CVSS 4.3–7.5) at rates comparable to experts during bug bounty exercises. By combining modular agent design with seamless tool integration and human-in-the-loop (HITL) oversight, CAI addresses critical market gaps, offering organizations of all sizes access to AI-powered bug bounty security testing previously available only to well-resourced firms, thereby challenging the oligopolistic ecosystem currently dominated by major bug bounty platforms.

Paper Structure

This paper contains 23 sections, 20 figures, and 6 tables.

Figures (20)

  • Figure 1: CAI performance comparison across different LLMs.
  • Figure 2: CAI conducting a security assessment of a MIR-100 Mobile Industrial Robot through (1) network reconnaissance to locate the robot, (2) testing for default credentials in the web interface, (3) identifying exposed services and software vulnerabilities, and (4) performing digital forensics on the robot's ROS system to discover safety tampering. This demonstrates CAI's ability to identify security vulnerabilities and detect safety-critical incidents in industrial robotics systems.
  • Figure 3: The CAI Architecture showing how core components interact in a cybersecurity workflow. Core components (darker boxes) form the essential framework pillars, while support components (lighter boxes) provide infrastructure. The numbered flow indicators illustrate the typical sequence of operations: 1) Human operators interact with the system through HITL, initiating Patterns for agent coordination; 2-3) Patterns coordinate Agent interactions through Handoffs enabling specialized agent collaboration; 4) Agents leverage LLMs for reasoning about security challenges; 5) Agents execute security actions using Tools for practical tasks; 6-7) Agent and Handoff activities are logged by the Tracing system; 8) Tracing data is available to Extensions for enhanced functionality; 9) Tool execution results are returned to Agents for further reasoning and action.
  • Figure 4: CAI solving the Hackable II machine from VulnHub end to end through (1) initial reconnaissance, (2) gaining remote code execution via a web shell, (3) discovering and cracking password hashes, and (4) escalating privileges to root. This demonstrates how CAI's methodical approach can solve complex security challenges by leveraging multiple attack vectors.
  • Figure 5: Specialized Cybersecurity Agent Patterns in CAI: Red Team Agent (left) focused on offensive security, Bug Bounty Hunter (middle) specialized in web application vulnerability discovery, and Blue Team Agent (right) dedicated to defensive security. Each agent uses similar core tool architecture but with objectives and methodologies tailored to their specific security roles.
  • ...and 15 more figures
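The numbered flow in the Figure 3 caption, combined with the specialized agents of Figure 5, can be illustrated with a toy Python sketch. Every name here (`Agent`, `Tracer`, `run_pattern`, `fake_scan`, `fake_probe`) is hypothetical and is not CAI's actual API. The LLM reasoning step is replaced by deterministic policy functions, and HITL oversight is modeled, as an assumption, as an approval callback invoked before each tool call. The fake tools return canned strings and perform no real scanning.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Toy, hypothetical model of the Figure 3 flow; not CAI's real API.
Decision = Optional[Tuple[str, str]]  # (tool name, argument), or None to hand off / stop


@dataclass
class Tracer:
    """Records agent and handoff activity (steps 6-7) so extensions can consume it (step 8)."""
    events: List[str] = field(default_factory=list)

    def log(self, event: str) -> None:
        self.events.append(event)


@dataclass
class Agent:
    name: str
    tools: Dict[str, Callable[[str], str]]
    policy: Callable[[str], Decision]  # deterministic stand-in for LLM reasoning (step 4)
    handoff_to: Optional[str] = None   # next specialist agent, if any (steps 2-3)


def run_pattern(agents: Dict[str, Agent], start: str, task: str,
                approve: Callable[[str, str], bool], tracer: Tracer,
                max_steps: int = 10) -> str:
    """Run agents in sequence; each tool call needs human approval (assumed HITL model)."""
    current, observation = agents[start], task
    for _ in range(max_steps):
        decision = current.policy(observation)
        if decision is None:
            if current.handoff_to is None:
                return observation  # no further handoff: the pattern is finished
            tracer.log(f"handoff {current.name} -> {current.handoff_to}")
            current = agents[current.handoff_to]
            continue
        tool_name, arg = decision
        if not approve(tool_name, arg):
            tracer.log(f"{current.name} denied {tool_name}")
            return observation
        # Step 5: execute the tool; step 9: its result becomes the agent's next observation.
        observation = current.tools[tool_name](arg)
        tracer.log(f"{current.name} ran {tool_name}({arg})")
    return observation


# Fake tools: they only return canned strings and touch no network.
def fake_scan(host: str) -> str:
    return f"{host}: port 80 open"


def fake_probe(finding: str) -> str:
    return f"report: {finding} (default credentials accepted)"


agents = {
    "red_team": Agent("red_team", {"scan": fake_scan},
                      lambda obs: None if "open" in obs else ("scan", obs),
                      handoff_to="bug_bounty"),
    "bug_bounty": Agent("bug_bounty", {"probe": fake_probe},
                        lambda obs: None if obs.startswith("report") else ("probe", obs)),
}

tracer = Tracer()
result = run_pattern(agents, "red_team", "10.0.0.5", lambda tool, arg: True, tracer)
print(result)
for event in tracer.events:  # a trivial "extension" that reads the trace data
    print(event)
```

Running the sketch prints the bug bounty agent's final report string followed by three trace events: the red team scan, the handoff, and the probe. Passing an `approve` callback that returns `False` stops the run before the first tool executes, which is one way the human oversight point in the architecture could be realized.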