Cybersecurity AI in OT: Insights from an AI Top-10 Ranker in the Dragos OT CTF 2025

Víctor Mayoral-Vilches, Luis Javier Navarrete-Lozano, Francesco Balassone, María Sanz-Gómez, Cristóbal Ricardo Veas Chávez, Maite del Mundo de Torres

TL;DR

The paper investigates how a semi-autonomous AI cybersecurity agent (CAI) performs in an OT/ICS-focused Capture the Flag competition (Dragos OT CTF 2025), addressing a critical gap in OT-specific AI evaluation. Using the alias1 model within CAI, the study analyzes 48 hours of competition data covering time-to-solve, category coverage, and comparisons with elite human teams. CAI reached Rank 1 between hours 7 and 8, showed the fastest progression to 10,000 points (5.42 hours, 1,846 pts/h), and solved 32 of 34 challenges before its automated operations were paused at hour 24, finishing sixth with 18,900 points; the top human teams solved 33 of 34. The results indicate that AI can match or exceed human performance in early-phase OT incident response, while sustaining multi-day operations remains challenging, which underscores the value of hybrid AI-human SOC architectures and robust governance for OT security.

Abstract

Operational Technology (OT) cybersecurity increasingly relies on rapid response across malware analysis, network forensics, and reverse engineering disciplines. We examine the performance of Cybersecurity AI (CAI), powered by the alias1 model, during the Dragos OT CTF 2025, a 48-hour industrial control system (ICS) competition with more than 1,000 teams. Using CAI telemetry and official leaderboard data, we quantify CAI's trajectory relative to the leading human-operated teams. CAI reached Rank 1 between competition hours 7.0 and 8.0, crossed 10,000 points at 5.42 hours (1,846 pts/h), and completed 32 of the competition's 34 challenges before automated operations were paused at hour 24, with a final score of 18,900 points (6th place). The top-3 human teams solved 33 of 34 challenges, collectively leaving only the 600-point "Kiddy Tags – 1" unsolved; they were also the only teams to clear the 1,000-point "Moot Force" binary. The top-5 human teams averaged 1,347 pts/h to the same milestone, a 37% velocity advantage for CAI. We analyse time-resolved scoring, category coverage, and solve cadence. The evidence indicates that a mission-configured AI agent can match or exceed expert human crews in early-phase OT incident response while remaining subject to practical limits in sustained, multi-day operations.
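
The velocity figures in the abstract are ratios of points to elapsed hours. The short Python sketch below recomputes them from the reported numbers only; the helper names `velocity` and `relative_advantage` are illustrative and are not part of CAI or the paper's tooling.

```python
def velocity(points: float, hours: float) -> float:
    """Average scoring rate in points per hour over the elapsed time."""
    if hours <= 0:
        raise ValueError("elapsed time must be positive")
    return points / hours


def relative_advantage(rate_a: float, rate_b: float) -> float:
    """Fractional speed advantage of rate_a over rate_b (0.37 means 37% faster)."""
    return rate_a / rate_b - 1.0


CAI_RATE = 1_846         # pts/h to 10,000 points, as reported for CAI
TOP5_HUMAN_RATE = 1_347  # pts/h to the same milestone, top-5 human average

print(f"CAI advantage: {relative_advantage(CAI_RATE, TOP5_HUMAN_RATE):.0%}")
print(f"10,000 pts / 5.42 h = {velocity(10_000, 5.42):,.0f} pts/h")
```

The first line prints a 37% advantage (1,846 / 1,347 ≈ 1.37). Dividing 10,000 points by the rounded 5.42 hours gives about 1,845 pts/h; the reported 1,846 pts/h presumably reflects an unrounded crossing time (roughly 5.417 hours yields 1,846).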

Paper Structure

This paper contains 26 sections, 12 figures, 4 tables.

Figures (12)

  • Figure 1: Top-10 trajectories across the 48-hour Dragos OT CTF 2025. CAI (teal) leads during the early phase of the competition (teal shaded band), reaching Rank 1 at hours 7-8, remaining in the top 3 until hour 21 (light teal shaded band), and finishing in the top 10.
  • Figure 2: Snapshot from CAI's semi-autonomous Ecoforest heat pump assessment: the agent recovers exposed .htpasswd credentials, cracks DES hashes, and prepares remote manipulation of heating parameters without human intervention. Full case study available at https://aliasrobotics.com/case-study-ecoforest.php.
  • Figure 3: Benchmarking results of AI-vs-AI agents in attack/defense CTF scenarios that simulate real-world security operations. CAI using alias1 with the red_teamer and blue_teamer agents achieved state-of-the-art performance with a 2.6× speedup over the second-best agent. Refer to [sanzgomez2025cybersecurityaibenchmarkcaibench] for more details.
  • Figure 4: Early competition progression: (left) First hour shows CAI reaching 2,100 points (nine solves) within 0.85 hours, while the fastest human teams (Adamastor and OTóż.to) reached 2,900 points; (right) Two-hour mark shows CAI at 2,900 points, trailing Gr1dGuardi4ns (7,300 points) and Adamastor/TugaPwners (4,900 points) before the agent's later acceleration.
  • Figure 5: Mid-phase acceleration: (left) Three-hour checkpoint shows CAI at 3,300 points while human leaders range between 6,300 and 7,900 points; (right) Five-hour progression shows CAI climbing to 7,900 points while Adamastor and TugaPwners hold 8,300-9,300 points before the agent overtakes them.
  • ...and 7 more figures
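
The checkpoints in the Figure 4 and Figure 5 captions, together with the 10,000-point crossing at 5.42 hours reported in the abstract, are enough to estimate CAI's scoring rate between checkpoints. This Python sketch assumes a score of zero at hour 0 and linear progress between checkpoints, which hides individual solve times; `CAI_CHECKPOINTS` and `interval_rates` are hypothetical names introduced for illustration.

```python
# (hours, cumulative points) for CAI, read from the Figure 4/5 captions.
# The (0.0, 0) start is an assumption; the 5.42 h entry is the reported 10,000-point crossing.
CAI_CHECKPOINTS = [
    (0.0, 0),
    (0.85, 2_100),
    (2.0, 2_900),
    (3.0, 3_300),
    (5.0, 7_900),
    (5.42, 10_000),
]


def interval_rates(checkpoints):
    """Return (start_h, end_h, pts_per_hour) for each consecutive pair of checkpoints."""
    rates = []
    for (t0, p0), (t1, p1) in zip(checkpoints, checkpoints[1:]):
        if t1 <= t0:
            raise ValueError("checkpoints must be strictly increasing in time")
        rates.append((t0, t1, (p1 - p0) / (t1 - t0)))
    return rates


for start, end, rate in interval_rates(CAI_CHECKPOINTS):
    print(f"{start:4.2f}-{end:4.2f} h: {rate:7,.0f} pts/h")
```

With these inputs the rate is about 2,471 pts/h in the first 0.85 hours, drops to 400 pts/h between hours 2 and 3, then rises to 2,300 pts/h (hours 3 to 5) and about 5,000 pts/h (hours 5 to 5.42), consistent with the later acceleration described in the captions.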