Cybersecurity AI in OT: Insights from an AI Top-10 Ranker in the Dragos OT CTF 2025
Víctor Mayoral-Vilches, Luis Javier Navarrete-Lozano, Francesco Balassone, María Sanz-Gómez, Cristóbal Ricardo Veas Chávez, Maite del Mundo de Torres
TL;DR
The paper investigates how a semi-autonomous AI cybersecurity agent (CAI) performs in an OT/ICS-focused Capture the Flag competition (Dragos OT CTF 2025), addressing a gap in OT-specific AI evaluation. Using the alias1 model within CAI, the study analyzes data from the 48-hour competition across time-to-solve, category coverage, and comparisons to elite human teams. Key findings: CAI reached Rank 1 between competition hours 7.0 and 8.0, made the fastest progression to 10,000 points (5.42 hours, or 1,846 pts/h), and solved 32 of 34 challenges before automated operations were paused at hour 24, finishing sixth with 18,900 points; the top three human teams solved 33 of 34. The results indicate that AI can match or exceed human performance in early-phase OT incident response, while sustained multi-day operation remains challenging, underscoring the value of hybrid AI-human SOC architectures and robust governance for OT security.
Abstract
Operational Technology (OT) cybersecurity increasingly relies on rapid response across malware analysis, network forensics, and reverse engineering disciplines. We examine the performance of Cybersecurity AI (CAI), powered by the alias1 model, during the Dragos OT CTF 2025, a 48-hour industrial control system (ICS) competition with more than 1,000 teams. Using CAI telemetry and official leaderboard data, we quantify CAI's trajectory relative to the leading human-operated teams. CAI reached Rank 1 between competition hours 7.0 and 8.0, crossed 10,000 points at 5.42 hours (1,846 pts/h), and completed 32 of the competition's 34 challenges before automated operations were paused at hour 24 with a final score of 18,900 points (6th place). The top-3 human teams solved 33 of 34 challenges, collectively leaving only the 600-point "Kiddy Tags – 1" unsolved; they were also the only teams to clear the 1,000-point "Moot Force" binary. The top-5 human teams averaged 1,347 pts/h to the same milestone, marking a 37% velocity advantage for CAI. We analyse time-resolved scoring, category coverage, and solve cadence. The evidence indicates that a mission-configured AI agent can match or exceed expert human crews in early-phase OT incident response while remaining subject to practical limits in sustained, multi-day operations.
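The velocity figures quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' analysis code: the function names `velocity` and `relative_advantage` are hypothetical, and the only inputs are the numbers stated in the abstract (10,000 points, 5.42 hours, 1,846 pts/h, 1,347 pts/h).

```python
def velocity(points: float, hours: float) -> float:
    """Average scoring rate in points per hour (hypothetical helper)."""
    return points / hours


def relative_advantage(rate_a: float, rate_b: float) -> float:
    """Percentage by which rate_a exceeds rate_b (hypothetical helper)."""
    return (rate_a - rate_b) / rate_b * 100.0


# Figures taken from the abstract.
CAI_RATE = 1846.0       # pts/h reported for CAI to the 10,000-point milestone
HUMAN_TOP5_RATE = 1347.0  # pts/h averaged by the top-5 human teams

# Dividing exactly 10,000 points by 5.42 hours gives a rate close to the
# reported one; the small gap is presumably rounding or a score slightly
# above 10,000 at the moment of crossing (an assumption, not stated in the text).
approx_cai_rate = velocity(10_000, 5.42)

advantage_pct = relative_advantage(CAI_RATE, HUMAN_TOP5_RATE)

print(f"CAI rate from milestone: {approx_cai_rate:.1f} pts/h")
print(f"CAI advantage over top-5 humans: {advantage_pct:.0f}%")
```

Under these inputs the relative advantage rounds to the 37% stated in the abstract, which suggests the percentage is computed against the human average as the baseline rather than against CAI's own rate.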
