STRIATUM-CTF: A Protocol-Driven Agentic Framework for General-Purpose CTF Solving

James Hugglestone; Samuel Jacob Chacko; Dawson Stoller; Ryan Schmidt; Xiuwen Liu

Abstract

Large Language Models (LLMs) have demonstrated potential in code generation, yet they struggle with the multi-step, stateful reasoning required for offensive cybersecurity operations. Existing research often relies on static benchmarks that fail to capture the dynamic nature of real-world vulnerabilities. In this work, we introduce STRIATUM-CTF (A Search-based Test-time Reasoning Inference Agent for Tactical Utility Maximization in Cybersecurity), a modular agentic framework built upon the Model Context Protocol (MCP). By standardizing tool interfaces for system introspection, decompilation, and runtime debugging, STRIATUM-CTF enables the agent to maintain a coherent context window across extended exploit trajectories. We validate this approach not only on synthetic datasets but also in a live competitive environment. Our system participated in a university-hosted Capture-the-Flag (CTF) competition in late 2025, where it operated autonomously to identify and exploit vulnerabilities in real time. STRIATUM-CTF placed first, outperforming 21 human teams and demonstrating strong adaptability in a dynamic problem-solving setting. We analyze the agent's decision-making logs to show how MCP-based tool abstraction significantly reduces hallucination compared to naive prompting strategies. These results suggest that standardized context protocols are a critical path toward robust autonomous cyber-reasoning systems.
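
As a rough illustration of what a standardized tool interface can buy, the sketch below checks a model-emitted JSON tool call against a declared schema before anything is executed, and returns a structured error the agent could use to correct itself. It is a minimal sketch under stated assumptions, not the authors' implementation: the tool names (run_command, decompile), the argument fields, and the validate_tool_call helper are all hypothetical, and a real MCP server would use the protocol's own schema mechanism rather than this hand-rolled check.

```python
import json

# Hypothetical tool schemas. Names and fields are illustrative assumptions,
# not taken from the paper or from any MCP server.
TOOL_SCHEMAS = {
    "run_command": {"required": {"command": str}, "optional": {"timeout_s": int}},
    "decompile": {"required": {"binary_path": str}, "optional": {"function": str}},
}


def validate_tool_call(raw_payload):
    """Return (True, parsed_call) if the payload is valid, else (False, error_message)."""
    try:
        call = json.loads(raw_payload)
    except json.JSONDecodeError as exc:
        return False, f"invalid JSON: {exc.msg}"
    if not isinstance(call, dict):
        return False, "payload must be a JSON object"

    name = call.get("tool")
    if not isinstance(name, str) or name not in TOOL_SCHEMAS:
        # A tool the model invented is rejected here, before execution.
        return False, f"unknown tool: {name!r}"

    args = call.get("arguments", {})
    if not isinstance(args, dict):
        return False, "arguments must be a JSON object"

    schema = TOOL_SCHEMAS[name]
    for key in schema["required"]:
        if key not in args:
            return False, f"missing required argument: {key}"

    allowed = {**schema["required"], **schema["optional"]}
    for key, value in args.items():
        if key not in allowed:
            return False, f"unexpected argument: {key}"
        if not isinstance(value, allowed[key]):
            return False, f"argument {key} must be of type {allowed[key].__name__}"

    return True, {"tool": name, "arguments": args}
```

In this sketch, a rejected call never reaches a tool; the error string would be fed back into the agent's context so that the next attempt can be repaired.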

Paper Structure (22 sections, 1 equation, 7 figures, 2 tables)

Introduction
Related Work
Methodology
Formal Problem Formulation
High-Level Neuro-Symbolic Architecture
The MCP Toolbox
Algorithmic Lifecycle
Experimental Setting and Results
Dataset, LLM Models, and Implementation Details
Model Configuration
Ablation Conditions
Implementation Environment
Evaluation Protocol
Benchmark Evaluation Results and Analysis
Overall Performance
...and 7 more sections

Figures (7)

Figure 1: High-Level Neuro-Symbolic Architecture. The system decouples probabilistic reasoning from deterministic execution. The Reasoning Layer (Left) acts as the strategic planner, emitting JSON payloads that must pass through the Protocol Layer (Center). This symbolic interface enforces strict schema validation, effectively filtering out hallucinated commands, before invoking tools in the Execution Layer (Right). The resulting system feedback (stdout/stderr) is structurally parsed and re-injected into the context window, grounding the agent's latent state in verifiable reality.
Figure 2: STRIATUM-CTF Execution Workflow: A sequence trace showing the transition from User Input to Flag Capture. The diagram highlights the system's error-recovery capability: when the agent attempts an invalid tool configuration (Phase 2), the Protocol Layer enforces schema compliance, triggering an autonomous correction cycle that enables the final successful exploitation (Phase 3).
Figure 3: Success rates under different settings, with 95% Wilson score confidence intervals.
Figure 4: Distribution of the time taken to solve CTF problems under different settings.
Figure 5: Time taken by individual runs under different settings.
...and 2 more figures
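
The Wilson score interval named in the Figure 3 caption can be computed from a success count and a trial count. The following is a minimal sketch of the standard formula, assuming a two-sided 95% interval (z = 1.96); the function name wilson_interval and the example counts used with it are illustrative and are not results from the paper.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score confidence interval for a binomial proportion.

    z = 1.96 corresponds to a two-sided 95% interval.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")

    p_hat = successes / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    )
    # Clamp to [0, 1] to guard against floating-point drift at the boundaries.
    return max(0.0, center - half_width), min(1.0, center + half_width)
```

Unlike the normal-approximation interval, the Wilson interval stays inside [0, 1] and behaves sensibly for small samples and for success rates near 0 or 1, which is why it is a common choice when each setting is evaluated on a limited number of runs.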