CRAKEN: Cybersecurity LLM Agent with Knowledge-Based Execution
Minghao Shao, Haoran Xi, Nanda Rani, Meet Udeshi, Venkata Sai Charan Putrevu, Kimberly Milner, Brendan Dolan-Gavitt, Sandeep Kumar Shukla, Prashanth Krishnamurthy, Farshad Khorrami, Ramesh Karri, Muhammad Shafique
TL;DR
CRAKEN addresses two challenges for LLM cybersecurity agents: staying up to date and handling multi-step, knowledge-intensive tasks. It introduces a knowledge-based execution framework that integrates a domain knowledge base with Self-RAG and Graph-RAG retrieval. A planner-executor multi-agent architecture uses the retrieval system to ground its reasoning in external knowledge. Compared with prior work, CRAKEN improves performance on NYU CTF Bench (around 22% of challenges solved with Graph-RAG) and expands MITRE ATT&CK technique coverage (approximately 25-30% more techniques). The results indicate that knowledge-based execution enhances vulnerability modeling and exploit planning across diverse domains, at a modest increase in compute cost. An open-source dataset of CTF writeups and an extensible architecture are provided to enable embedding new security knowledge into LLM-driven agents.
Abstract
Large Language Model (LLM) agents can automate cybersecurity tasks and can adapt to the evolving cybersecurity landscape without re-engineering. While LLM agents have demonstrated cybersecurity capabilities in Capture-The-Flag (CTF) competitions, they have two key limitations: accessing the latest cybersecurity expertise beyond their training data, and integrating new knowledge into complex task planning. Knowledge-based approaches that incorporate technical understanding into task-solving automation can address these limitations. We present CRAKEN, a knowledge-based LLM agent framework that improves cybersecurity capability through three core mechanisms: contextual decomposition of task-critical information, iterative self-reflected knowledge retrieval, and knowledge-hint injection that transforms insights into adaptive attack strategies. Comprehensive evaluations with different configurations show CRAKEN's effectiveness in multi-stage vulnerability detection and exploitation compared to previous approaches. Our extensible architecture establishes new methodologies for embedding new security knowledge into LLM-driven cybersecurity agentic systems. With a knowledge database of CTF writeups, CRAKEN obtains an accuracy of 22% on NYU CTF Bench, outperforming prior work by 3% and achieving state-of-the-art results. In an evaluation on MITRE ATT&CK techniques, CRAKEN solves 25-30% more techniques than prior work, demonstrating improved cybersecurity capabilities via knowledge-based execution. Our framework is open source and publicly available at https://github.com/NYU-LLM-CTF/nyuctf_agents_craken.
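The three mechanisms named in the abstract (decomposition, self-reflected retrieval, hint injection) can be illustrated with a minimal Python sketch. This is not CRAKEN's implementation: the toy knowledge base, the keyword-overlap scoring, the `ALIASES` table, and every function name below are hypothetical stand-ins, assumed here in place of the embedding-based retrieval and LLM-based grading a real system would likely use.

```python
import re
from dataclasses import dataclass


@dataclass
class Doc:
    title: str
    text: str


# Hypothetical toy knowledge base standing in for a database of CTF writeups.
KNOWLEDGE_BASE = [
    Doc("format string", "Use %n in printf to overwrite a GOT entry."),
    Doc("sql injection", "Bypass the login form with a UNION SELECT payload."),
    Doc("rsa small exponent", "Take the integer cube root of the ciphertext."),
]

STOPWORDS = {"the", "a", "to", "in", "and", "on", "with", "is", "of"}

# Hypothetical query-expansion table used by the toy rewrite step.
ALIASES = {"printf": "format string", "login": "sql injection"}


def tokenize(text):
    """Lowercase word set without stopwords."""
    return set(re.findall(r"[a-z0-9%]+", text.lower())) - STOPWORDS


def overlap(query, doc):
    """Number of tokens shared by the query and the document."""
    return len(tokenize(query) & tokenize(doc.title + " " + doc.text))


def decompose(task):
    """Assumed decomposition: one sub-query per sentence of the task."""
    return [s.strip() for s in task.split(".") if s.strip()]


def retrieve(query):
    """Return the document with the highest token overlap."""
    return max(KNOWLEDGE_BASE, key=lambda d: overlap(query, d))


def rewrite(query):
    """Expand the query with alias terms for any known keyword."""
    extra = [ALIASES[t] for t in sorted(tokenize(query)) if t in ALIASES]
    return " ".join([query] + extra)


def self_reflective_retrieve(query, max_rounds=2, min_overlap=2):
    """Retrieve, grade the result, and rewrite the query if the grade fails."""
    for _ in range(max_rounds):
        doc = retrieve(query)
        if overlap(query, doc) >= min_overlap:  # toy relevance grade
            return doc
        query = rewrite(query)
    return None  # nothing passed the grade; better no hint than a bad one


def gather_hints(task):
    """Run the retrieval loop for every sub-query and keep accepted documents."""
    docs = (self_reflective_retrieve(q) for q in decompose(task))
    return [d for d in docs if d is not None]


def inject_hints(task, hints):
    """Append accepted documents to the prompt as knowledge hints."""
    lines = [f"Task: {task}"] + [f"Hint ({d.title}): {d.text}" for d in hints]
    return "\n".join(lines)
```

For example, `gather_hints("The binary calls printf on user input.")` fails the relevance grade on the first round, expands the query with "format string", and accepts the matching document on the second round. A task with no matching knowledge yields no hints, so the prompt is passed through unchanged.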
