Crimson: Empowering Strategic Reasoning in Cybersecurity through Large Language Models
Jiandong Jin, Bowen Tang, Mingxuan Ma, Xiao Liu, Yunfei Wang, Qingnan Lai, Jia Yang, Changling Zhou
TL;DR
Crimson tackles the challenge of turning unstructured vulnerability data into structured, strategic cybersecurity insights by mapping CVEs to MITRE ATT&CK techniques. The framework combines three components to boost LLM-driven strategic reasoning: a CVE↔ATT&CK mapping dataset (CVEM), Retrieval-Aware Training (RAT) with its refined variant RAT-R, and domain-specific embeddings. A 7B-parameter model fine-tuned with Crimson approaches GPT-4 performance while exhibiting fewer hallucinations and errors, and domain-tuned embeddings significantly improve technique discrimination. The work demonstrates that retrieval-aware training and targeted embedding fine-tuning can yield high-quality, interpretable CVE→ATT&CK mappings, enabling proactive defense with smaller, more efficient models.
Abstract
We introduce Crimson, a system that enhances the strategic reasoning capabilities of Large Language Models (LLMs) in cybersecurity. By correlating CVEs with MITRE ATT&CK techniques, Crimson advances threat anticipation and strategic defense efforts. Our approach includes defining and evaluating cybersecurity strategic tasks, alongside implementing a comprehensive human-in-the-loop data-synthesis workflow to develop the CVE-to-ATT&CK Mapping (CVEM) dataset. We further enhance LLMs' reasoning abilities through a novel Retrieval-Aware Training (RAT) process and its refined iteration, RAT-R. Our findings demonstrate that a 7-billion-parameter LLM fine-tuned with our techniques approaches the performance of GPT-4, shows markedly lower rates of hallucination and error, and surpasses other models in strategic reasoning tasks. Moreover, domain-specific fine-tuning of embedding models significantly improves performance within cybersecurity contexts, underscoring the efficacy of our methodology. By leveraging Crimson to convert raw vulnerability data into structured and actionable insights, we bolster proactive cybersecurity defenses.
