LLM-Assisted Proactive Threat Intelligence for Automated Reasoning

Shuva Paul, Farhad Alemi, Richard Macwan

TL;DR

The paper addresses real-time defense against rapidly evolving cyber threats by combining a large language model (GPT-4o) with Retrieval-Augmented Generation (RAG) and continuous threat intelligence feeds. It presents a modular framework that ingests CVE, CWE, EPSS, and KEV data via the Patrowl platform, encodes threat artifacts with all-mpnet-base-v2 embeddings stored in Milvus, and generates contextual threat analyses through GPT-4o orchestrated with LangChain. In case studies on recently disclosed CVEs, KEVs, and high-EPSS CVEs, the approach outperforms vanilla GPT-4o in accuracy and timeliness. The work lays a foundation for automated threat information management and identifies future enhancements, including multi-agent orchestration and integration of additional threat intelligence sources.

Abstract

Successful defense against dynamically evolving cyber threats requires advanced and sophisticated techniques. This research presents a novel approach to enhance real-time cybersecurity threat detection and response by integrating large language models (LLMs) and Retrieval-Augmented Generation (RAG) systems with continuous threat intelligence feeds. Building on recent advances in LLMs, specifically GPT-4o, and on RAG techniques, our approach addresses the limitations of traditional static threat analysis by incorporating dynamic, real-time data sources. We use RAG to retrieve the latest threat intelligence in real time, which the GPT-4o model cannot provide on its own. We employ the Patrowl framework to automate the retrieval of diverse cybersecurity threat intelligence feeds, including the Common Vulnerabilities and Exposures (CVE), Common Weakness Enumeration (CWE), Exploit Prediction Scoring System (EPSS), and Known Exploited Vulnerabilities (KEV) databases. We encode these feeds as high-dimensional vector embeddings with the all-mpnet-base-v2 model and store and query them in Milvus. We demonstrate our system's efficacy through a series of case studies, which show significant improvements over the baseline GPT-4o in addressing recently disclosed vulnerabilities, KEVs, and high-EPSS-score CVEs. This work not only advances the role of LLMs in cybersecurity but also establishes a robust foundation for the development of automated intelligent cyberthreat information management systems, addressing crucial gaps in current cybersecurity practices.
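
To make the retrieve-then-generate flow described in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy bag-of-words vector in place of all-mpnet-base-v2 embeddings and a brute-force in-memory store in place of Milvus. The names `ThreatRecord`, `InMemoryVectorStore`, `embed`, and `build_prompt`, the placeholder CVE identifiers, and the EPSS values are all hypothetical. The assembled prompt is what would be passed to the LLM; no model call is made here.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class ThreatRecord:
    """Hypothetical record combining fields from CVE, EPSS, and KEV feeds."""
    cve_id: str
    description: str
    epss: float    # exploit probability in [0, 1]
    in_kev: bool   # listed in a Known Exploited Vulnerabilities catalog


def embed(text: str) -> dict[str, float]:
    """Toy L2-normalised bag-of-words vector (stand-in for a real sentence encoder)."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()} if norm else {}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two already-normalised sparse vectors."""
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


class InMemoryVectorStore:
    """Stand-in for a vector database: brute-force similarity search."""

    def __init__(self) -> None:
        self._rows: list[tuple[dict[str, float], ThreatRecord]] = []

    def add(self, record: ThreatRecord) -> None:
        self._rows.append((embed(record.description), record))

    def search(self, query: str, k: int = 2) -> list[ThreatRecord]:
        q = embed(query)
        ranked = sorted(self._rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [record for _, record in ranked[:k]]


def build_prompt(query: str, context: list[ThreatRecord]) -> str:
    """Assemble retrieved records and the user question into one LLM prompt."""
    lines = [
        f"- {r.cve_id} (EPSS={r.epss:.2f}, KEV={'yes' if r.in_kev else 'no'}): {r.description}"
        for r in context
    ]
    return (
        "Answer using only the threat intelligence below.\n"
        + "\n".join(lines)
        + f"\n\nQuestion: {query}"
    )


# Placeholder data: the IDs, descriptions, and scores are invented for illustration.
store = InMemoryVectorStore()
store.add(ThreatRecord("CVE-0000-0001", "SQL injection in the login form of an example web application", 0.91, True))
store.add(ThreatRecord("CVE-0000-0002", "Heap buffer overflow in an example image parser allows remote code execution", 0.40, False))
store.add(ThreatRecord("CVE-0000-0003", "Improper certificate validation in an example VPN client", 0.05, False))

question = "sql injection login form"
hits = store.search(question, k=1)
prompt = build_prompt(question, hits)
print(prompt)
```

In a fuller pipeline, `embed` would call a sentence encoder, the store would be a vector database collection refreshed from the threat feeds, and `prompt` would be sent to the LLM. Keeping retrieval separate from generation is what allows newly disclosed vulnerabilities to reach the model without retraining it.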

Paper Structure

This paper contains 36 sections, 1 figure.

Figures (1)

  • Figure 1: Threat intelligence and user query flow