Cyber Knowledge Completion Using Large Language Models

Braden K Webb; Sumit Purohit; Rounak Meyur

TL;DR

This work applies embedding models to encode information about attack patterns and adversarial techniques, generates mappings between them using vector embeddings, and proposes a Retrieval-Augmented Generation (RAG)-based approach that leverages pre-trained models to create structured mappings between different taxonomies of threat patterns.
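The embedding-based mapping step can be illustrated with a minimal sketch. It assumes every CAPEC attack pattern and ATT&CK technique description has already been converted to a vector by some embedding model (the specific model is not named here), and it uses cosine similarity with a top-k nearest-neighbor search; cosine similarity, the identifiers, and the three-dimensional vectors are hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(query_vecs, target_vecs, k=1):
    """For each query id, return the k target ids with the highest cosine similarity."""
    mapping = {}
    for qid, qvec in query_vecs.items():
        scored = sorted(
            ((cosine(qvec, tvec), tid) for tid, tvec in target_vecs.items()),
            reverse=True,  # highest similarity first
        )
        mapping[qid] = [tid for _, tid in scored[:k]]
    return mapping


# Hypothetical toy "embeddings" standing in for real embedding-model outputs.
capec = {
    "capec_a": [0.9, 0.1, 0.0],
    "capec_b": [0.0, 0.2, 0.9],
}
attack = {
    "tech_x": [0.8, 0.2, 0.1],
    "tech_y": [0.1, 0.1, 0.95],
    "tech_z": [0.3, 0.9, 0.1],
}

print(nearest_neighbors(capec, attack, k=1))
# → {'capec_a': ['tech_x'], 'capec_b': ['tech_y']}
```

With real data, the vectors would come from embedding each description string, and k controls how many candidate techniques are proposed per attack pattern.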

Abstract

The integration of the Internet of Things (IoT) into Cyber-Physical Systems (CPSs) has expanded their cyber-attack surface, introducing new and sophisticated threats with the potential to exploit emerging vulnerabilities. Assessing the risks of CPSs is increasingly difficult because the available cybersecurity knowledge is incomplete and outdated, which highlights the urgent need for better-informed risk assessments and mitigation strategies. While previous efforts have relied on rule-based natural language processing (NLP) tools to map vulnerabilities, weaknesses, and attack patterns, recent advancements in Large Language Models (LLMs) present a unique opportunity to enhance cyber-attack knowledge completion through improved reasoning, inference, and summarization capabilities. We apply embedding models to encode information on attack patterns and adversarial techniques and generate mappings between them using vector embeddings. Additionally, we propose a Retrieval-Augmented Generation (RAG)-based approach that leverages pre-trained models to create structured mappings between different taxonomies of threat patterns. Further, we use a small hand-labeled dataset to compare the proposed RAG-based approach to a standard binary classification baseline. Together, these components provide a comprehensive framework for addressing cyber-attack knowledge graph completion.
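Comparing approaches on a hand-labeled dataset amounts to scoring predicted mappings against gold mappings. The sketch below assumes a pair-level evaluation with precision, recall, and F1 over (CAPEC, ATT&CK) pairs; the paper's own metric definitions are not reproduced in this section, so this metric choice and the example pairs are hypothetical.

```python
def precision_recall_f1(predicted, gold):
    """Pair-level precision, recall, and F1 of predicted mappings against gold labels."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical hand-labeled (CAPEC, ATT&CK) pairs and model predictions.
gold = {("capec_a", "tech_x"), ("capec_b", "tech_y"), ("capec_c", "tech_z")}
predicted = {("capec_a", "tech_x"), ("capec_b", "tech_z")}

p, r, f1 = precision_recall_f1(predicted, gold)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
# → precision=0.500 recall=0.333 f1=0.400
```

Treating each candidate pair as a match or non-match is what lets a binary classifier and a RAG-based mapper be scored on the same footing.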

Paper Structure

This paper contains 10 sections, 8 equations, 2 figures, and 3 tables.

Introduction
Related Work
Methods
Preliminaries
Embedding Generation
Mapping Generation
Evaluation
Metric Definitions
Results
Conclusion

Figures (2)

Figure 1: Example description strings for a CAPEC attack pattern [capec] and an ATT&CK ICS technique [mitre] that describe very similar adversarial behavior. A good framework should generate a mapping between these two documents.
Figure 2: The nearest-neighbor and RAG pipelines for cyber-attack knowledge graph completion, shown in the CAPEC-to-ATT&CK direction. Modules in yellow are common to both the nearest-neighbor and RAG pipelines, while those in red are unique to the RAG-based approach.
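Figure 2 separates the retrieval shared by both pipelines from the modules unique to the RAG-based approach. What a single RAG mapping step might look like can be sketched as follows, assuming the retrieved top-k ATT&CK candidates are inserted into a prompt and the LLM's reply is filtered back to that candidate set. The prompt wording, reply format, function names, and the `fake_llm` stub are all hypothetical; the actual prompts and models are not described in this section.

```python
from typing import Callable, Dict, List


def build_prompt(source_id: str, source_text: str, candidates: Dict[str, str]) -> str:
    """Assemble a prompt from the source description plus retrieved candidate techniques."""
    lines = [
        "Decide which candidate ATT&CK ICS techniques describe the same adversarial",
        "behavior as the CAPEC attack pattern below.",
        f"Attack pattern {source_id}: {source_text}",
        "Candidates:",
    ]
    for tid, text in candidates.items():
        lines.append(f"- {tid}: {text}")
    lines.append("Answer with matching candidate IDs separated by commas, or NONE.")
    return "\n".join(lines)


def parse_answer(answer: str, candidates: Dict[str, str]) -> List[str]:
    """Keep only IDs that were actually offered as candidates (drops invented IDs)."""
    if answer.strip().upper() == "NONE":
        return []
    ids = [tok.strip() for tok in answer.split(",")]
    return [tid for tid in ids if tid in candidates]


def rag_map(
    source_id: str,
    source_text: str,
    candidates: Dict[str, str],
    llm: Callable[[str], str],
) -> List[str]:
    """One RAG mapping step: prompt the LLM with retrieved candidates, then parse its reply."""
    prompt = build_prompt(source_id, source_text, candidates)
    return parse_answer(llm(prompt), candidates)


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM client; picks tech_x whenever it is offered.
    return "tech_x" if "- tech_x:" in prompt else "NONE"


# Hypothetical candidates, e.g. the top-k neighbors returned by an embedding search.
candidates = {
    "tech_x": "Placeholder description of technique X.",
    "tech_z": "Placeholder description of technique Z.",
}
print(rag_map("capec_a", "Placeholder description of pattern A.", candidates, fake_llm))
# → ['tech_x']
```

Restricting the parsed answer to the retrieved candidates keeps the generated mapping grounded in the retrieval step; passing the LLM in as a plain callable lets a real client replace `fake_llm` without touching the rest of the sketch.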
