A RAG-Based Question-Answering Solution for Cyber-Attack Investigation and Attribution
Sampath Rajapaksha, Ruby Rani, Erisa Karafili
TL;DR
The paper addresses the need for reliable, up-to-date information to support cyber-attack investigation and attribution, a setting in which large language models (LLMs) used on their own risk hallucinating and relying on outdated knowledge. It proposes the first Retrieval Augmented Generation (RAG)-based question-answering (QA) system for cyber-attack investigation and attribution, which uses the AttackER knowledge base (KB) to answer investigators' questions and backs its answers with verifiable sources. The authors construct a domain-specific KB, generate QA pairs, and implement a RAG-based QA application. They show that few-shot prompting and source-backed answers improve reliability, and that the system outperforms GPT-3.5 and GPT-4o on several metrics. The result is a practical, deployable QA solution for cyber forensics; the authors outline future work on continuously updating the KB and enhancing retrieval to further improve accuracy and latency.
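Source-backed answers depend on the retrieval step keeping track of where each passage came from. The following is a minimal sketch of that step, assuming a toy in-memory KB and plain bag-of-words cosine similarity in place of whatever embedding model and vector store the authors actually use. The function names (`retrieve`, `build_context`) and the three passages are hypothetical placeholders, not AttackER content.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase the text and keep runs of letters and digits as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, kb: dict[str, str], k: int = 2) -> list[tuple[str, float]]:
    # Score every KB passage against the query; return the top-k (source_id, score) pairs.
    q = Counter(tokenize(query))
    scored = [(sid, cosine(q, Counter(tokenize(text)))) for sid, text in kb.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def build_context(query: str, kb: dict[str, str], k: int = 2) -> tuple[str, list[str]]:
    # Keep only passages with some lexical overlap and prefix each with its source id,
    # so the LLM can cite where each fact came from.
    sources = [sid for sid, score in retrieve(query, kb, k) if score > 0]
    context = "\n".join(f"[{sid}] {kb[sid]}" for sid in sources)
    return context, sources


# Hypothetical toy KB (placeholder passages, not taken from AttackER).
kb = {
    "doc-a": "Attribution relies on indicators such as malware code reuse and infrastructure overlap.",
    "doc-b": "Phishing emails are a common initial access vector.",
    "doc-c": "Investigators collect logs and memory images during incident response.",
}
context, sources = build_context("What indicators support attack attribution?", kb)
print(sources)  # ['doc-a']
```

Prefixing each passage with its id inside the context is what allows the generated answer to name its source; a real deployment would likely replace the term-frequency scorer with dense embeddings.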
Abstract
In the constantly evolving field of cybersecurity, analysts must stay abreast of the latest attack trends and of pertinent information that aids in the investigation and attribution of cyber-attacks. In this work, we introduce the first question-answering (QA) model, together with an application built on it, that provides cybersecurity experts with information about cyber-attack investigation and attribution. Our QA model combines Retrieval Augmented Generation (RAG) techniques with a Large Language Model (LLM) and answers users' queries based either on our knowledge base (KB), which contains curated information about cyber-attack investigation and attribution, or on external resources provided by the users. We tested and evaluated our QA model with various types of questions: KB-based questions, metadata-based questions, questions about specific documents in the KB, and questions based on external sources. We compared the answers to KB-based questions with those from OpenAI's GPT-3.5 and the latest GPT-4o LLMs. Our proposed QA model outperforms OpenAI's GPT models by providing the sources of its answers and by overcoming the hallucination limitations of the GPT models, both of which are critical for cyber-attack investigation and attribution. Additionally, our analysis showed that the RAG QA model generates better answers when it is given few-shot examples than when it receives only zero-shot instructions with no examples alongside the query.
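The few-shot versus zero-shot comparison comes down to how the prompt sent to the LLM is assembled. The sketch below is a hypothetical prompt builder, not the authors' template: the instruction wording, the `Example` record, and the `build_prompt` function are illustrative assumptions. Called without examples it yields a zero-shot prompt (instruction plus query); passing worked examples turns it into a few-shot prompt.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical instruction text; the paper's actual prompt wording is not reproduced here.
INSTRUCTION = (
    "Answer the question using only the context below and cite the source ids you used. "
    "If the context does not contain the answer, say that you do not know."
)


@dataclass
class Example:
    # One worked demonstration for few-shot prompting.
    question: str
    context: str
    answer: str


def build_prompt(question: str, context: str,
                 examples: Optional[Sequence[Example]] = None) -> str:
    parts = [INSTRUCTION]
    # Few-shot: insert worked examples that show the expected, source-citing answer style.
    for ex in examples or []:
        parts.append(f"Context:\n{ex.context}\nQuestion: {ex.question}\nAnswer: {ex.answer}")
    # The real query always comes last; with no examples this is a zero-shot prompt.
    parts.append(f"Context:\n{context}\nQuestion: {question}\nAnswer:")
    return "\n\n".join(parts)


shot = Example(
    question="What is a common initial access vector?",
    context="[doc-b] Phishing emails are a common initial access vector.",
    answer="Phishing emails [doc-b].",
)
query_context = "[doc-a] Attribution relies on indicators such as malware code reuse."
zero = build_prompt("What supports attack attribution?", query_context)
few = build_prompt("What supports attack attribution?", query_context, examples=[shot])
```

The only difference between the two prompts is the demonstration block, which shows the model the expected answer format, including the source citation, before it sees the real query.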
