Ontology-Aware RAG for Improved Question-Answering in Cybersecurity Education

Chengshuai Zhao, Garima Agrawal, Fan Zhang, Tharindu Kumarage, Zhen Tan, Yuli Deng, Ying-Chih Chen, Huan Liu

TL;DR

CyberRAG addresses the reliability challenges of LLM-based cybersecurity QA by integrating retrieval-augmented generation with ontology-based validation grounded in AISecKG. The two-step approach retrieves validated cybersecurity content and then uses a knowledge-graph ontology to fact-check and constrain final answers, reducing hallucinations and misuse in an educational setting. The framework is evaluated on CyberQ across In-KB, Out-of-KB, and Zero-shot scenarios, showing consistent gains with various backbones and prompting strategies, and ablations confirm the importance of KB coverage and ontology validation. The work demonstrates a path toward safe, interactive AI-assisted cybersecurity education with potential applicability to other domains.

Abstract

Integrating AI into education has the potential to transform the teaching of science and technology courses, particularly in the field of cybersecurity. AI-driven question-answering (QA) systems can actively manage uncertainty in cybersecurity problem-solving, offering interactive, inquiry-based learning experiences. Recently, large language models (LLMs) have gained prominence in AI-driven QA systems, enabling advanced language understanding and user engagement. However, they face challenges like hallucinations and limited domain-specific knowledge, which reduce their reliability in educational settings. To address these challenges, we propose CyberRAG, an ontology-aware retrieval-augmented generation (RAG) approach for developing a reliable and safe QA system in cybersecurity education. CyberRAG employs a two-step approach. First, it augments the domain-specific knowledge by retrieving validated cybersecurity documents from a knowledge base to enhance the relevance and accuracy of the response. Second, it mitigates hallucinations and misuse by integrating a knowledge graph ontology to validate the final answer. Comprehensive experiments on publicly available datasets reveal that CyberRAG delivers accurate, reliable responses aligned with domain knowledge, demonstrating the potential of AI tools to enhance education.
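
The two-step approach described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy knowledge base, the ontology triples, the bag-of-words retriever, and the function names `retrieve`, `validate`, and `answer` are all assumptions made for illustration. In particular, the validation step here is a simple set-membership check on triples, standing in for whatever ontology-based validation the paper actually uses.

```python
import math
import re
from collections import Counter

# Hypothetical toy knowledge base (assumption); a real system would hold
# validated cybersecurity course documents.
KNOWLEDGE_BASE = [
    "A firewall filters network traffic according to security rules.",
    "Nmap is a tool used to scan a network for open ports.",
    "SQL injection exploits unsanitized input in database queries.",
]

# Hypothetical ontology triples (subject type, relation, object type).
# These are illustrative and are not taken from AISecKG.
ONTOLOGY = {
    ("tool", "can_scan", "system"),
    ("attack", "can_exploit", "vulnerability"),
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, k=1):
    """Step 1: return the k knowledge-base documents most similar to the question."""
    query = Counter(tokenize(question))
    ranked = sorted(
        KNOWLEDGE_BASE,
        key=lambda doc: cosine(query, Counter(tokenize(doc))),
        reverse=True,
    )
    return ranked[:k]


def validate(answer_triples):
    """Step 2: accept an answer only if every triple it asserts is in the ontology."""
    return all(triple in ONTOLOGY for triple in answer_triples)


def answer(question, generate):
    """Retrieve context, call a generator, and withhold answers that fail validation.

    `generate` is a hypothetical stand-in for an LLM call; it must return
    a pair (answer_text, answer_triples).
    """
    context = retrieve(question)
    text, triples = generate(question, context)
    if validate(triples):
        return text
    return "Answer withheld: failed ontology validation."
```

In this sketch, a generator whose answer asserts only relations found in the ontology passes through unchanged, while one that asserts an unknown relation is replaced by the refusal string. A production retriever would use dense embeddings rather than word counts, and extracting triples from free-text answers is itself a nontrivial step that the sketch leaves to the caller.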

Paper Structure

This paper contains 25 sections, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Overview of CyberRAG Framework.
  • Figure 2: Knowledge base analysis. Model performance consistently improves as the In-KB ratio increases across different datasets.
  • Figure 3: Ontology validation analysis. The proposed ontology validation component is valid across various dataset ratios.
  • Figure 4: Retriever analysis. Various retrievers achieve similar performances in terms of semantic similarity under various settings.
  • Figure 5: Case Study. The answer validation case study (left) elaborates on how the validation model prevents misuse behaviors. The CyberRAG case study (right) showcases the data flow details.