Ontology-Aware RAG for Improved Question-Answering in Cybersecurity Education
Chengshuai Zhao, Garima Agrawal, Fan Zhang, Tharindu Kumarage, Zhen Tan, Yuli Deng, Ying-Chih Chen, Huan Liu
TL;DR
CyberRAG addresses the reliability challenges of LLM-based cybersecurity QA by integrating retrieval-augmented generation with ontology-based validation grounded in AISecKG. The two-step approach retrieves validated cybersecurity content and then uses a knowledge-graph ontology to fact-check and constrain final answers, reducing hallucinations and misuse in an educational setting. The framework is evaluated on CyberQ across In-KB, Out-of-KB, and Zero-shot scenarios, showing consistent gains with various backbones and prompting strategies, and ablations confirm the importance of KB coverage and ontology validation. The work demonstrates a path toward safe, interactive AI-assisted cybersecurity education with potential applicability to other domains.
Abstract
Integrating AI into education has the potential to transform the teaching of science and technology courses, particularly in the field of cybersecurity. AI-driven question-answering (QA) systems can actively manage uncertainty in cybersecurity problem-solving, offering interactive, inquiry-based learning experiences. Recently, large language models (LLMs) have gained prominence in AI-driven QA systems, enabling advanced language understanding and user engagement. However, they face challenges such as hallucinations and limited domain-specific knowledge, which reduce their reliability in educational settings. To address these challenges, we propose CyberRAG, an ontology-aware retrieval-augmented generation (RAG) approach for developing a reliable and safe QA system in cybersecurity education. CyberRAG employs a two-step approach. First, it augments the model's domain-specific knowledge by retrieving validated cybersecurity documents from a knowledge base, improving the relevance and accuracy of the response. Second, it mitigates hallucinations and misuse by using a knowledge graph ontology to validate the final answer. Comprehensive experiments on publicly available datasets show that CyberRAG delivers accurate, reliable responses aligned with domain knowledge, demonstrating the potential of AI tools to enhance education.
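The two-step approach described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the knowledge base, the ontology triples, the entity typing, and every function name below are hypothetical stand-ins, retrieval is simple token overlap rather than a learned retriever, and the LLM generation step is replaced by a stub that returns the top retrieved document.

```python
import re

# Hypothetical toy knowledge base (stand-in for validated cybersecurity documents).
KNOWLEDGE_BASE = [
    "A firewall filters network traffic based on predefined rules.",
    "Nmap is a tool used to scan open ports on a host.",
    "Phishing is an attack that tricks users into revealing credentials.",
]

# Hypothetical ontology: allowed (subject_type, relation, object_type) schema triples.
ONTOLOGY = {
    ("tool", "can_scan", "port"),
    ("attack", "targets", "user"),
}

# Hypothetical mapping from entity mentions to ontology types.
ENTITY_TYPES = {
    "nmap": "tool",
    "firewall": "tool",
    "open ports": "port",
    "phishing": "attack",
    "users": "user",
}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, kb, k=1):
    """Step 1: rank documents by token overlap with the question."""
    q_tokens = tokenize(question)
    ranked = sorted(kb, key=lambda doc: len(q_tokens & tokenize(doc)), reverse=True)
    return ranked[:k]


def validate(triples, ontology=ONTOLOGY, entity_types=ENTITY_TYPES):
    """Step 2: accept only if every claimed triple matches an ontology schema."""
    for subj, rel, obj in triples:
        schema = (entity_types.get(subj), rel, entity_types.get(obj))
        if schema not in ontology:
            return False
    return True


def answer(question, claimed_triples):
    """Retrieve context, draft an answer (stubbed), then validate it.

    Assumption: claimed_triples would be extracted from the draft answer
    in a real system; here the caller supplies them directly.
    """
    context = retrieve(question, KNOWLEDGE_BASE)
    draft = context[0]  # stub in place of an LLM call conditioned on context
    if validate(claimed_triples):
        return draft
    return "Answer withheld: it could not be validated against the ontology."


if __name__ == "__main__":
    print(answer("What tool can scan open ports?", [("nmap", "can_scan", "open ports")]))
    print(answer("What tool can scan open ports?", [("firewall", "targets", "users")]))
```

In this sketch the validator acts as a gate: a draft answer whose claimed relations fall outside the ontology schema is withheld rather than returned, which mirrors the role the abstract assigns to ontology validation in reducing hallucinations and misuse.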
