OG-RAG: Ontology-Grounded Retrieval-Augmented Generation For Large Language Models
Kartik Sharma, Peeyush Kumar, Yunqing Li
TL;DR
OG-RAG addresses the challenge of adapting large language models to specialized domains by grounding retrieval in domain ontologies and encoding facts as a hypergraph of ontology-grounded blocks. A greedy retrieval step selects a compact set of hyperedges to form the context, improving fact recall, response correctness, and context attribution across agriculture and news tasks and multiple LLMs. The approach reduces hallucinations and supports deductive reasoning by anchoring unstructured text to structured domain knowledge, while remaining competitive in preprocessing and query-time efficiency. In practice, this benefits industrial workflows and knowledge-driven tasks by enabling reliable, explainable LLM outputs with traceable context.
Abstract
This paper presents OG-RAG, an Ontology-Grounded Retrieval-Augmented Generation method designed to enhance LLM-generated responses by anchoring retrieval processes in domain-specific ontologies. While LLMs are widely used for tasks like question answering and search, they struggle to adapt to specialized knowledge, such as industrial workflows or knowledge work, without expensive fine-tuning or suboptimal retrieval methods. Existing retrieval-augmented models, such as RAG, offer improvements but fail to account for structured domain knowledge, leading to suboptimal context generation. Ontologies, which conceptually organize domain knowledge by defining entities and their interrelationships, offer a structured representation to address this gap. OG-RAG constructs a hypergraph representation of domain documents, where each hyperedge encapsulates a cluster of factual knowledge grounded in a domain-specific ontology. An optimization algorithm then retrieves the minimal set of hyperedges that constructs a precise, conceptually grounded context for the LLM. This method enables efficient retrieval while preserving the complex relationships between entities. OG-RAG applies to domains where fact-based reasoning is essential, particularly in tasks that require workflows or decision-making steps to follow predefined rules and procedures. These include industrial workflows in healthcare, legal, and agricultural sectors, as well as knowledge-driven tasks such as news journalism, investigative research, consulting, and more. Our evaluations demonstrate that OG-RAG increases the recall of accurate facts by 55% and improves response correctness by 40% across four different LLMs. Additionally, OG-RAG enables 30% faster attribution of responses to context and boosts fact-based reasoning accuracy by 27% compared to baseline methods.
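The retrieval step described above, selecting a small set of hyperedges that together cover the query-relevant facts, can be illustrated with a standard greedy set-cover heuristic. The sketch below is only an assumption about how such a step might look, not the authors' implementation: the function name `greedy_hyperedge_cover`, the representation of hypernodes as plain strings, the `max_edges` budget, and the toy data are all hypothetical. It also assumes the query-relevant hypernodes have already been identified by some other means.

```python
from typing import FrozenSet, List, Set

# Assumption: a hyperedge is modeled as a frozen set of hypernode labels,
# where each label stands for one ontology-grounded fact.
Hyperedge = FrozenSet[str]


def greedy_hyperedge_cover(
    relevant_nodes: Set[str],
    hyperedges: List[Hyperedge],
    max_edges: int,
) -> List[Hyperedge]:
    """Greedily pick hyperedges that cover the most still-uncovered nodes."""
    uncovered = set(relevant_nodes)
    candidates = list(hyperedges)
    selected: List[Hyperedge] = []
    while uncovered and candidates and len(selected) < max_edges:
        # Choose the hyperedge covering the most uncovered relevant nodes
        # (ties go to the earliest candidate in the list).
        best = max(candidates, key=lambda edge: len(edge & uncovered))
        if not (best & uncovered):
            break  # no remaining hyperedge adds any coverage
        selected.append(best)
        uncovered -= best
        candidates.remove(best)
    return selected


# Toy data, invented for illustration only (not taken from the paper).
hyperedges = [
    frozenset({"crop: A", "region: R1", "sowing window: W1"}),
    frozenset({"crop: A", "seed rate: S1"}),
    frozenset({"crop: B", "region: R2"}),
]
relevant = {"crop: A", "sowing window: W1", "seed rate: S1"}
selected = greedy_hyperedge_cover(relevant, hyperedges, max_edges=2)
```

With this toy input, the first two hyperedges are selected and the third, which shares no relevant node with the query, is left out. Greedy set cover does not guarantee a truly minimal set, so it should be read as an approximation of the minimal-cover objective stated in the abstract.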
