Ontology Learning and Knowledge Graph Construction: A Comparison of Approaches and Their Impact on RAG Performance

Tiago da Cruz, Bernardo Tavares, Francisco Belo

TL;DR

This work analyzes how knowledge representation affects the performance of retrieval-augmented generation (RAG) systems by comparing standard Vector RAG, GraphRAG, and ontology-guided knowledge graphs (KGs) built from relational databases or text. The methodology introduces a unified KG-construction framework with a custom retriever that extracts connected subgraphs, enabling richer context through chunk-aware graphs. Empirical results show that GraphRAG and ontology-guided KGs incorporating textual chunks achieve high accuracy (up to 18/20) and can rival state-of-the-art methods, with relational-database-derived ontologies performing similarly to text-derived ones at lower cost and maintenance burden. The findings support practical, scalable strategies for building interpretable graph-based RAG pipelines, particularly in domains with stable structured data, though broader validation across datasets and tasks is needed.

Abstract

Retrieval-Augmented Generation (RAG) systems combine Large Language Models (LLMs) with external knowledge, and their performance depends heavily on how that knowledge is represented. This study investigates how different Knowledge Graph (KG) construction strategies influence RAG performance. We compare several approaches: standard vector-based RAG, GraphRAG, and retrieval over KGs built from ontologies derived either from relational databases or from textual corpora. Results show that ontology-guided KGs incorporating chunk information achieve performance competitive with state-of-the-art frameworks, substantially outperforming vector retrieval baselines. Moreover, ontology-guided KGs built from relational databases perform comparably to those built from ontologies extracted from text, while offering a dual advantage: they require only a one-time ontology learning process, which substantially reduces LLM usage costs, and they avoid the complexity of ontology merging inherent to text-based approaches.

Paper Structure

This paper contains 21 sections, 8 figures, 1 table.

Figures (8)

  • Figure 1: Experimental setup comparing Vector RAG, GraphRAG, and ontology-guided Knowledge Graphs.
  • Figure 2: Overview of the RIGOR Pipeline implemented.
  • Figure 3: Custom Retriever Pipeline.
  • Figure 4: Overview of results.
  • Figure 5: RDB ontology-aligned KG.
  • ...and 3 more figures