How to evaluate NoSQL Database Paradigms for Knowledge Graph Processing

Rosario Napoli; Antonio Celesti; Massimo Villari; Maria Fazio

TL;DR

The paper tackles the problem of selecting NoSQL DBMS paradigms for Knowledge Graph processing by introducing a KG-specific benchmarking framework that accounts for scale, connectivity, and semantic richness via $S(KG)=|V|+|E|$, $CD(KG)=|E|/|V|$, and $SR(KG)=D_{types}+H(C)+H(R)$. It conducts a reproducible evaluation on the FAERS KG across three scales using three paradigms (Neo4j, MongoDB, ArangoDB) and a four-tier query workload to identify crossover points and derive evidence-based guidelines for paradigm selection. The results reveal clear trade-offs: document stores excel at simple attribute filtering, multi-model systems balance mixed workloads, and graph-native engines dominate deep traversals in semantically rich, highly connected graphs, with crossover points tied to $SR(KG)$ and $CD(KG)$. The study advances KG infrastructure practice by translating ad-hoc choices into data-driven decisions and suggests avenues for automated, self-adapting storage frameworks guided by KG characteristics and query requirements.
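The three metrics can be illustrated on a toy graph. The sketch below is not the authors' implementation. It assumes that $H(C)$ and $H(R)$ are Shannon entropies (base 2) of the node-class and relation-type distributions, and that $D_{types}$ counts distinct node classes plus distinct relation types. The function names and the tiny FAERS-like sample are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (base 2) of a label distribution; 0.0 for empty input."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def kg_metrics(node_classes, edges):
    """Compute S(KG), CD(KG) and SR(KG) for a small in-memory graph.

    node_classes: dict mapping node id -> class label
    edges: list of (source id, relation label, target id) triples
    """
    num_nodes = len(node_classes)
    num_edges = len(edges)
    relation_labels = [relation for _, relation, _ in edges]

    scale = num_nodes + num_edges                            # S(KG) = |V| + |E|
    density = num_edges / num_nodes if num_nodes else 0.0    # CD(KG) = |E| / |V|

    # Assumption: D_types = distinct node classes + distinct relation types.
    distinct_types = len(set(node_classes.values())) + len(set(relation_labels))
    richness = (
        distinct_types
        + shannon_entropy(node_classes.values())   # H(C), assumed Shannon entropy
        + shannon_entropy(relation_labels)         # H(R), assumed Shannon entropy
    )
    return {"S": scale, "CD": density, "SR": richness}


# Hypothetical sample loosely shaped like an adverse-event graph.
sample_nodes = {"p1": "Patient", "p2": "Patient", "d1": "Drug", "r1": "Reaction"}
sample_edges = [
    ("p1", "TOOK", "d1"),
    ("p2", "TOOK", "d1"),
    ("p1", "EXPERIENCED", "r1"),
    ("p2", "EXPERIENCED", "r1"),
]
sample_metrics = kg_metrics(sample_nodes, sample_edges)
```

Under these assumptions the sample has 4 nodes and 4 edges, so $S = 8$ and $CD = 1$. With 5 distinct types, $H(C) = 1.5$ bits and $H(R) = 1$ bit, $SR = 7.5$.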

Abstract

Knowledge Graph (KG) processing faces critical infrastructure challenges in selecting optimal NoSQL database paradigms, as traditional performance evaluations rely on static benchmarks that fail to capture the complexity of real-world KG workloads. Although the big data field offers numerous comparative studies, DBMS selection in the KG context remains predominantly ad hoc, leaving practitioners without systematic guidance for matching storage technologies to specific KG characteristics and query requirements. This paper presents a KG-specific benchmarking framework that combines scale and connectivity density with a new graph-centric metric, Semantic Richness (SR), within a four-tier query methodology to reveal performance crossover points across Document-Oriented, Graph, and Multi-Model DBMSs. We conduct an empirical evaluation on the FAERS adverse event KG at three scales, comparing paradigms on workloads ranging from simple filtering to deep traversal, and provide metric-driven, evidence-based guidelines for aligning NoSQL paradigm selection with graph size, connectivity, and semantic richness.

Paper Structure (29 sections, 7 equations, 11 figures, 1 table)

Introduction
Definitions
Background on NoSQL DBMS for KG Management
Related Work
NoSQL Performance Benchmarking
Knowledge Graph Storage Systems
DBMS Schemas for KG representation
Neo4j: Native Graph Modeling for Knowledge Graphs
MongoDB: Document-Oriented Flexibility for Semi-Structured Knowledge
ArangoDB: Unified Access through a Multi-Model Design
KG Complexity Metrics for NoSQL DBMS Evaluation
Evaluation Methodology
Data Preparation
Dataset Scaling Strategy
System Ingestion
...and 14 more sections

Figures (11)

Figure 1: KG Atomic Unit.
Figure 2: Classification of NoSQL paradigms in terms of scalability and complexity.
Figure 3: Schema of the FAERS Knowledge Graph.
Figure 4: Cold-start performance for Query 1 across systems.
Figure 5: Warm-start performance for Query 1 across systems.
...and 6 more figures

Theorems & Definitions (7)

Definition 1: Knowledge Graph
Definition 2: KG atomic unit
Definition 3: Cold-Start Condition
Definition 4: Hot-Start Condition
Definition 5: Scale
Definition 6: Connectivity Density
Definition 7: Semantic Richness
