RDFGraphGen: An RDF Graph Generator based on SHACL Shapes

Milos Jovanovik, Marija Vecovska, Maxime Jakubowski, Katja Hose

TL;DR

RDFGraphGen addresses the need for domain-specific RDF datasets by using SHACL shapes as blueprints for generating synthetic graphs. It is implemented as a Python package and takes a scale factor $S$ that controls the number of entities generated per top-level shape, so the same shapes can yield small, medium, or large graphs. The stated contributions are the first SHACL-based RDF data generator, predefined schema.org literal values for more realistic output, and a scalable, domain-agnostic workflow with batch and concurrent generation. The tool is open source under the MIT license, and the paper includes example use cases and a performance evaluation aimed at benchmarking, testing, and model training.

Abstract

Developing and testing modern RDF-based applications often requires access to RDF datasets with certain characteristics. Unfortunately, publicly available domain-specific knowledge graphs that conform to a particular set of characteristics are very difficult to find. Hence, in this paper we propose RDFGraphGen, an open-source RDF graph generator that takes such characteristics in the form of SHACL (Shapes Constraint Language) shapes and uses them to generate synthetic RDF graphs. RDFGraphGen is domain-agnostic, with configurable graph structure, value constraints, and distributions. It also comes with a number of predefined values for popular schema.org classes and properties, which make the generated graphs more realistic. Our results show that RDFGraphGen is scalable and can generate small, medium, and large RDF graphs in any domain.
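
To make the idea of shapes acting as generation blueprints concrete, the following is a minimal sketch in plain Python. It is not RDFGraphGen's code or API: the dictionary layout, the `generate_triples` function, the `ex:` identifiers, and the sample values are all hypothetical simplifications, standing in for a SHACL node shape with a target class and per-property minimum and maximum counts. It only illustrates how a scale factor can set the number of entities produced for one top-level shape.

```python
import random

# Hypothetical, simplified stand-in for a SHACL node shape. This is an
# assumption made for illustration, not RDFGraphGen's internal representation.
PERSON_SHAPE = {
    "target_class": "schema:Person",
    "properties": [
        # "min"/"max" play the role of sh:minCount / sh:maxCount;
        # "values" is a made-up pool of literals to draw from.
        {"path": "schema:givenName", "values": ["Ana", "Marko", "Elena"], "min": 1, "max": 1},
        {"path": "schema:email", "values": ["a@example.org", "b@example.org"], "min": 0, "max": 2},
    ],
}


def generate_triples(shape, scale_factor, seed=0):
    """Return (subject, predicate, object) tuples for `scale_factor` entities."""
    rng = random.Random(seed)  # seeded so repeated runs give the same graph
    triples = []
    for i in range(scale_factor):
        subject = f"ex:entity{i}"
        triples.append((subject, "rdf:type", shape["target_class"]))
        for prop in shape["properties"]:
            # Pick how many values this entity gets, within the count bounds.
            count = rng.randint(prop["min"], prop["max"])
            # Sample without replacement so one entity gets distinct values.
            for value in rng.sample(prop["values"], count):
                triples.append((subject, prop["path"], value))
    return triples


triples = generate_triples(PERSON_SHAPE, scale_factor=3)
for triple in triples:
    print(triple)
```

Under these assumptions, raising `scale_factor` increases the number of generated entities linearly, while the shape itself bounds how many values each entity receives per property.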

Paper Structure (14 sections, 2 figures, 1 table, 3 algorithms)

This paper contains 14 sections, 2 figures, 1 table, 3 algorithms.

Figures (2)

  • Figure 1: RDFGraphGen Workflow Diagram
  • Figure 2: Generation Time based on the Number of Generated RDF Triples

Theorems & Definitions (4)

  • Example 1
  • Example 2
  • Example 3
  • Example 4