Streamlining Knowledge Graph Creation with PyRML

Andrea Giovanni Nuzzolese

TL;DR

This paper introduces PyRML, a Python-native engine for constructing Knowledge Graphs from heterogeneous data sources using declarative RML mappings. It details a four-module architecture (API, Core, Functions, Mapper) that provides a programmable interface, FnO function support, and a Pandas/RDFlib-integrated data flow, enabling interactive KG engineering within Python workflows. The evaluation demonstrates strong RML-Core conformance across multiple data sources and consistently faster performance than the RMLMapper engine, highlighting PyRML’s suitability for latency-sensitive and reproducible data integration. The work addresses gaps in existing tools by offering a modular, extensible, and open-source solution tightly integrated with the Python data ecosystem, with future directions toward RML extensions and LLM-assisted mapping workflows.

Abstract

Knowledge Graphs (KGs) are increasingly adopted as a foundational technology for integrating heterogeneous data in domains such as climate science, cultural heritage, and the life sciences. Declarative mapping languages like R2RML and RML have played a central role in enabling scalable and reusable KG construction, offering a transparent means of transforming structured and semi-structured data into RDF. In this paper, we present PyRML, a lightweight, Python-native library for building Knowledge Graphs through declarative mappings. PyRML supports core RML constructs and provides a programmable interface for authoring, executing, and testing mappings directly within Python environments. It integrates with popular data and semantic web libraries (e.g., Pandas and RDFlib), enabling transparent and modular workflows. By lowering the barrier to entry for KG creation and fostering reproducible, ontology-aligned data integration, PyRML bridges the gap between declarative semantics and practical KG engineering.
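To make the idea of a declarative mapping concrete, the following is a minimal sketch in plain Python. It does not use PyRML and does not reflect its API; the `MAPPING` dictionary, the `expand_template` and `apply_mapping` helpers, the `example.org` IRIs, and the sample CSV are all hypothetical, chosen only to illustrate how a mapping that is separate from the code can turn tabular rows into RDF-style triples.

```python
import csv
import io
import re
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical, heavily simplified stand-in for an RML triples map.
# A real RML mapping is itself written in RDF; a dict is used here for brevity.
MAPPING = {
    "subject_template": "http://example.org/person/{id}",
    "class": "http://xmlns.com/foaf/0.1/Person",
    # (predicate IRI, source column) pairs
    "predicate_object_maps": [
        ("http://xmlns.com/foaf/0.1/name", "name"),
    ],
}

def expand_template(template, row):
    """Replace each {column} placeholder with the percent-encoded row value."""
    return re.sub(
        r"\{(\w+)\}",
        lambda m: quote(row[m.group(1)], safe=""),
        template,
    )

def apply_mapping(mapping, rows):
    """Return (subject, predicate, object) tuples for every input row."""
    triples = []
    for row in rows:
        subject = expand_template(mapping["subject_template"], row)
        triples.append((subject, RDF_TYPE, mapping["class"]))
        for predicate, column in mapping["predicate_object_maps"]:
            value = row.get(column)
            if value:  # skip missing or empty values instead of emitting a triple
                triples.append((subject, predicate, value))
    return triples

# Sample input: the second row has no name, so it yields only a type triple.
SAMPLE_CSV = "id,name\n1,Ada\n2,\n"
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
triples = apply_mapping(MAPPING, rows)

for triple in triples:
    print(triple)
```

The point of the sketch is the separation of concerns: changing the shape of the output graph means editing `MAPPING`, not the transformation code, which is the property that makes declarative mappings reusable across data sources.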

Paper Structure

This paper contains 21 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: The architecture of PyRML
  • Figure 2: Comparison of PyRML and RMLMapper with respect to the execution time of the test cases, in seconds.