AutoRDF2GML: Facilitating RDF Integration in Graph Machine Learning

Michael Färber; David Lamprecht; Yuni Susanti

TL;DR

This work tackles the gap between RDF semantics and graph machine learning by introducing AutoRDF2GML, a framework that semi-automatically converts RDF data into ready-to-use heterogeneous graph datasets. It supports both content-based features derived from RDF datatype properties and topology-based features from RDF object properties, enabling diverse ML tasks such as link prediction and node classification. The authors also present new RDF-based benchmarks (SOA-SW, LPWC, and additional datasets) to enable rigorous evaluation of GML approaches on semantic graphs. The framework is designed for accessibility via a single-file configuration and pip installation, bridging the Semantic Web and Graph ML communities and facilitating scalable, RDF-based ML applications.

Abstract

In this paper, we introduce AutoRDF2GML, a framework designed to convert RDF data into data representations tailored for graph machine learning tasks. AutoRDF2GML enables, for the first time, the creation of both content-based features -- i.e., features based on RDF datatype properties -- and topology-based features -- i.e., features based on RDF object properties. Characterized by automated feature extraction, AutoRDF2GML makes it possible even for users less familiar with RDF and SPARQL to generate data representations ready for graph machine learning tasks, such as link prediction, node classification, and graph classification. Furthermore, we present four new benchmark datasets for graph machine learning, created from large RDF knowledge graphs using our framework. These datasets serve as valuable resources for evaluating graph machine learning approaches, such as graph neural networks. Overall, our framework effectively bridges the gap between the Graph Machine Learning and Semantic Web communities, paving the way for RDF-based machine learning applications.
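The split described in the abstract can be illustrated with a small, self-contained sketch. This is not the AutoRDF2GML implementation or its API. It is a toy example under stated assumptions: the triples, the `Literal` wrapper, and the `build_hetero_graph` helper are all hypothetical names introduced here. Triples with a literal object (datatype properties) become per-node content attributes, and triples that link two resources (object properties) become typed edge lists. The sketch only keeps the topology as edges and does not compute topology-based feature vectors.

```python
from collections import defaultdict
from typing import NamedTuple


class Literal(NamedTuple):
    """Hypothetical wrapper marking an RDF literal (datatype property value)."""
    value: object


# Toy triples (made-up data, not taken from SOA-SW or LPWC).
TRIPLES = [
    ("paper1", "rdf:type", "Paper"),
    ("paper2", "rdf:type", "Paper"),
    ("author1", "rdf:type", "Author"),
    ("paper1", "citationCount", Literal(12)),   # datatype property
    ("paper2", "citationCount", Literal(3)),    # datatype property
    ("author1", "hIndex", Literal(7)),          # datatype property
    ("paper1", "hasAuthor", "author1"),         # object property
    ("paper2", "hasAuthor", "author1"),         # object property
    ("paper1", "cites", "paper2"),              # object property
]


def build_hetero_graph(triples):
    """Split triples into per-type node indices, content attributes, and typed edges."""
    # Map every typed resource to its class.
    node_type = {s: o for s, p, o in triples if p == "rdf:type"}

    # Assign a contiguous integer index to each node within its type.
    index = defaultdict(dict)
    for node, ntype in node_type.items():
        index[ntype][node] = len(index[ntype])

    # One attribute dict per node, to be filled from datatype properties.
    features = {t: [dict() for _ in nodes] for t, nodes in index.items()}
    edges = defaultdict(list)

    for s, p, o in triples:
        if p == "rdf:type":
            continue
        s_type = node_type[s]
        if isinstance(o, Literal):
            # Datatype property: store as a content attribute of the subject node.
            features[s_type][index[s_type][s]][p] = o.value
        else:
            # Object property: store as an edge keyed by (source type, relation, target type).
            o_type = node_type[o]
            edges[(s_type, p, o_type)].append((index[s_type][s], index[o_type][o]))

    return dict(index), features, dict(edges)


index, features, edges = build_hetero_graph(TRIPLES)
print(features["Paper"])
print(edges[("Paper", "hasAuthor", "Author")])
```

Keying edges by a (source type, relation, target type) triple mirrors the way heterogeneous graph libraries commonly organize edge stores, so the resulting dictionaries could be converted to tensors in a later step.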

Paper Structure (16 sections, 5 figures, 6 tables)

This paper contains 16 sections, 5 figures, 6 tables.

Introduction
Related Work
Propositionalization of RDF Data
Knowledge Graph Embeddings
Heterogeneous Graph Benchmarks
AutoRDF2GML
Automatic Generation of Nodes and Node Features
3.1.1 Content-based Node Features
3.1.2 Topology-based Node Features
Automatic Integration of Edges and Edge Features
Semantic Graph Machine Learning Benchmarks
SemOpenAlex-SemanticWeb (SOA-SW)
Linked Papers With Code (LPWC)
Further Benchmark Datasets
Applications and Use Cases
...and 1 more sections

Figures (5)

Figure 1: Overview of AutoRDF2GML.
Figure 2: Example n-ary relation.
Figure 3: Example multi-hop relation from Linked Papers With Code.
Figure 4: Overview of the heterogeneous graph dataset SOA-SW.
Figure 5: Overview of the heterogeneous graph dataset LPWC.
