AutoRDF2GML: Facilitating RDF Integration in Graph Machine Learning
Michael Färber, David Lamprecht, Yuni Susanti
TL;DR
This work addresses the gap between RDF knowledge graphs and graph machine learning (GML) by introducing AutoRDF2GML, a framework that semi-automatically converts RDF data into ready-to-use heterogeneous graph datasets. It supports both content-based features, derived from RDF datatype properties, and topology-based features, derived from RDF object properties, for tasks such as link prediction and node classification. The authors also present four new RDF-based benchmark datasets, including SOA-SW and LPWC, for evaluating GML approaches on semantic graphs. The framework is configured through a single file and installed via pip, which lowers the barrier between the Semantic Web and Graph ML communities and facilitates RDF-based ML applications.
Abstract
In this paper, we introduce AutoRDF2GML, a framework designed to convert RDF data into data representations tailored for graph machine learning tasks. AutoRDF2GML enables, for the first time, the creation of both content-based features (i.e., features based on RDF datatype properties) and topology-based features (i.e., features based on RDF object properties). Because feature extraction is automated, even users less familiar with RDF and SPARQL can generate data representations ready for graph machine learning tasks, such as link prediction, node classification, and graph classification. Furthermore, we present four new benchmark datasets for graph machine learning, created from large RDF knowledge graphs using our framework. These datasets serve as resources for evaluating graph machine learning approaches, such as graph neural networks. Overall, our framework bridges the gap between the Graph Machine Learning and Semantic Web communities, paving the way for RDF-based machine learning applications.
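
To illustrate the distinction between content-based and topology-based features, the following is a minimal sketch and not the AutoRDF2GML implementation or API. It assumes triples are given as plain Python tuples with a flag marking literal objects; the `ex:` identifiers and the helper names `split_triples` and `index_edges` are hypothetical. Literal-valued (datatype property) triples are collected as per-node attributes, while resource-valued (object property) triples become typed edge lists.

```python
from collections import defaultdict

# Toy triples: (subject, predicate, object, object_is_literal).
# All identifiers are hypothetical and not taken from the paper's datasets.
TRIPLES = [
    ("ex:paper1", "ex:title", "Graph learning on RDF", True),
    ("ex:paper1", "ex:year", "2024", True),
    ("ex:paper1", "ex:hasAuthor", "ex:author1", False),
    ("ex:paper2", "ex:title", "Semantic Web basics", True),
    ("ex:paper2", "ex:hasAuthor", "ex:author1", False),
    ("ex:paper2", "ex:cites", "ex:paper1", False),
]


def split_triples(triples):
    """Separate datatype-property triples from object-property triples."""
    content = defaultdict(dict)  # node -> {datatype property: literal value}
    edges = defaultdict(list)    # object property -> [(source, target)]
    for s, p, o, is_literal in triples:
        if is_literal:
            content[s][p] = o        # raw material for content-based features
        else:
            edges[p].append((s, o))  # raw material for topology-based features
    return dict(content), dict(edges)


def index_edges(edges):
    """Assign integer ids to nodes and rewrite each edge as an index pair."""
    node_ids = {}
    indexed = {}
    for p, pairs in edges.items():
        indexed[p] = [
            (node_ids.setdefault(s, len(node_ids)),
             node_ids.setdefault(o, len(node_ids)))
            for s, o in pairs
        ]
    return node_ids, indexed


content, edges = split_triples(TRIPLES)
node_ids, indexed = index_edges(edges)
```

This sketch simplifies in two ways: literal values are left unencoded, whereas a real pipeline would turn them into numeric feature vectors, and all nodes share one id space, whereas heterogeneous graph datasets usually index nodes per type.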
