The CTU Prague Relational Learning Repository
Jan Motl, Oliver Schulte
TL;DR
The paper introduces the Prague Relational Learning Repository (PRLR), a public collection of multi-relational datasets hosted on a MySQL server to support relational learning research. It argues for SQL-based storage because it enables cross-system integration, and it provides metadata in a central meta-database along with tools for converting the data to other relational-learning formats, with a focus on supervised learning over complex schemas. The repository currently contains 148 databases, spanning real and synthetic benchmarks, with metadata on schema, statistics, key structure, and classification targets to help with dataset selection and benchmarking. Access is read-only by default; the community can contribute new datasets via data dumps or migration access, so the collection can keep growing. Overall, PRLR aims to standardize and accelerate research in relational data mining by offering diverse, well-documented benchmarks and tooling for interoperability.
Abstract
The aim of the Prague Relational Learning Repository is to support machine learning research with multi-relational data. The repository currently contains 148 SQL databases hosted on a public MySQL server located at https://relational.fel.cvut.cz. The server is provided by the Czech Technical University (CTU). A searchable meta-database provides metadata (e.g., the number of tables in the database, the number of rows and columns in the tables, the number of self-relationships).
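To make the listed metadata concrete, the following minimal sketch computes the same kinds of statistics (number of tables, rows and columns per table, number of self-relationships) for a toy schema. It is an illustration under stated assumptions, not the repository's own tooling: it uses an in-memory SQLite database as a stand-in for the MySQL server so that it runs without network access, and the `department` and `employee` tables and the `summarize_schema` helper are hypothetical names introduced here. A self-relationship is taken to mean a foreign key whose referenced table is the table that declares it.

```python
import sqlite3


def summarize_schema(conn):
    """Return table count, per-table (rows, columns), and self-relationship count."""
    cur = conn.cursor()
    tables = [row[0] for row in cur.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()]

    shape = {}
    self_relationships = 0
    for table in tables:
        n_rows = cur.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        n_cols = len(cur.execute(f'PRAGMA table_info("{table}")').fetchall())
        shape[table] = (n_rows, n_cols)

        # Each row of foreign_key_list starts with (id, seq, referenced_table, ...).
        # Collect distinct key ids so a composite key is counted once.
        fks = cur.execute(f'PRAGMA foreign_key_list("{table}")').fetchall()
        self_relationships += len({fk[0] for fk in fks if fk[2] == table})

    return {
        "tables": len(tables),
        "shape": shape,
        "self_relationships": self_relationships,
    }


# Hypothetical toy schema: employee.manager_id points back at employee,
# which makes it a self-relationship.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (
        id   INTEGER PRIMARY KEY,
        name TEXT
    );
    CREATE TABLE employee (
        id         INTEGER PRIMARY KEY,
        name       TEXT,
        dept_id    INTEGER REFERENCES department(id),
        manager_id INTEGER REFERENCES employee(id)
    );
    INSERT INTO department VALUES (1, 'Research'), (2, 'Operations');
    INSERT INTO employee VALUES
        (1, 'Ada', 1, NULL),
        (2, 'Bob', 1, 1),
        (3, 'Eve', 2, 1);
""")

summary = summarize_schema(conn)
print(summary)
```

On a MySQL server, comparable statistics would typically be read from the `information_schema` views rather than from SQLite pragmas; the repository's meta-database stores such values precomputed, so this kind of scan is only needed for databases outside it.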
