Common Foundations for SHACL, ShEx, and PG-Schema
S. Ahmetaj, I. Boneva, J. Hidders, K. Hose, M. Jakubowski, J. E. Labra-Gayo, W. Martens, F. Mogavero, F. Murlak, C. Okulmus, A. Polleres, O. Savkovic, M. Simkus, D. Tomaszuk
TL;DR
This paper develops a unified framework for comparing SHACL, ShEx, and PG-Schema. It introduces a Common Graph Data Model that embeds both RDF and property graphs as a shared substrate, formalizes the non-recursive core of each language over that model, and defines a Common Graph Schema Language (CoGSL) that captures the functionality the three languages share. It also discusses translations between the formalisms and includes an extensive survey of related work. The result is a principled basis for interoperable graph validation and schema design across heterogeneous graph data models, with future directions that include recursion in ShEx and richer PG-Schema capabilities.
Abstract
Graphs have emerged as an important foundation for a variety of applications, including capturing and reasoning over factual knowledge, semantic data integration, social networks, and supplying knowledge to machine learning algorithms. To formalise certain properties of the data and to ensure data quality, there is a need to describe the schema of such graphs. Because of the breadth of applications and the availability of different data models, such as RDF and property graphs, the Semantic Web and database communities have independently developed graph schema languages: SHACL, ShEx, and PG-Schema. Each language takes its own approach to defining constraints and validating graph data, leaving potential users in the dark about their commonalities and differences. In this paper, we provide formal, concise definitions of the core components of each of these schema languages. We employ a uniform framework to facilitate a comprehensive comparison between the languages and identify a common set of functionalities, shedding light on both the overlapping and the distinctive features of the three languages.
