From Shapes to Shapes: Inferring SHACL Shapes for Results of SPARQL CONSTRUCT Queries (Extended Version)
Philipp Seifer, Daniel Hernández, Ralf Lämmel, Steffen Staab
TL;DR
This work addresses the problem of deriving SHACL constraints that hold on all possible outputs of a fixed SPARQL CONSTRUCT query when the inputs are themselves SHACL-constrained. It formalizes a fragment of SPARQL (SCCQ) and a fragment of SHACL (Simple SHACL) within a description-logic framework (ALCHOI), and proposes a static analysis that encodes query execution and input constraints as axioms in order to compute a sound upper bound of the possible output shapes. The key contributions are formal problem statements for OutputShapes; a two-stage approach that first generates candidate shapes bounded by the vocabulary and then filters them via IsOutputShape; and an extended-graph axiomatization that captures execution steps and variable bindings. NP-hardness of IsOutputShape is established, and an implemented solver is demonstrated on synthetic configurations. The results let developers reason about and validate CONSTRUCT result graphs in data processing pipelines without relying on concrete inputs, offering a principled way to anticipate how shapes propagate and to guide validation strategies.
Abstract
SPARQL CONSTRUCT queries allow for the specification of data processing pipelines that transform given input graphs into new output graphs. It is now common to constrain graphs through SHACL shapes, allowing users to understand which data they can and cannot expect. However, it becomes challenging to understand what graph data can be expected at the end of a data processing pipeline without knowing the particular input data: shape constraints on the input graph may affect the output graph but may no longer apply literally, and new shapes may be imposed by the query template. In this paper, we study the derivation of shape constraints that hold on all possible output graphs of a given SPARQL CONSTRUCT query. We assume that the SPARQL CONSTRUCT query is fixed, e.g., being part of a program, whereas the input graphs adhere to input shape constraints but may otherwise vary over time and, thus, are mostly unknown. We study a fragment of SPARQL CONSTRUCT queries (SCCQ) and a fragment of SHACL (Simple SHACL). We formally define the problem of deriving the most restrictive set of Simple SHACL shapes that constrain the results from evaluating an SCCQ over any input graph restricted by a given set of Simple SHACL shapes. We propose and implement an algorithm that statically analyses input SHACL shapes and CONSTRUCT queries, and we prove its soundness and complexity.
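To make the problem setting concrete, the following is a minimal Python sketch and not the algorithm of the paper. It evaluates one conjunctive CONSTRUCT-style query over one concrete input graph and checks a simple "every instance of a class has some value for a property" constraint on the result. The triple encoding, the helper names (`match`, `construct`, `every_instance_has`), and the example vocabulary (`:Person`, `:name`, `:Author`, `:label`) are all assumptions made for illustration. The paper's contribution is to derive such output constraints statically, for all conforming inputs, rather than by checking a single instance as done here.

```python
from typing import Dict, Iterator, List, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    """Terms starting with '?' are variables (illustrative convention)."""
    return term.startswith("?")


def match(pattern: List[Triple], graph: Set[Triple]) -> Iterator[Binding]:
    """Yield every binding under which all pattern triples occur in the graph."""

    def extend(i: int, binding: Binding) -> Iterator[Binding]:
        if i == len(pattern):
            yield dict(binding)
            return
        for triple in graph:
            candidate = dict(binding)
            ok = True
            for p_term, g_term in zip(pattern[i], triple):
                if is_var(p_term):
                    # Bind the variable, or check consistency with an earlier binding.
                    if candidate.setdefault(p_term, g_term) != g_term:
                        ok = False
                        break
                elif p_term != g_term:
                    ok = False
                    break
            if ok:
                yield from extend(i + 1, candidate)

    yield from extend(0, {})


def construct(template: List[Triple], pattern: List[Triple],
              graph: Set[Triple]) -> Set[Triple]:
    """Instantiate the template once per binding of the pattern."""
    out: Set[Triple] = set()
    for binding in match(pattern, graph):
        for s, p, o in template:
            out.add((binding.get(s, s), binding.get(p, p), binding.get(o, o)))
    return out


def every_instance_has(graph: Set[Triple], cls: str, prop: str) -> bool:
    """Check: every node typed as cls has at least one value for prop."""
    instances = {s for s, p, o in graph if p == "a" and o == cls}
    return all(any(s2 == s and p2 == prop for s2, p2, _ in graph)
               for s in instances)


# Roughly: CONSTRUCT { ?x a :Author . ?x :label ?n }
#          WHERE     { ?x a :Person . ?x :name ?n }
PATTERN = [("?x", "a", ":Person"), ("?x", ":name", "?n")]
TEMPLATE = [("?x", "a", ":Author"), ("?x", ":label", "?n")]

input_graph = {
    (":alice", "a", ":Person"), (":alice", ":name", '"Alice"'),
    (":bob", "a", ":Person"), (":bob", ":name", '"Bob"'),
}

# The input conforms to the (assumed) input constraint on :Person.
assert every_instance_has(input_graph, ":Person", ":name")

output_graph = construct(TEMPLATE, PATTERN, input_graph)
print(sorted(output_graph))
print(every_instance_has(output_graph, ":Author", ":label"))
```

In this toy case the input constraint on `:Person` no longer applies literally, because the output contains no `:Person` nodes, while the template imposes a new constraint on `:Author`. Checking one input graph only confirms the constraint for that graph; showing that it holds for every conforming input is the static problem the paper studies.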
