Repairing Property Graphs under PG-Constraints

Christopher Spinrath, Angela Bonifati, Rachid Echahed

TL;DR

The paper tackles repairing property graphs under PG-Constraints by identifying a practical subset of PG-Constraints, built on RGPC patterns, that supports recursion and automata-based reasoning. It proposes a holistic repair pipeline that can delete nodes, edges, or labels, and compares three repair strategies: ILP, naive greedy, and LP-guided greedy. The experiments show that allowing label deletions can reduce the total number of deletions by up to 59% compared to node/edge deletions alone, and that the LP-guided greedy algorithm matches ILP quality while cutting runtime by up to 97%. Optional pipeline steps enable label deletions and neighborhood-based refinements to balance repair quality against performance. The approach is validated on real-world datasets, including an investigative journalism graph. Overall, the work provides a rigorous framework for constraint-aware graph repair together with scalable algorithms for restoring data integrity in property graphs.

Abstract

Recent standardization efforts for graph databases lead to standard query languages like GQL and SQL/PGQ, and constraint languages like Property Graph Constraints (PG-Constraints). In this paper, we embark on the study of repairing property graphs under PG-Constraints. We identify a significant subset of PG-Constraints, encoding denial constraints and including recursion as a key feature, while still permitting automata-based structural analyses of errors. We present a comprehensive repair pipeline for these constraints to repair Property Graphs, involving changes in the graph topology and leading to node, edge and, optionally, label deletions. We investigate three algorithmic strategies for the repair procedure, based on Integer Linear Programming (ILP), a naive, and an LP-guided greedy algorithm. Our experiments on various real-world datasets reveal that repairing with label deletions can achieve a 59% reduction in deletions compared to node/edge deletions. Moreover, the LP-guided greedy algorithm offers a runtime advantage of up to 97% compared to the ILP strategy, while matching the same quality.
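
To make the idea of a deletion-based repair concrete, the following is a minimal sketch of a greedy strategy. It assumes that each detected error is given as a set of graph objects (node, edge, or label identifiers) and that deleting any one object of an error resolves it. The function name `greedy_repair`, the error encoding, and the weight dictionary are illustrative assumptions, not the paper's actual algorithm or data structures.

```python
def greedy_repair(errors, weight):
    """Greedily choose objects to delete until every error is resolved.

    errors: iterable of sets of object identifiers (hypothetical encoding).
    weight: dict mapping an object identifier to its deletion cost
            (objects missing from the dict default to cost 1.0).
    """
    remaining = [set(e) for e in errors]
    deleted = set()
    while remaining:
        # Count how many unresolved errors each object participates in.
        counts = {}
        for e in remaining:
            for obj in e:
                counts[obj] = counts.get(obj, 0) + 1
        # Pick the object resolving the most errors per unit of weight;
        # iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(counts), key=lambda o: counts[o] / weight.get(o, 1.0))
        deleted.add(best)
        # Drop every error that the chosen deletion resolves.
        remaining = [e for e in remaining if best not in e]
    return deleted


errors = [{"e1", "n1"}, {"e2", "n1"}, {"e3", "n2"}]
weight = {"n1": 1.0, "n2": 1.0, "e1": 1.0, "e2": 1.0, "e3": 1.0}
deleted = greedy_repair(errors, weight)
```

A greedy choice like this is fast but carries no optimality guarantee, which is the kind of quality/runtime trade-off the abstract contrasts with the ILP strategy.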

Paper Structure (47 sections, 3 theorems, 22 equations, 24 figures, 16 tables, 1 algorithm)

Key Result

proposition 1

Given a minimum weight vertex cover $V$, removing all objects in $V$ from $G$ yields a topological repair of $G$. (Recall that removing a node implies removing all its incident edges in $G$.)
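
The proposition can be illustrated on a toy instance. The sketch below assumes the vertex cover is taken over a conflict hypergraph whose hyperedges are sets of graph objects, one hyperedge per error, and finds a minimum weight cover by exhaustive search. The function name `min_weight_vertex_cover`, the object names, and the weights are hypothetical, and brute force is only feasible for very small inputs.

```python
from itertools import combinations


def min_weight_vertex_cover(hyperedges, weight):
    """Exhaustively search for a minimum weight vertex cover.

    hyperedges: list of sets of object identifiers (assumed encoding).
    weight: dict mapping every object identifier to a positive cost.
    Returns (cover, cost); exponential time, for illustration only.
    """
    edges = [frozenset(e) for e in hyperedges]
    objects = sorted(set().union(*edges))
    best, best_cost = None, float("inf")
    for k in range(len(objects) + 1):
        for subset in combinations(objects, k):
            chosen = set(subset)
            # A cover must contain at least one object of every hyperedge.
            if all(chosen & e for e in edges):
                cost = sum(weight[o] for o in chosen)
                if cost < best_cost:
                    best, best_cost = chosen, cost
    return best, best_cost


# Toy conflict hypergraph: three errors over nodes a, b, c and edge e_ab.
hyperedges = [{"a", "e_ab"}, {"e_ab", "b"}, {"b", "c"}]
weight = {"a": 2, "b": 2, "c": 3, "e_ab": 1}
cover, cost = min_weight_vertex_cover(hyperedges, weight)
```

Deleting the objects in `cover` leaves no hyperedge intact, so under these assumptions every recorded error is resolved at minimum total weight.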

Figures (24)

  • Figure 1: A property graph modelling an organisation with persons, tasks, and documents, along with a PG-Constraint
  • Figure 2: Our repair pipeline for property graphs consists of 6 steps, where steps 2 and 3 are optional. 2a and 2b are either both enabled or not, while 5a, 5b and 5c are alternatives. Solid edges indicate control flow, while dotted edges indicate communication.
  • Figure 3: Different encodings of topological errors
  • Figure 4: Automaton for the RGPC pattern from Example \ref{example:constraints-intro}
  • Figure 5: Shapes of RGPC patterns used in Sections \ref{section:performance-quality} and \ref{section:ablation}
  • ...and 19 more figures

Theorems & Definitions (9)

  • definition 1: Subgraph
  • definition 2: Repair
  • definition 3: Topological Error
  • definition 4: Topological Conflict Hypergraph
  • proposition 1
  • proposition 2
  • proposition 3
  • definition 5: Path
  • definition 6: RGPC Automaton