Computational Complexity of Preferred Subset Repairs on Data-Graphs
Nina Pardal, Santiago Cifuentes, Edwin Pin, Maria Vanina Martinez, Sergio Abriola
TL;DR
The paper investigates prioritized subset repairs for data-graphs under GXPath-based integrity constraints, formalizing how different preorder criteria (inclusion, cardinality, weights, and multisets) shape the space of repairs and the corresponding decision and query problems. It proves that these repair notions extend classical subset repairs and that, in the node-positive fragment of GXPath, a unique repair exists and can be computed in PTIME, so the repair and CQA tasks are solvable in PTIME in this regime. For the full GXPath language, the authors establish a comprehensive complexity map: repair existence is in NP, repair checking is in coNP, and CQA is $\Pi^p_2$-hard or hard at the $\Delta^p_2[\log n]$ level, depending on the exact language and preorder. Overall, the work delineates when efficient repair and cautious query answering are feasible, and it points future work toward refined tractable variants and alternative consistency notions for graph-structured data systems.
Abstract
Preferences are a pivotal component in practical reasoning, especially in tasks that involve decision-making over different options or courses of action that could be pursued. In this work, we focus on repairing and querying inconsistent knowledge bases in the form of graph databases, which involve, respectively, finding a way to resolve conflicts in the knowledge base and considering the answers that are entailed by every possible repair. Without a priori domain knowledge, all possible repairs are equally preferred. Though that may be adequate for some settings, it seems reasonable to establish and exploit some form of preference order among the potential repairs. We study the problem of computing prioritized repairs over graph databases with data values, using a notion of consistency based on GXPath expressions as integrity constraints. We present several preference criteria based on the standard subset repair semantics, incorporating weights, multisets, and set-based priority levels. We show that it is possible to maintain the same computational complexity as in the case where no preference criterion is available for exploitation. Finally, we explore the complexity of consistent query answering in this setting and obtain tight lower and upper bounds for all the preference criteria introduced.
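To make the idea of preferring some subset repairs over others concrete, the following is a minimal brute-force sketch on a toy graph. It is an illustration under strong simplifying assumptions, not the paper's construction: the graph consists of labelled edges only (no data values, no node deletions), the integrity constraint is a hypothetical stand-in for a GXPath expression (each source node has at most one outgoing edge per label), and the names `EDGES`, `WEIGHTS`, `is_consistent`, `subset_repairs`, and `preferred` are invented for this example.

```python
from itertools import combinations

# Toy graph: a set of (source, label, target) edges. Illustrative data only.
EDGES = frozenset({
    ("alice", "lives_in", "paris"),
    ("alice", "lives_in", "rome"),
    ("bob", "lives_in", "rome"),
})

# Hypothetical edge weights used by the weight-based preference criterion.
WEIGHTS = {
    ("alice", "lives_in", "paris"): 3,
    ("alice", "lives_in", "rome"): 1,
    ("bob", "lives_in", "rome"): 2,
}


def is_consistent(edges):
    """Stand-in constraint: each (source, label) pair has at most one target."""
    seen = {}
    for source, label, target in edges:
        if seen.setdefault((source, label), target) != target:
            return False
    return True


def subset_repairs(edges):
    """Return all inclusion-maximal consistent subsets (exponential brute force)."""
    ordered = sorted(edges)
    consistent = [
        frozenset(subset)
        for size in range(len(ordered) + 1)
        for subset in combinations(ordered, size)
        if is_consistent(subset)
    ]
    # Keep only subsets that are not strictly contained in another consistent subset.
    return [c for c in consistent if not any(c < other for other in consistent)]


def preferred(repairs, score):
    """Keep the repairs that maximize the given score function."""
    best = max(score(r) for r in repairs)
    return [r for r in repairs if score(r) == best]


repairs = subset_repairs(EDGES)
by_cardinality = preferred(repairs, len)
by_weight = preferred(repairs, lambda r: sum(WEIGHTS[e] for e in r))
```

In this toy instance the two conflicting `lives_in` edges of `alice` yield two subset repairs. The cardinality criterion cannot separate them, since both keep two edges, while the weight criterion selects the single repair that keeps the heavier edge. The enumeration is exponential in the number of edges and is meant only to show the definitions at work.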
