A Simple Algorithm for Consistent Query Answering under Primary Keys
Diego Figueira, Anantha Padmanabha, Luc Segoufin, Cristina Sirangelo
TL;DR
This work presents a simple inflationary fixpoint algorithm, parameterized by k, for deciding whether a Boolean conjunctive query is certain under primary key constraints. The algorithm is a polynomial-time under-approximation: if it answers yes, the query is certain, but a negative answer does not rule out certainty. For self-join-free and path queries, the authors give a complete semantic characterization (HP and FactorCond) of exactly when the fixpoint computes certainty correctly, and link these cases to first-order definability. They also establish strong lower bounds showing that the algorithm cannot capture certainty in general, including for two-atom self-join queries such as q4 and q5, and relate the difficulty to known dichotomy results and the SBM problem. The work further connects boundedness of the fixpoint to FO definability, giving a unified perspective on the polynomial-time cases and their limitations across two major query classes, and outlines directions for extending the approach to broader constraint types and non-Boolean queries.
Abstract
We consider the dichotomy conjecture for consistent query answering under primary key constraints. It states that, for every fixed Boolean conjunctive query $q$, testing whether $q$ is certain (i.e. whether it evaluates to true over all repairs of a given inconsistent database) is either in polynomial time or coNP-complete. This conjecture has been verified for self-join-free and path queries. We propose a simple inflationary fixpoint algorithm for consistent query answering which, for a given database, naively computes a set $\Delta$ of subsets of facts of the database of size at most $k$, where $k$ is the size of the query $q$. The algorithm runs in polynomial time and can be formally defined as: (1) Initialize $\Delta$ with all sets $S$ of at most $k$ facts such that $S \models q$. (2) Add any set $S$ of at most $k$ facts to $\Delta$ if there exists a block $B$ (i.e., a maximal set of facts sharing the same key) such that for every fact $a \in B$ there is a set $S' \subseteq S \cup \{a\}$ such that $S' \in \Delta$. For an input database $D$, the algorithm answers "$q$ is certain" iff $\Delta$ eventually contains the empty set. The algorithm correctly computes certainty when the query $q$ falls in the polynomial-time cases of the known dichotomies for self-join-free queries and path queries. For arbitrary Boolean conjunctive queries, the algorithm is an under-approximation: the query is guaranteed to be certain if the algorithm claims so. However, there are queries with self-joins for which certainty is decidable in polynomial time but is not computed by the algorithm.
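The two steps of the fixpoint can be illustrated with a minimal Python sketch. It follows the wording of the abstract literally and is not the authors' implementation. The names `certain_k`, `key_of`, and `satisfies` are hypothetical, and so is the encoding of a fact as a tuple `(relation, key, value)`. The query is passed as a caller-supplied predicate `satisfies(S)` standing in for $S \models q$. Candidate sets are enumerated by brute force, which is polynomial only because $k$ is fixed.

```python
from collections import defaultdict
from itertools import combinations


def certain_k(db, k, satisfies, key_of):
    """Return True if the fixpoint derives the empty set (hypothetical helper)."""
    facts = list(db)
    # All sets of at most k facts, including the empty set.
    candidates = [frozenset(c)
                  for r in range(k + 1)
                  for c in combinations(facts, r)]

    # Blocks: maximal sets of facts sharing the same key.
    by_key = defaultdict(set)
    for fact in facts:
        by_key[key_of(fact)].add(fact)
    blocks = list(by_key.values())

    # Step (1): start with every candidate set that satisfies the query.
    delta = {s for s in candidates if satisfies(s)}

    def covered(s, a):
        # Is there some S' in delta with S' a subset of S ∪ {a}?
        extended = s | {a}
        return any(t <= extended for t in delta)

    # Step (2): add sets until nothing changes (inflationary fixpoint).
    changed = True
    while changed:
        changed = False
        for s in candidates:
            if s in delta:
                continue
            if any(all(covered(s, a) for a in block) for block in blocks):
                delta.add(s)
                changed = True

    return frozenset() in delta


# Assumed example query: exists x, y, z such that R(x, y) and S(y, z),
# with the first attribute of each relation as its primary key.
def key_of(fact):
    relation, key, _value = fact
    return (relation, key)


def satisfies(s):
    return any(f[0] == "R" and g[0] == "S" and f[2] == g[1]
               for f in s for g in s)


# Both choices in the block of R("1", _) can be completed to a match.
db_certain = {("R", "1", "a"), ("R", "1", "b"),
              ("S", "a", "c"), ("S", "b", "c")}
# The repair keeping R("1", "b") has no matching S fact.
db_not_certain = {("R", "1", "a"), ("R", "1", "b"), ("S", "a", "c")}

print(certain_k(db_certain, 2, satisfies, key_of))      # True
print(certain_k(db_not_certain, 2, satisfies, key_of))  # False
```

In the first database the singletons $\{R(1,a)\}$ and $\{R(1,b)\}$ enter $\Delta$ through the blocks of $S(a,c)$ and $S(b,c)$, after which the block of $R(1,\cdot)$ yields the empty set. In the second database $\{R(1,b)\}$ is never derived, so the empty set is not reached.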
