A Simple Algorithm for Consistent Query Answering under Primary Keys

Diego Figueira, Anantha Padmanabha, Luc Segoufin, Cristina Sirangelo

TL;DR

This work presents a simple inflationary fixpoint algorithm, parameterized by an integer k, for computing certain answers to Boolean conjunctive queries under primary key constraints. The algorithm is a polynomial-time under-approximation: if it outputs yes, the query is certain, but it may miss some certain answers. For self-join-free and path queries, the authors give a complete semantic characterization (HP and FactorCond) of exactly when the fixpoint computes all and only the certain answers, and link these cases to first-order definability. They also establish lower bounds showing that the algorithm cannot capture certain answers in general, including for two-atom self-join queries such as q4 and q5, and relate the difficulty to known dichotomy results and the SBM problem. The work further connects boundedness of the fixpoint to FO definability, giving a unified view of the polynomial-time cases and their limitations across these two query classes, and outlines directions for extending the approach to broader constraint types and non-Boolean queries.

Abstract

We consider the dichotomy conjecture for consistent query answering under primary key constraints. It states that, for every fixed Boolean conjunctive query $q$, testing whether $q$ is certain (i.e., whether it evaluates to true over all repairs of a given inconsistent database) is either polynomial time or coNP-complete. This conjecture has been verified for self-join-free and path queries. We propose a simple inflationary fixpoint algorithm for consistent query answering which, for a given database, naively computes a set $\Delta$ of subsets of facts of the database of size at most $k$, where $k$ is the size of the query $q$. The algorithm runs in polynomial time and can be formally defined as: (1) Initialize $\Delta$ with all sets $S$ of at most $k$ facts such that $S \models q$. (2) Add any set $S$ of at most $k$ facts to $\Delta$ if there exists a block $B$ (i.e., a maximal set of facts sharing the same key) such that for every fact $a \in B$ there is a set $S' \subseteq S \cup \{a\}$ such that $S' \in \Delta$. For an input database $D$, the algorithm answers "$q$ is certain" iff $\Delta$ eventually contains the empty set. The algorithm correctly computes certainty when the query $q$ falls in the polynomial time cases of the known dichotomies for self-join-free queries and path queries. For arbitrary Boolean conjunctive queries, the algorithm is an under-approximation: the query is guaranteed to be certain if the algorithm claims so. However, there are polynomial time certain queries (with self-joins) which are not identified as such by the algorithm.
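Steps (1) and (2) of the abstract can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. The names `cert_k`, `key`, and `satisfies`, the encoding of facts as tuples `(relation, key, value)`, and the example query are all assumptions made for this sketch.

```python
from itertools import combinations

def cert_k(db, key, satisfies, k):
    """Return True if the fixpoint derives the empty set, i.e. q is reported certain.

    db        -- iterable of hashable facts (hypothetical encoding)
    key       -- function mapping a fact to its primary-key value
    satisfies -- function deciding S |= q for a set of facts S
    k         -- bound on the size of the sets kept in Delta
    """
    facts = list(db)
    # All candidate sets S with |S| <= k.
    candidates = [frozenset(c) for r in range(k + 1)
                  for c in combinations(facts, r)]
    # Blocks: maximal sets of facts sharing the same key.
    by_key = {}
    for f in facts:
        by_key.setdefault(key(f), []).append(f)
    blocks = list(by_key.values())

    # Step (1): initialize Delta with the sets that satisfy q.
    delta = {s for s in candidates if satisfies(s)}

    def has_member_below(t):
        # Is there some S' in Delta with S' a subset of t?
        return any(s <= t for s in delta)

    # Step (2): inflationary iteration until nothing new is added.
    changed = True
    while changed:
        changed = False
        for s in candidates:
            if s in delta:
                continue
            if any(all(has_member_below(s | {a}) for a in block)
                   for block in blocks):
                delta.add(s)
                changed = True
    return frozenset() in delta

# Hypothetical example: q = exists x, y, z. R(x, y) and S(y, z),
# where the first attribute of each relation is the key.
def q(s):
    return any(f[0] == "R" and g[0] == "S" and f[2] == g[1]
               for f in s for g in s)

fact_key = lambda f: f[:2]  # (relation name, key attribute)
D1 = {("R", 1, "a"), ("R", 1, "b"), ("S", "a", "c"), ("S", "b", "c")}
D2 = {("R", 1, "a"), ("R", 1, "b"), ("S", "a", "c")}

print(cert_k(D1, fact_key, q, 2))  # True: both repairs of D1 satisfy q
print(cert_k(D2, fact_key, q, 2))  # False: the repair {R(1,b), S(a,c)} falsifies q
```

The sketch enumerates every subset of at most k facts, so it is polynomial in the size of the database for fixed k but makes no attempt at efficiency.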

Paper Structure (20 sections, 33 theorems, 21 equations, 7 figures)

Key Result

Proposition 3.1

For all $q,\Gamma,k$, $\Cqk(q)$ runs in time polynomial in the size of its input database $D$ and, if $D\models \Cqk(q)$, then $D \models \certain(q)$.
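The right-hand side of this implication, certainty, means that the query holds in every repair, where a repair keeps exactly one fact from each block. A brute-force reference check can be sketched as below, for instance to test a polynomial-time under-approximation on small inputs. The names `repairs` and `certain`, the tuple encoding of facts, and the example query are assumptions for illustration. The enumeration is exponential in the number of blocks.

```python
from itertools import product

def repairs(db, key):
    """Yield every repair of db: one fact chosen from each block."""
    by_key = {}
    for f in db:
        by_key.setdefault(key(f), []).append(f)
    for choice in product(*by_key.values()):
        yield frozenset(choice)

def certain(db, key, satisfies):
    """Return True if the query holds in all repairs of db."""
    return all(satisfies(r) for r in repairs(db, key))

# Hypothetical example: q = exists x, y, z. R(x, y) and S(y, z),
# with facts encoded as (relation, key, value).
def q(s):
    return any(f[0] == "R" and g[0] == "S" and f[2] == g[1]
               for f in s for g in s)

fact_key = lambda f: f[:2]  # (relation name, key attribute)
D1 = {("R", 1, "a"), ("R", 1, "b"), ("S", "a", "c"), ("S", "b", "c")}
D2 = {("R", 1, "a"), ("R", 1, "b"), ("S", "a", "c")}

print(len(list(repairs(D1, fact_key))))  # 2: the only choice is within the R block
print(certain(D1, fact_key, q))          # True
print(certain(D2, fact_key, q))          # False
```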

Figures (7)

  • Figure 1: Solution graph for database $\Dn$. Black dots denote facts, rectangles denote blocks, and three-pointed edges denote triangles (i.e., 3-cliques) in the solution graph of $\Dn$. There are $n-1$ facts in each block $B_i$ and two facts in each block $E^j_k$.
  • Figure 2: Triangles in the database $\Dn$.
  • Figure 3: Depiction of $\back(j,l)$ as green solid discs, and $\front(j,l)$ as blue solid discs.
  • Figure 4: Database $\Dn[k+2]$ in the proof of Claim \ref{claim2}, with $k$-obstruction set $W$ over blocks $\mathbb{X}$. Dotted-line boxes depict the block $A$ in cases a), b), and c) of the proof.
  • Figure 5: Example of construction of $D'$ from $D$ in the proof of \ref{theorem-UnconditionX+-Lowerbound-sjf}. The symbols "$*$" stand for the necessary constants "$g$" used in order to obtain the depicted blocks. $D$ has $4$ blocks and $5$ cliques. Observe that neither the edge-less singleton clique $C_2$ nor the clique $C_4$ (which is internal to the block $B_4$) intervenes in the relation $S_1$. Light gray edges depict the pairs of facts from $D'$ which form a solution to $\qFive$.
  • ...and 2 more figures

Theorems & Definitions (75)

  • Conjecture 2.1: Dichotomy conjecture
  • Example 2.2
  • Proposition 3.1
  • Proposition 3.2
  • Proposition 3.3
  • Example 3.4
  • Claim 3.5
  • proof: Proof of \ref{claim-qTwo}
  • Lemma 4.1
  • proof
  • ...and 65 more