Direct Access for Answers to Conjunctive Queries with Aggregation

Idan Eldar, Nofar Carmeli, Benny Kimelfeld

TL;DR

The paper shows that all past results on direct access continue to hold for annotated databases, provided that the annotation itself does not participate in the lexicographic order, and it characterizes how the complexity of the problem changes when the aggregate and annotation values are included in the order.

Abstract

We study the fine-grained complexity of conjunctive queries with grouping and aggregation. For common aggregate functions (e.g., min, max, count, sum), such a query can be phrased as an ordinary conjunctive query over a database annotated with a suitable commutative semiring. We investigate the ability to evaluate such queries by constructing in loglinear time a data structure that provides logarithmic-time direct access to the answers ordered by a given lexicographic order. This task is nontrivial since the number of answers might be larger than loglinear in the size of the input, so the data structure needs to provide a compact representation of the space of answers. In the absence of aggregation and annotation, past research established a sufficient tractability condition on queries and orders. For queries without self-joins, this condition is not just sufficient, but also necessary (under conventional lower-bound assumptions in fine-grained complexity). We show that all past results continue to hold for annotated databases, assuming that the annotation itself does not participate in the lexicographic order. Yet, past algorithms do not apply to the count-distinct aggregation, which has no efficient representation as a commutative semiring; for this aggregation, we establish the corresponding tractability condition. We then show how the complexity of the problem changes when we include the aggregate and annotation value in the order. We also study the impact of having all relations but one annotated by the multiplicative identity (one), as happens when we translate aggregate queries into semiring annotations, and having a semiring with an idempotent addition, such as the case of min, max, and count-distinct over a logarithmic-size domain.
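
To make the notion of direct access concrete, here is a minimal sketch for one specific query and order, without aggregation: the join $Q(x,y,z) {\,:\!\!-\,} R(x,y), S(y,z)$ with answers ordered lexicographically by $(x,y,z)$. It is an illustration under stated assumptions, not the paper's algorithm; the class name `DirectAccess` and the sample relations are hypothetical. Preprocessing sorts the input and stores prefix sums of answer counts, and each access is a binary search followed by an array lookup, so the answers are never materialized even though there may be quadratically many.

```python
from bisect import bisect_right
from collections import defaultdict
from itertools import accumulate


class DirectAccess:
    """Direct access to the answers of Q(x, y, z) :- R(x, y), S(y, z),
    ordered lexicographically by (x, y, z). Illustrative sketch only."""

    def __init__(self, R, S):
        # Group the z-values of S by y and sort each group.
        s_by_y = defaultdict(list)
        for y, z in set(S):
            s_by_y[y].append(z)
        for zs in s_by_y.values():
            zs.sort()
        self.s_by_y = s_by_y

        # Keep only R-facts that join with S, sorted by (x, y).
        self.prefixes = sorted((x, y) for x, y in set(R) if y in s_by_y)

        # cum[i] = number of answers whose (x, y) prefix is one of
        # prefixes[0..i]; answers are counted, never enumerated.
        self.cum = list(accumulate(len(s_by_y[y]) for _, y in self.prefixes))

    def __len__(self):
        return self.cum[-1] if self.cum else 0

    def __getitem__(self, k):
        if not 0 <= k < len(self):
            raise IndexError(k)
        # First bucket whose cumulative count exceeds k.
        i = bisect_right(self.cum, k)
        x, y = self.prefixes[i]
        offset = k - (self.cum[i - 1] if i > 0 else 0)
        return (x, y, self.s_by_y[y][offset])


# Hypothetical sample input.
R = [(1, "a"), (1, "b"), (2, "a")]
S = [("a", 10), ("a", 20), ("b", 5)]
da = DirectAccess(R, S)
third_answer = da[2]
```

This works for the order $(x,y,z)$ because the number of answers below each $(x,y)$ prefix depends only on $y$. Other variable orders for the same query may not admit such a simple counting scheme, which is the kind of distinction the tractability condition mentioned in the abstract captures.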

Paper Structure (15 sections, 28 theorems, 26 equations, 4 figures)


Key Result

Theorem 3.1

Let $Q$ be a CQ.

Figures (4)

  • Figure 1: An example of a $\mathbb{Q}$-database over the numerical semiring constructed to evaluate the AggCQ $Q(c, \mathsf{Sum}(t)) {\,:\!\!-\,} \textsc{Teams}(p,c), \textsc{Goals}(g,p,t),\textsc{Replays}(g,t)$.
  • Figure 2: Example of $Q_R$ and $D_R$ when $N=4$ and $d=3$. The input $(D,\tau)$ is defined by $D = \{R(2),R(13),R(64),R(192)\}$ where $\tau(R(a)) = a$ for every fact $R(a)$.
  • Figure 3: Example of the construction in the proof of \ref{prop:annotation-aggregation-hardness-gap}: direct access for the AggCQ $Q(\mathsf{Count}(),x, x') {\,:\!\!-\,} R(x, w), R'(x', w')$.
  • Figure 4: An example for the construction from \ref{lemma:existential-removal-idempotent-negative} on the query $Q(x_1, x_2) {\,:\!\!-\,} R(w_1, w_2),S(w_2,x_1),T(x_1,x_2,w_3),U(x_2,w_4,w_5)$ for $R$-annotated databases. Here, $x_1$ is the carrying variable, $R_{\mathsf{carry}}=S_{|\mathrm{free}(Q)}$, and $V=\{w_1,w_2\}$.
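
The translation behind Figure 1 can be sketched as follows for the AggCQ $Q(c, \mathsf{Sum}(t)) {\,:\!\!-\,} \textsc{Teams}(p,c), \textsc{Goals}(g,p,t), \textsc{Replays}(g,t)$. The facts below are hypothetical toy data, not the contents of the figure, and the choice to put the value $t$ on the Goals facts (with every other fact annotated by the multiplicative identity 1) is an assumption made for illustration. Over the numerical semiring, the annotation of an answer is the sum, over all matching assignments, of the product of the annotations of the facts used.

```python
from collections import defaultdict

# Hypothetical toy relations (not taken from the paper's Figure 1).
teams = [("p1", "c1"), ("p2", "c1"), ("p3", "c2")]        # Teams(p, c)
goals = [("g1", "p1", 10), ("g1", "p2", 55),
         ("g2", "p3", 30), ("g2", "p1", 80)]              # Goals(g, p, t)
replays = [("g1", 10), ("g1", 55), ("g2", 30)]            # Replays(g, t)

# Annotate facts: Goals carries the summed value t (an assumption),
# every other fact carries the multiplicative identity 1.
teams_ann = {fact: 1 for fact in teams}
goals_ann = {fact: fact[2] for fact in goals}
replays_ann = {fact: 1 for fact in replays}


def evaluate_sum_query():
    """Evaluate Q(c) over the annotated database with (+, *) on numbers:
    multiply annotations along each match, add across matches per c."""
    result = defaultdict(int)  # 0 is the additive identity
    for (p, c), a_team in teams_ann.items():
        for (g, p2, t), a_goal in goals_ann.items():
            if p2 != p:
                continue
            a_replay = replays_ann.get((g, t))
            if a_replay is None:
                continue  # no matching Replays fact
            result[c] += a_team * a_goal * a_replay
    return dict(result)


sums = evaluate_sum_query()
```

The nested loops are a naive evaluation chosen for readability; they only illustrate what the annotation of each answer means, not how to compute it efficiently.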

Theorems & Definitions (56)

  • Remark 2.1
  • Theorem 3.1 (cited from DBLP:conf/pods/CarmeliTGKR21)
  • Lemma 3.2
  • proof: Proof of \ref{lem:reduction}
  • Lemma 3.3
  • proof
  • Theorem 3.4
  • Corollary 3.5
  • Theorem 4.1
  • proof
  • ...and 46 more