Using Color Refinement to Boost Enumeration and Counting for Acyclic CQs of Binary Schemas

Cristian Riveros, Benjamin Scheidt, Nicole Schweikardt

TL;DR

The paper develops the color-index, an index structure for evaluating free-connex acyclic conjunctive queries (fc-ACQs) over binary schemas. It represents the database $D$ as a graph, applies color refinement to that graph, and derives a compact color database $D_{\text{col}}$. Preprocessing then scales with $|D_{\text{col}}|$ rather than $|D|$, and the index supports Boolean evaluation, constant-delay enumeration, and counting. The approach exploits a close link between color refinement and homomorphisms from tree-like CQs, and combines it with known fc-ACQ enumeration and counting techniques to improve data complexity in favorable cases. The work also outlines extensions, including possible generalizations to non-binary schemas and dynamic scenarios, and highlights open questions about scalability, practical deployment, and broader query classes.

Abstract

We present an index structure, called the color-index, to boost the evaluation of acyclic conjunctive queries (ACQs) over binary schemas. The color-index is based on the color refinement algorithm, a widely used subroutine for graph isomorphism testing algorithms. Given a database $D$, we use a suitable version of the color refinement algorithm to produce a stable coloring of $D$, an assignment from the active domain of $D$ to a set of colors $C_D$. The main ingredient of the color-index is a particular database $D_c$ whose active domain is $C_D$ and whose size is at most $|D|$. Using the color-index, we can evaluate any free-connex ACQ $Q$ over $D$ with preprocessing time $O(|Q| \cdot |D_c|)$ and constant delay enumeration. Furthermore, we can also count the number of results of $Q$ over $D$ in time $O(|Q| \cdot |D_c|)$. Given that $|D_c|$ could be much smaller than $|D|$ (even constant-size for some families of databases), the color-index is the first index structure for evaluating free-connex ACQs that allows efficient enumeration and counting with performance that may be strictly smaller than the database size.
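To make the notion of a stable coloring concrete, here is a minimal Python sketch of naive color refinement on a database with unary and binary facts. It is an illustration under stated assumptions, not the paper's algorithm: the fact encoding, the function names `color_refinement` and `quotient_facts`, and the quotient construction are all hypothetical, and the paper's exact definition of the color database may differ (the abstract only says that a "suitable version" of color refinement is used). The sketch also runs in naive iterated rounds rather than with an optimized partition-refinement strategy.

```python
from collections import defaultdict


def color_refinement(facts):
    """Return a stable coloring {element: int} of the active domain.

    facts: iterable of tuples (R, a) for unary or (R, a, b) for binary facts.
    Assumption: relation names are strings; elements are hashable.
    """
    domain = set()
    labels = defaultdict(set)       # element -> unary and self-loop labels
    out_edges = defaultdict(list)   # a -> [(R, b)] for facts R(a, b), a != b
    in_edges = defaultdict(list)    # b -> [(R, a)] for facts R(a, b), a != b
    for fact in facts:
        if len(fact) == 2:
            rel, a = fact
            domain.add(a)
            labels[a].add(("unary", rel))
        else:
            rel, a, b = fact
            domain.update((a, b))
            if a == b:
                labels[a].add(("loop", rel))
            else:
                out_edges[a].append((rel, b))
                in_edges[b].append((rel, a))

    def renumber(signature):
        # Map each distinct signature to a small integer, deterministically.
        ids = {s: i for i, s in enumerate(sorted(set(signature.values())))}
        return {v: ids[s] for v, s in signature.items()}

    # Initial coloring: elements with the same labels share a color.
    color = renumber({v: tuple(sorted(labels[v])) for v in domain})
    while True:
        # New signature: old color plus multisets of (relation, neighbor color).
        signature = {
            v: (
                color[v],
                tuple(sorted((rel, color[w]) for rel, w in out_edges[v])),
                tuple(sorted((rel, color[w]) for rel, w in in_edges[v])),
            )
            for v in domain
        }
        refined = renumber(signature)
        # Classes can only split, so an unchanged count means stability.
        if len(set(refined.values())) == len(set(color.values())):
            return refined
        color = refined


def quotient_facts(facts, color):
    """Hypothetical color-level summary: replace every element by its color."""
    return {(f[0],) + tuple(color[x] for x in f[1:]) for f in facts}


# A directed 6-cycle: every element looks the same, so one color suffices.
cycle = [("E", i, (i + 1) % 6) for i in range(6)]
cycle_color = color_refinement(cycle)
print(len(set(cycle_color.values())))  # 1
```

On the 6-cycle the coloring collapses all elements into a single class regardless of the cycle's length, which matches the abstract's remark that the color-level structure can be constant-size for some families of databases. A directed path, by contrast, gets no such compression, since its endpoints and inner elements are progressively distinguished.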

Paper Structure

This paper contains 30 sections, 16 theorems, 19 equations, and 1 figure.

Key Result

Proposition 4.1

A CQ $Q$ of a binary schema $\sigma$ is acyclic iff its Gaifman graph $G(Q)$ is acyclic. The CQ $Q$ is free-connex acyclic iff $G(Q)$ is acyclic and the following statement is true: For every connected component $C$ of $G(Q)$, the subgraph of $C$ induced by the set $\textrm{\upshape free}(Q) \cap V(
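The first statement of the proposition (acyclicity of a CQ of a binary schema coincides with acyclicity of its Gaifman graph) can be checked with a simple union-find pass. The following Python sketch is only an illustration: the atom encoding and the function name `gaifman_is_acyclic` are hypothetical, and it assumes the usual convention that the Gaifman graph has an undirected edge between two distinct variables whenever they occur together in some atom, so self-loop atoms and repeated atoms on the same pair add no extra edges. The free-connex condition is not covered here.

```python
def gaifman_is_acyclic(atoms):
    """Check whether the Gaifman graph of a binary-schema CQ is a forest.

    atoms: iterable of (relation_name, variables), where variables is a
    tuple of one or two variable names, e.g. ("E", ("x", "y")).
    """
    parent = {}
    edges = set()
    for _, variables in atoms:
        for x in variables:
            parent.setdefault(x, x)
        distinct = set(variables)
        if len(distinct) == 2:
            # Undirected edge; duplicates collapse because edges is a set.
            edges.add(frozenset(distinct))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for edge in edges:
        x, y = edge
        root_x, root_y = find(x), find(y)
        if root_x == root_y:
            # x and y are already connected, so this edge closes a cycle.
            return False
        parent[root_x] = root_y
    return True


path_query = [("E", ("x", "y")), ("E", ("y", "z"))]
triangle_query = path_query + [("E", ("z", "x"))]
print(gaifman_is_acyclic(path_query), gaifman_is_acyclic(triangle_query))
```

The path-shaped query is acyclic, while adding the atom that closes the triangle makes the Gaifman graph cyclic.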

Figures (1)

  • Figure 1: (a) shows the graph $\bar{G}^{\operatorname{ex}}$ representing the database $D^{\operatorname{ex}}$ from the running example (all vertices $v$ have the same label $\textnormal{vl}(v)=\emptyset$), (b) shows a coarsest stable coloring of $\bar{G}^{\operatorname{ex}}$, and (c) shows a graph representing the color database $D^{\operatorname{ex}}_{\operatorname{col}}$ of $D^{\operatorname{ex}}$.

Theorems & Definitions (36)

  • Proposition 4.1: Folklore
  • Theorem 4.2: Yannakakis (1981)
  • Theorem 4.3
  • Example 5.1
  • Theorem 5.2: BBG-ColorRefinement
  • Example 5.3
  • proof
  • Example 5.5
  • Proposition 5.6
  • proof
  • ...and 26 more