Using Color Refinement to Boost Enumeration and Counting for Acyclic CQs of Binary Schemas
Cristian Riveros, Benjamin Scheidt, Nicole Schweikardt
TL;DR
The paper develops the color-index, an index structure for efficiently evaluating free-connex acyclic conjunctive queries (fc-ACQs) over binary schemas. It runs color refinement on the database $D$ and derives a compact color database $D_{\text{col}}$ from the resulting stable coloring. Preprocessing then scales with $|D_{\text{col}}|$ rather than $|D|$, and the index supports Boolean evaluation, enumeration with (near) constant delay, and counting, each with provable guarantees. The approach exploits a close link between color refinement and homomorphisms from tree-like CQs, combining graph coloring with known fc-ACQ enumeration and counting techniques to improve data complexity whenever $D_{\text{col}}$ is small relative to $D$. The work also outlines extensions, including possible generalizations to non-binary schemas and dynamic scenarios, and highlights open questions about scalability, practical deployment, and broader query classes.
Abstract
We present an index structure, called the color-index, to boost the evaluation of acyclic conjunctive queries (ACQs) over binary schemas. The color-index is based on the color refinement algorithm, a widely used subroutine in graph isomorphism testing algorithms. Given a database $D$, we use a suitable version of the color refinement algorithm to produce a stable coloring of $D$, that is, an assignment from the active domain of $D$ to a set of colors $C_D$. The main ingredient of the color-index is a particular database $D_c$ whose active domain is $C_D$ and whose size is at most $|D|$. Using the color-index, we can evaluate any free-connex ACQ $Q$ over $D$ with preprocessing time $O(|Q| \cdot |D_c|)$ and constant-delay enumeration. Furthermore, we can count the number of results of $Q$ over $D$ in time $O(|Q| \cdot |D_c|)$. Given that $|D_c|$ could be much smaller than $|D|$ (even constant-size for some families of databases), the color-index is the first index structure for evaluating free-connex ACQs that allows efficient enumeration and counting with running time that may be strictly smaller than the database size.
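The stable coloring mentioned in the abstract can be illustrated with a minimal sketch of textbook color refinement on a database of binary relations. This is not the paper's "suitable version" of the algorithm. The function names `color_refinement` and `quotient`, the dict-of-edge-sets encoding, and the quotient construction (one tuple per pair of colors joined by a fact) are assumptions made for illustration, and the quotient may differ from the paper's $D_c$.

```python
def color_refinement(db):
    """Return a stable coloring of the active domain.

    db: dict mapping a relation name (str) to a set of (u, v) pairs.
    Assumption: only binary relations are present.
    """
    domain = {x for pairs in db.values() for pair in pairs for x in pair}
    color = {x: 0 for x in domain}  # start with a single color
    while True:
        # Collect, per element, the multiset of (relation, direction, neighbor color).
        sig = {x: [] for x in domain}
        for rel, pairs in db.items():
            for u, v in pairs:
                sig[u].append((rel, "out", color[v]))
                sig[v].append((rel, "in", color[u]))
        # The old color is part of the key, so classes can only split.
        keys = {x: (color[x], tuple(sorted(sig[x]))) for x in domain}
        palette = {k: i for i, k in enumerate(sorted(set(keys.values())))}
        new_color = {x: palette[keys[x]] for x in domain}
        # No class was split in this round: the coloring is stable.
        if len(set(new_color.values())) == len(set(color.values())):
            return new_color
        color = new_color


def quotient(db, color):
    """Hypothetical color-level database: project every fact onto colors."""
    return {rel: {(color[u], color[v]) for u, v in pairs}
            for rel, pairs in db.items()}


# A directed cycle on n elements: every element looks the same locally.
n = 1000
cycle = {"E": {(i, (i + 1) % n) for i in range(n)}}
coloring = color_refinement(cycle)
print(len(set(coloring.values())))        # number of colors
print(sum(len(t) for t in quotient(cycle, coloring).values()))  # quotient size
```

On the directed cycle every element ends up with the same color, so the quotient holds a single tuple regardless of `n`. This is in the spirit of the abstract's remark that the color database can be constant-size for some families of databases. A directed path, by contrast, is refined until every element has its own color, so nothing is gained there.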
