Compiling Set Queries into Work-Efficient Tree Traversals
Alexander J Root, Christophe Gyurgyik, Purvi Goel, Kayvon Fatahalian, Jonathan Ragan-Kelley, Andrew Adams, Fredrik Kjolstad
TL;DR
Bonsai is a compiler that removes the manual effort of writing tree-based pruning logic for each query: it generates work-efficient tree traversals from high-level query specifications and annotated tree metadata. A lowering procedure fuses set operations into a single traversal, and a predicate analysis based on symbolic interval analysis, extended with rules for geometric predicates, derives the conditions under which a subtree can be pruned or included wholesale. Bonsai also supports generalized non-equijoins through single-index and dual-index tree traversals, covering join predicates beyond standard equality and range predicates. The generated traversals match the behavior of expert-written implementations and can asymptotically outperform the linear scans and nested-loop joins that systems otherwise fall back to. The result is reusable, automatically derived acceleration for a broad class of tree-structured queries, with potential applications in database, graphics, and scientific computing workloads.
Abstract
Trees can accelerate queries that search or aggregate values over large collections. They achieve this by storing metadata that enables quick pruning (or inclusion) of subtrees when predicates on that metadata can prove that none (or all) of the data in a subtree affect the query result. Existing systems implement this pruning logic manually for each query predicate and data structure. We generalize and mechanize this class of optimization. Our method derives conditions for when subtrees can be pruned (or included wholesale), expressed in terms of the metadata available at each node. We efficiently generate these conditions using symbolic interval analysis, extended with new rules to handle geometric predicates (e.g., intersection, containment). Additionally, our compiler fuses compound queries (e.g., reductions on filters) into a single tree traversal. These techniques enable the automatic derivation of generalized single-index and dual-index tree joins that support a wide class of join predicates beyond standard equality and range predicates. The generated traversals match the behavior of expert-written code that implements query-specific traversals, and can asymptotically outperform the linear scans and nested-loop joins that existing systems fall back to when hand-written cases do not apply.
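To make the pruning and wholesale-inclusion idea concrete, the following is a minimal hand-written sketch, not output of the compiler described here. It assumes a hypothetical binary tree over one-dimensional values in which each node stores the minimum, maximum, and sum of its subtree as metadata, and it answers a fused "sum of values within [a, b]" query (a reduction on a filter) in a single traversal. The names `Node`, `build`, and `range_sum` are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    lo: float                              # metadata: minimum value in this subtree
    hi: float                              # metadata: maximum value in this subtree
    total: float                           # metadata: sum of all values in this subtree
    values: Optional[List[float]] = None   # payload, present only at leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(values, leaf_size=4):
    """Build a balanced tree over a non-empty collection of numbers."""
    ordered = sorted(values)
    if len(ordered) <= leaf_size:
        return Node(ordered[0], ordered[-1], sum(ordered), values=ordered)
    mid = len(ordered) // 2
    left = build(ordered[:mid], leaf_size)
    right = build(ordered[mid:], leaf_size)
    return Node(left.lo, right.hi, left.total + right.total, left=left, right=right)


def range_sum(node, a, b, stats):
    """Sum every value v with a <= v <= b, counting visited nodes in stats."""
    stats["visited"] += 1
    # Prune: the interval [lo, hi] proves no value in the subtree passes the filter.
    if node.hi < a or node.lo > b:
        return 0
    # Include wholesale: the interval proves every value passes, so the
    # precomputed sum replaces a walk over the subtree.
    if a <= node.lo and node.hi <= b:
        return node.total
    # Undecided leaf: test each value individually.
    if node.values is not None:
        return sum(v for v in node.values if a <= v <= b)
    # Undecided interior node: recurse into both children.
    return range_sum(node.left, a, b, stats) + range_sum(node.right, a, b, stats)


data = list(range(1000))
tree = build(data)
stats = {"visited": 0}
result = range_sum(tree, 100, 899, stats)
```

Here the two conditions were derived by hand by evaluating the filter over the node's value interval: the filter is false for the whole interval in the prune case and true for the whole interval in the inclusion case. The abstract describes deriving such conditions automatically, for more general predicates and metadata.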
