Adaptive Recursive Query Optimization
Anna Herlihy, Guillaume Martres, Anastasia Ailamaki, Martin Odersky
TL;DR
The paper tackles the scalability challenge of optimizing recursive queries in data-intensive analyses. It proposes Adaptive Metaprogramming, which moves optimization and code generation from compile-time to runtime so that a query can be optimized and re-optimized both before and after execution has begun. It introduces a join-ordering optimization applicable at multiple stages of query compilation and execution, and evaluates the approach with Carac, a custom Datalog engine: unoptimized recursive query execution time can be improved by three orders of magnitude, and hand-optimized queries by 6x. These results suggest that runtime-adaptive optimization can dramatically improve recursive query processing in industrial-scale settings.
Abstract
Performance-critical industrial applications, including large-scale program, network, and distributed system analyses, are increasingly reliant on recursive queries for data analysis. Yet traditional relational algebra-based query optimization techniques do not scale well to recursive query processing due to the iterative nature of query evaluation, where relation cardinalities can change unpredictably during the course of a single query execution. To avoid error-prone cardinality estimation, adaptive query processing techniques use runtime information to inform query optimization, but these systems are not optimized for the specific needs of recursive query processing. In this paper, we introduce Adaptive Metaprogramming, an innovative technique that shifts recursive query optimization and code generation from compile-time to runtime using principled metaprogramming, enabling dynamic optimization and re-optimization before and after query execution has begun. We present a custom join-ordering optimization applicable at multiple stages during query compilation and execution. Through Carac, a custom Datalog engine, we evaluate the optimization potential of Adaptive Metaprogramming and show unoptimized recursive query execution time can be improved by three orders of magnitude and hand-optimized queries by 6x.
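To make concrete the idea of re-deriving a join order from runtime cardinalities, the following is a minimal Python sketch, not Carac's implementation. It uses a simple fixpoint loop and a greedy smallest-relation-first heuristic, both chosen here for illustration, and it involves no metaprogramming or code generation. All names (`order_atoms`, `eval_rule`, `fixpoint`, and the `edge`/`path` relations) are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]             # (relation name, variable names)
Rule = Tuple[str, Tuple[str, ...], List[Atom]]  # (head name, head variables, body)
Relations = Dict[str, Set[Tuple[int, ...]]]


def order_atoms(body: List[Atom], rels: Relations) -> List[Atom]:
    """Greedy join order based on the *current* relation sizes.

    Picks the smallest relation first, then repeatedly the smallest atom
    that shares a variable with those already chosen (to avoid cross products).
    """
    remaining = list(body)
    ordered: List[Atom] = []
    bound: Set[str] = set()
    while remaining:
        connected = [a for a in remaining if bound & set(a[1])]
        candidates = connected or remaining
        best = min(candidates, key=lambda a: len(rels[a[0]]))
        ordered.append(best)
        bound |= set(best[1])
        remaining.remove(best)
    return ordered


def eval_rule(head_vars: Tuple[str, ...], body: List[Atom],
              rels: Relations) -> Set[Tuple[int, ...]]:
    """Nested-loop join over the body atoms, in the given order."""
    bindings: List[Dict[str, int]] = [{}]
    for name, variables in body:
        next_bindings = []
        for binding in bindings:
            for row in rels[name]:
                candidate = dict(binding)
                consistent = True
                for var, value in zip(variables, row):
                    # Bind the variable, or check it against an earlier binding.
                    if candidate.setdefault(var, value) != value:
                        consistent = False
                        break
                if consistent:
                    next_bindings.append(candidate)
        bindings = next_bindings
    return {tuple(b[v] for v in head_vars) for b in bindings}


def fixpoint(rels: Relations, rules: List[Rule]) -> List[List[str]]:
    """Iterate to a fixpoint, re-planning the join order on every iteration.

    Returns the join order chosen at each rule evaluation, for inspection.
    """
    chosen_orders: List[List[str]] = []
    changed = True
    while changed:
        changed = False
        for head_name, head_vars, body in rules:
            plan = order_atoms(body, rels)  # uses cardinalities observed so far
            chosen_orders.append([name for name, _ in plan])
            new_facts = eval_rule(head_vars, plan, rels) - rels[head_name]
            if new_facts:
                rels[head_name] |= new_facts
                changed = True
    return chosen_orders


# Transitive closure over a small chain: path(x, z) :- path(x, y), edge(y, z).
edges = {(1, 2), (2, 3), (3, 4)}
rels: Relations = {"edge": set(edges), "path": set(edges)}
rules: List[Rule] = [("path", ("x", "z"), [("path", ("x", "y")), ("edge", ("y", "z"))])]
orders = fixpoint(rels, rules)
```

In this example `path` grows during evaluation while `edge` stays fixed, so the chosen order switches from `path` first to `edge` first after the first iteration. A plan fixed before execution could not react to that change, which is the situation the abstract describes.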
