The Selection Problem in Multi-Query Optimization: a Comprehensive Survey
Sergey Zinchenko, Denis Ponomaryov
TL;DR
This paper addresses the problem of selecting a reusable subset of candidates (views, indexes, or plans) to optimize multi-query workloads under resource budgets, formalizing it as the Candidate Selection Problem (CSP). It unifies view/index/plan caching and multi-query optimization (MQO) in a general framework that uses tree-structured representations (expression forests and AND-OR-DAGs) to model benefits and expenses, and it surveys a broad spectrum of algorithms (exhaustive, greedy, randomized, and hybrid), including ML-based approaches. The authors analyze the computational complexity of the problem, highlight the non-linearities in maintenance and storage costs, and present a technique to exponentially accelerate certain state-of-the-art algorithms. They also propose improvements to state-of-the-art methods and discuss open challenges, ranging from candidate-space design to dynamic, distributed, and unified evaluation platforms, offering concrete, cross-domain strategies for future work. Overall, the work provides a comprehensive, cross-domain view of selection problems in MQO and outlines practical, scalable paths to faster and more accurate selection decisions.
Abstract
View materialization, index selection, and plan caching are well-known techniques for optimizing query processing in database systems. The essence of these tasks is to select and save, for later reuse, a subset of the most useful candidates (views/indexes/plans) within given space/time budget constraints. In this paper, we propose a unified view of these selection problems. We analyze in detail the root causes of their complexity and summarize techniques to address them. Our survey provides a modern classification of the selection algorithms known in the literature, including the latest ones based on Machine Learning. We lay the groundwork for reusing selection techniques across different optimization scenarios and highlight challenges and promising directions in the field. Based on our analysis, we derive a method to exponentially accelerate some of the state-of-the-art selection algorithms.
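To make the budget-constrained selection task concrete, the following minimal sketch shows one greedy heuristic of the kind the survey classifies. It is an illustration under simplifying assumptions, not the authors' method: the `Candidate` class, the `greedy_select` function, and all names and numbers are hypothetical, and each candidate is assumed to have a fixed, independent benefit and cost, which ignores the interactions between candidates that arise in practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str       # hypothetical identifier of a view, index, or cached plan
    benefit: float  # assumed fixed estimate of the workload cost saved
    cost: float     # assumed storage or maintenance expense, must be > 0


def greedy_select(candidates, budget):
    """Pick candidates by descending benefit-per-cost ratio while the budget allows."""
    chosen = []
    remaining = budget
    ranked = sorted(candidates, key=lambda c: c.benefit / c.cost, reverse=True)
    for cand in ranked:
        # Skip any candidate that no longer fits into the remaining budget.
        if cand.cost <= remaining:
            chosen.append(cand)
            remaining -= cand.cost
    return chosen


# Illustrative, made-up numbers.
candidates = [
    Candidate("view_a", benefit=60.0, cost=30.0),
    Candidate("index_b", benefit=50.0, cost=20.0),
    Candidate("plan_c", benefit=20.0, cost=5.0),
    Candidate("view_d", benefit=45.0, cost=40.0),
]
selected = greedy_select(candidates, budget=50.0)
print([c.name for c in selected])  # ['plan_c', 'index_b']
```

With these numbers the heuristic reaches a total benefit of 70, whereas choosing `view_a` and `index_b` would reach 110 within the same budget. Even in this simplified setting, a greedy choice can miss the best subset, which is one reason a range of exhaustive, randomized, and hybrid strategies is of interest.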
