
The Selection Problem in Multi-Query Optimization: a Comprehensive Survey

Sergey Zinchenko, Denis Ponomaryov

TL;DR

This paper addresses the problem of selecting a reusable subset of candidates (views, indexes, or plans) to optimize multi-query workloads under resource budgets, formalizing it as the Candidate Selection Problem (CSP). It unifies view materialization, index selection, plan caching, and multi-query optimization (MQO) in a general framework. The framework uses tree-structured representations (expression forests and AND-OR-DAGs) to model benefits and expenses. The paper also surveys a broad spectrum of algorithms (exhaustive, greedy, randomized, and hybrid), including ML-based approaches. The authors analyze the computational complexity of the problem, highlight the non-linearities in maintenance and storage costs, and present a technique to exponentially accelerate certain state-of-the-art algorithms. They also propose improvements to state-of-the-art methods and discuss open challenges. These range from candidate-space design to dynamic and distributed settings and unified evaluation platforms, and the authors outline concrete, cross-domain strategies for future work. Overall, the work provides a comprehensive, cross-domain view of selection problems in MQO and points to practical ways of making selection algorithms faster and more scalable.

Abstract

View materialization, index selection, and plan caching are well-known techniques for optimizing query processing in database systems. The essence of these tasks is to select and save a subset of the most useful candidates (views/indexes/plans) for reuse within given space/time budget constraints. In this paper, we propose a unified view of these selection problems. We analyze in detail the root causes of their complexity and summarize techniques to address them. Our survey provides a modern classification of selection algorithms known in the literature, including the latest ones based on Machine Learning. We provide a basis for reusing selection techniques across different optimization scenarios and highlight challenges and promising directions in the field. Based on our analysis, we derive a method to exponentially accelerate some of the state-of-the-art selection algorithms.
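To make "select a subset of the most useful candidates within a budget" concrete, here is a minimal sketch of a generic benefit-per-unit-space greedy heuristic. It is not the authors' algorithm. It assumes a hypothetical black-box cost model `workload_cost(selected)` that returns the estimated workload cost when the candidates in `selected` are available. The names `greedy_select`, `toy_cost`, `v12`, `v23`, `v34` and all numbers are illustrative only.

```python
from typing import Callable, Dict, FrozenSet, Optional


def greedy_select(
    sizes: Dict[str, float],
    workload_cost: Callable[[FrozenSet[str]], float],
    budget: float,
) -> FrozenSet[str]:
    """Benefit-per-unit-space greedy selection under a space budget.

    Assumes every candidate size is positive and that workload_cost is a
    (hypothetical) black-box cost model over sets of selected candidates.
    """
    selected: FrozenSet[str] = frozenset()
    used = 0.0
    while True:
        current = workload_cost(selected)
        best: Optional[str] = None
        best_ratio = 0.0
        for cand, size in sizes.items():
            if cand in selected or used + size > budget:
                continue  # already chosen or does not fit in the budget
            # Benefits interact, so the marginal gain is recomputed each round.
            gain = current - workload_cost(selected | {cand})
            if gain / size > best_ratio:
                best, best_ratio = cand, gain / size
        if best is None:  # nothing fits or nothing improves the cost
            return selected
        selected = selected | {best}
        used += sizes[best]


# Toy workload (hypothetical numbers): each query lists its evaluation
# options as (candidates required, cost); the empty set means "from scratch".
QUERIES = [
    [(frozenset(), 100.0), (frozenset({"v12"}), 40.0), (frozenset({"v23"}), 50.0)],
    [(frozenset(), 120.0), (frozenset({"v23"}), 45.0), (frozenset({"v34"}), 60.0)],
]
SIZES = {"v12": 30.0, "v23": 35.0, "v34": 20.0}


def toy_cost(selected: FrozenSet[str]) -> float:
    # Each query uses its cheapest option whose required candidates are selected.
    return sum(min(cost for needed, cost in options if needed <= selected)
               for options in QUERIES)


print(sorted(greedy_select(SIZES, toy_cost, 50.0)))  # ['v23']
print(sorted(greedy_select(SIZES, toy_cost, 65.0)))  # ['v12', 'v23']
```

In this toy instance, the candidate `v23` is picked first because it helps both queries. This mirrors the shared-computation effect the paper describes. Recomputing marginal gains every round is what lets the heuristic account for interactions between candidates, at the price of many cost-model calls.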

Paper Structure

This paper contains 36 sections, 2 theorems, 21 equations, 10 figures, 1 table, 2 algorithms.

Key Result

Theorem 1

The View Selection Problem over binary AND-DAG under space constraint is NP-hard.
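NP-hardness means that, unless P = NP, no algorithm solves every instance optimally in polynomial time. Exhaustive search, which enumerates all 2^n subsets of n candidates, is therefore practical only for very small candidate sets. One intuition, which is not necessarily the proof used in the paper, is this: when candidates are independent and their benefits simply add up, space-constrained selection coincides with 0/1 knapsack. The sketch below illustrates exhaustive selection on such a knapsack-like toy instance. The function `exhaustive_select`, the cost model `additive_cost`, and all numbers are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Tuple


def exhaustive_select(
    sizes: Dict[str, float],
    workload_cost: Callable[[FrozenSet[str]], float],
    budget: float,
) -> Tuple[FrozenSet[str], float, int]:
    """Return (best feasible set, its cost, number of subsets enumerated)."""
    names = sorted(sizes)
    best_set: FrozenSet[str] = frozenset()
    best_cost = workload_cost(best_set)
    enumerated = 1  # the empty selection
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            enumerated += 1
            if sum(sizes[c] for c in combo) > budget:
                continue  # violates the space constraint
            cost = workload_cost(frozenset(combo))
            if cost < best_cost:
                best_set, best_cost = frozenset(combo), cost
    return best_set, best_cost, enumerated  # enumerated == 2 ** len(sizes)


# Knapsack-like special case (hypothetical numbers): independent candidates
# whose benefits simply add up.
BENEFIT = {"a": 60.0, "b": 50.0, "c": 40.0}
SIZES = {"a": 30.0, "b": 20.0, "c": 20.0}


def additive_cost(selected: FrozenSet[str]) -> float:
    # Base workload cost minus the (additive) benefit of each selected candidate.
    return 200.0 - sum(BENEFIT[c] for c in selected)


best_set, best_cost, enumerated = exhaustive_select(SIZES, additive_cost, 50.0)
print(sorted(best_set), best_cost, enumerated)  # ['a', 'b'] 90.0 8
```

The enumeration count doubles with every added candidate. Practical selectors therefore fall back on the greedy, randomized, and hybrid methods the survey classifies.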

Figures (10)

  • Figure 1: To speed up query execution, it may be useful to find and reuse common computations. Although the computation of $T_2 \bowtie T_3$ is not part of the optimal plan for either query, it may be optimal for executing both queries together. On the left: optimal plan for $q_1$. In the middle: optimal plan for $q_2$. On the right: optimal plan for the whole workload.
  • Figure 2: To discover options for computation reuse, several expression trees can be merged into an expression forest. On the left: representation of the workload in the form of expression trees. On the right: the result of merging them into an expression forest $\mathcal{F}$. Eq-nodes ($c_i$) are shown in rectangles, op-nodes ($\gamma_i$) are shown in circles. Data sizes and execution times for operations are shown in small boxes inside these figures. Triangle-shaped nodes indicate which data each query $q_i$ needs.
  • Figure 3: When table $T_1$ is updated, the selected candidates must also be updated, and updating candidate $c_3$ can be accelerated by reusing the updated common computation $c_2$. This shows that the expense function need not be additive: maintaining a set of candidates together can cost less than the sum of maintaining each one separately. The selected candidates are shown in black rectangles and update operations with the corresponding execution times are shown in green.
  • Figure 4: To improve the efficiency of plan caching, it is possible to store common parts of plans in a single instance. This is yet another case of reusing shared computations. On the left: optimal execution plans for queries $q_i$ and $q_j$ with a common subtree ST. On the right: plan caching schema, in which the plan for ST is stored in memory only once.
  • Figure 5: The key observation in our study is that the tree structure of candidates can be used to address virtually all stages in solving the View Selection Problem. Since most MQO problems also represent objects as trees, the developed techniques can be successfully reused in these cases as well.
  • ...and 5 more figures
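Figure 2 describes merging expression trees into an expression forest so that common computations become shared nodes. Below is a minimal sketch of that idea using hash-consing. Each subtree is put into a canonical form, and structurally identical subtrees collapse into one dictionary key. The sketch makes several assumptions: trees are nested tuples such as `("join", left, right)`, only `join` is treated as commutative, and the helpers `canonical`, `merge_into_forest`, and `shared_subexpressions` are hypothetical. The sketch ignores the eq-node/op-node distinction, data sizes, and costs shown in the figure. A full AND-OR-DAG would additionally record alternative plans.

```python
from typing import Dict, List, Tuple, Union

# An expression tree is a base-table name or an (operator, left, right) tuple.
Expr = Union[str, Tuple[str, "Expr", "Expr"]]


def canonical(expr: Expr) -> Expr:
    """Normalize a tree so that equivalent subexpressions compare equal.

    Assumption for this sketch: only "join" is commutative, so its operands
    are ordered deterministically (by repr).
    """
    if isinstance(expr, str):
        return expr
    op, left, right = expr
    left, right = canonical(left), canonical(right)
    if op == "join":
        left, right = sorted((left, right), key=repr)
    return (op, left, right)


def merge_into_forest(trees: List[Expr]) -> Dict[Expr, int]:
    """Merge trees by hash-consing; map each node to #queries that use it."""
    usage: Dict[Expr, int] = {}
    for tree in trees:
        seen = set()  # count a node at most once per query
        stack = [canonical(tree)]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            usage[node] = usage.get(node, 0) + 1
            if not isinstance(node, str):
                stack.extend(node[1:])  # push left and right children
    return usage


def shared_subexpressions(trees: List[Expr]) -> List[Expr]:
    """Non-leaf nodes used by at least two queries: candidates for reuse."""
    usage = merge_into_forest(trees)
    return sorted((e for e, n in usage.items() if n >= 2 and not isinstance(e, str)),
                  key=repr)


q1 = ("join", "T1", ("join", "T2", "T3"))
q2 = ("join", ("join", "T3", "T2"), "T4")
print(shared_subexpressions([q1, q2]))  # [('join', 'T2', 'T3')]
```

Canonicalization is what lets `T3 ⋈ T2` in one query match `T2 ⋈ T3` in the other. Richer equivalence rules would expose more sharing opportunities but also enlarge the candidate space.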

Theorems & Definitions (11)

  • Example 1
  • Definition 1
  • Example 2
  • Example 3
  • Example 4
  • Example 5
  • Theorem 1
  • Example 6
  • Example 7
  • Example 8
  • ...and 1 more