Speeding up SQL subqueries via decoupling of non-correlated predicate (extended version)
Dmitrii Radivonchik, Yakov Kuzin, Anton Chizhov, Dmitriy Shcheka, Mikhail Firsov, Kirill Smirnov, George Chernishev
TL;DR
The paper tackles the high cost of evaluating correlated subqueries in SQL by isolating the non-correlated predicate to pre-filter the inner input, thereby reducing evaluations of the correlated part. It introduces a taxonomy of subquery classes with targeted rewrites, and integrates these rewrites into a block-based Volcano execution model within a position-enabled column-store, facilitated by a novel positional operator for subqueries. Three simple cost models—Naive, Rewritten, and Proposed—ground an optimizer's decision process by quantifying the tradeoffs using $COSTP_{NC}$, $COSTP_{C}$, $NCFILTER$, $N$, and $M$. Empirical evaluation on PosDB and PostgreSQL demonstrates up to $5\times$ speedups under suitable non-correlated predicate selectivity, highlighting practical impact for analytic workloads and providing a framework for broader adoption and further optimization.
Abstract
In this paper, we discuss a novel technique for processing correlated subqueries in SQL. The core idea is to isolate the non-correlated part of the predicate and use it to reduce the number of evaluations of the correlated part. We begin by providing an overview of several classes of queries that may benefit from this technique. For each class, we propose a potential rewrite and discuss the conditions under which it is advantageous. Next, we address the evaluation aspects of the proposed rewrites: 1) we describe our approach to adapting the block-based Volcano query processing model, and 2) we discuss the benefits of implementing that technique within a position-enabled column-store with late materialization support. Finally, we present a simple cost model that allows estimation of the benefits of said rewrites. Our evaluation has a quantitative part and a qualitative part. The former focuses on studying the impact of non-correlated predicate selectivity on our technique. The latter identifies the limitations of our approach by comparing it with alternative approaches available in existing systems. Overall, experiments conducted using PosDB (a position-enabled column-store) and PostgreSQL demonstrated that, under suitable conditions, our technique can achieve a 5x improvement.
