Gradient-Based Join Ordering

Tim Schwabe, Maribel Acosta

TL;DR

To address the NP-hard join ordering problem, the paper continuously relaxes the discrete plan space into a soft adjacency matrix $A^{\mathrm{soft}}$ and searches it by gradient descent on a differentiable cost model $\hat{C}_\theta$. The cost model is a Graph Neural Network trained on valid plans. Gumbel-Softmax with temperature annealing and differentiable structural penalties push the search toward valid plans, and a final projection yields a discrete left-linear plan. Empirically, gradient-based search matches or improves on discrete greedy search on LUBM and Wikidata, and its runtime scales linearly with query size, in contrast to the exponential runtime of dynamic programming and the quadratic runtime of greedy search. The work demonstrates a scalable, data-driven approach to query optimization for graph/triple-based data, with a practical projection step that produces deployable plans.

Abstract

Join ordering is the NP-hard problem of selecting the most efficient sequence in which to evaluate joins (conjunctive, binary operators) in a database query. As the performance of query execution critically depends on this choice, join ordering lies at the core of query optimization. Traditional approaches cast this problem as a discrete combinatorial search over binary trees guided by a cost model, but they often suffer from high computational complexity and limited scalability. We show that, when the cost model is differentiable, the query plans can be continuously relaxed into a soft adjacency matrix representing a superposition of plans. This continuous relaxation, together with a Gumbel-Softmax parameterization of the adjacency matrix and differentiable constraints enforcing plan validity, enables gradient-based search for plans within this relaxed space. Using a learned Graph Neural Network as the cost model, we demonstrate that this gradient-based approach can find comparable and even lower-cost plans compared to traditional discrete local search methods on two different graph datasets. Furthermore, we empirically show that the runtime of this approach scales linearly with query size, in contrast to quadratic or exponential runtimes of classical approaches. We believe this first step towards gradient-based join ordering can lead to more effective and efficient query optimizers in the future.
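
The search loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the learned Graph Neural Network is replaced by a hypothetical linear cost table (`cost[i][j]` is the assumed cost of attaching node `i` to parent `j`), the structural validity penalties are omitted, and the gradient is written out by hand instead of using automatic differentiation. The names `gumbel_softmax` and `search` and all hyperparameters are assumptions made for illustration only.

```python
import math
import random


def gumbel_softmax(logits, tau, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / tau)."""
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)        # guard against log(0)
        gumbel = -math.log(-math.log(u))    # standard Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / tau)
    top = max(noisy)                        # subtract the max for stability
    exps = [math.exp(z - top) for z in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def search(cost, steps=300, lr=0.5, tau_start=1.0, tau_end=0.1, seed=0):
    """Gradient descent on row-wise logits of a soft assignment matrix.

    Toy objective (an assumption, standing in for the learned cost model):
    sum_i sum_j A_soft[i][j] * cost[i][j].
    """
    rng = random.Random(seed)
    rows, cols = len(cost), len(cost[0])
    logits = [[0.0] * cols for _ in range(rows)]
    for step in range(steps):
        # Geometric temperature annealing from tau_start down to tau_end.
        tau = tau_start * (tau_end / tau_start) ** (step / (steps - 1))
        for i in range(rows):
            p = gumbel_softmax(logits[i], tau, rng)
            mean_cost = sum(pj * cj for pj, cj in zip(p, cost[i]))
            for j in range(cols):
                # d(sum_k p_k c_k) / d logit_j for p = softmax(z / tau)
                grad = p[j] * (cost[i][j] - mean_cost) / tau
                logits[i][j] -= lr * grad
    # Projection to a discrete choice: take the largest logit in each row.
    choice = [max(range(cols), key=row.__getitem__) for row in logits]
    return logits, choice


if __name__ == "__main__":
    toy_cost = [[3.0, 1.0, 2.0], [0.5, 4.0, 2.5]]
    _, plan = search(toy_cost)
    print(plan)
```

With a linear toy cost the rows are independent, so the search reduces to picking the cheapest entry per row. In the paper's setting the rows interact through the cost model and the validity penalties, which is where gradient information becomes useful.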

Paper Structure

This paper contains 30 sections, 22 equations, 9 figures, 2 tables, 1 algorithm.

Figures (9)

  • Figure 1: Join Ordering using a learned Cost Model. Continuously relaxing the discrete Search Space of Plans allows traversing it using Gradient Descent. The initial Plan is a superposition of all possible plans. The gradient search converges to a valid plan (the corners of the space) by minimizing the predicted cost.
  • Figure 2: A join plan $p$ is represented as a $\left(2n-1\right) \times \left(2n-1\right)$ dimensional matrix $f(p)=A$. The first $n$ rows denote the outgoing edges of triple patterns, while the last $n-1$ rows are outgoing edges of join nodes. The root node always corresponds to the last row.
  • Figure 3: Examples of the used query shapes. Star queries have a single center node with multiple outgoing edges, while path queries form a single chain of connections.
  • Figure 4: True vs. predicted costs for different query shapes (2–14 triple patterns) on the LUBM dataset.
  • Figure 5: 1‑D projections of the cost landscape between two random plans. Even though the space between plans does not represent valid plans, the local gradient still contains directional information towards the better plan.
  • ...and 4 more figures
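
The matrix encoding in the Figure 2 caption can be sketched as follows. This is an illustration under stated assumptions rather than the paper's code: an entry `adjacency[i][j] = 1` is taken to mean that node `i` has an outgoing edge to join node `j` (child to parent), join nodes are numbered `n` to `2n-2` in execution order, and the plan is left-linear. The function names `left_linear_adjacency` and `looks_like_plan` are hypothetical, and the checks in the latter are simple necessary conditions, not the paper's differentiable penalties.

```python
def left_linear_adjacency(order):
    """Encode a left-linear plan over triple patterns 0..n-1 as a 0/1 matrix.

    Rows 0..n-1 are triple patterns, rows n..2n-2 are join nodes, and the
    last row is the root (it has no outgoing edge).
    """
    n = len(order)
    if n < 2 or sorted(order) != list(range(n)):
        raise ValueError("order must be a permutation of 0..n-1 with n >= 2")
    size = 2 * n - 1
    adjacency = [[0] * size for _ in range(size)]
    adjacency[order[0]][n] = 1               # first pattern feeds the first join
    for k in range(1, n):
        join = n + k - 1                     # index of the k-th join node
        adjacency[order[k]][join] = 1        # next pattern is the right input
        if k >= 2:
            adjacency[join - 1][join] = 1    # previous join is the left input
    return adjacency


def looks_like_plan(adjacency):
    """Check necessary structural conditions for a binary join tree."""
    size = len(adjacency)
    n = (size + 1) // 2
    column_sums = [sum(row[j] for row in adjacency) for j in range(size)]
    non_root_ok = all(sum(row) == 1 for row in adjacency[:-1])   # one parent each
    root_ok = sum(adjacency[-1]) == 0                            # root has none
    joins_ok = all(column_sums[j] == 2 for j in range(n, size))  # binary joins
    leaves_ok = all(column_sums[j] == 0 for j in range(n))       # leaves take no input
    return non_root_ok and root_ok and joins_ok and leaves_ok


if __name__ == "__main__":
    for row in left_linear_adjacency([2, 0, 1]):
        print(row)
```

For three triple patterns joined in the order 2, 0, 1, patterns 2 and 0 point to join node 3, while pattern 1 and join node 3 point to the root, join node 4. A soft adjacency matrix relaxes these 0/1 rows into probability distributions over possible parents.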