Hype or Heuristic? Quantum Reinforcement Learning for Join Order Optimisation
Maja Franz, Tobias Winker, Sven Groppe, Wolfgang Mauerer
TL;DR
The study investigates quantum reinforcement learning (QRL) for join order (JO) optimisation in database systems, comparing it against a classical RL baseline and a single-step QML method. It proposes a multi-step QRL framework that uses a hybrid variational quantum circuit with reduced input encoding to handle bushy join trees while requiring far fewer qubits and trainable parameters. In simulations on the JOB benchmark, QRL matches or approaches classical performance in result quality and can outperform single-step QML by up to 17% in median cost under ideal conditions, though noise on current hardware limits practical advantage. The work delivers open-source tooling and a nuanced assessment of quantum advantage, highlighting parameter efficiency and scalability as promising benefits for dynamic, low-latency JO scenarios, and outlines directions for future hardware and encoding improvements.
Abstract
Identifying optimal join orders (JOs) stands out as a key challenge in database research and engineering. Owing to the large search space, established classical methods rely on approximations and heuristics. Recent efforts have successfully explored reinforcement learning (RL) for JO. Likewise, quantum versions of RL have received considerable scientific attention. Yet it is an open question whether they can achieve sustainable, overall practical advantages with improved quantum processors. In this paper, we present a novel approach that uses quantum reinforcement learning (QRL) for JO based on a hybrid variational quantum ansatz. Unlike approaches based on quantum(-inspired) optimisation, it handles general bushy join trees instead of resorting to simpler left-deep variants, yet requires multiple orders of magnitude fewer qubits, which are a scarce resource even for post-NISQ systems. Despite moderate circuit depth, the ansatz exceeds current NISQ capabilities, so we evaluate it by numerical simulation. While QRL may not significantly outperform classical approaches in solving the JO problem with respect to result quality (albeit we see parity), we find a drastic reduction in required trainable parameters. This benefits practically relevant aspects, ranging from shorter training times compared to classical RL and less involved classical optimisation passes to better use of available training data, and fits data-stream and low-latency processing scenarios. Our comprehensive evaluation and careful discussion deliver a balanced perspective on possible practical quantum advantage, provide insights for future systemic approaches, and allow for quantitatively assessing trade-offs of quantum approaches for one of the most crucial problems of database management systems.
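To give a feel for the "large search space" and for the gap between left-deep and bushy join trees mentioned above, the following minimal sketch counts both. It relies on the textbook counts for ordered join trees when cross products are allowed (n! left-deep trees, n! times the (n-1)-th Catalan number for bushy trees); it is an illustration only, not part of the paper's method. The function names and the chosen values of n are assumptions made for this example.

```python
from math import factorial


def left_deep_count(n: int) -> int:
    """Ordered left-deep join trees over n relations (cross products allowed): n!."""
    return factorial(n)


def bushy_count(n: int) -> int:
    """Ordered bushy join trees over n relations (cross products allowed).

    Equals n! * Catalan(n - 1), which simplifies to (2(n - 1))! / (n - 1)!.
    """
    return factorial(2 * (n - 1)) // factorial(n - 1)


if __name__ == "__main__":
    # Hypothetical query sizes, chosen only to show the growth rate.
    for n in (3, 4, 8, 12):
        print(f"n={n:2d}  left-deep={left_deep_count(n):>12,}  bushy={bushy_count(n):>22,}")
```

Real join graphs usually exclude many of these trees because cross products are avoided, so the counts are upper bounds on what an optimiser has to consider; the growth rate is nevertheless why exhaustive search quickly becomes impractical.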
