Computer-Orchestrated Design of Algorithms: From Join Specification to Implementation

Zeyuan Hu

Abstract

Equipping query processing systems with provable theoretical guarantees has been a central focus at the intersection of database theory and systems in recent years. However, the divergence between theoretical abstractions and system assumptions creates a gap between an algorithm's high-level logical specification and its low-level physical implementation. Ensuring the correctness of this logical-to-physical translation is crucial for realizing theoretical optimality as practical performance gains. Existing database testing frameworks struggle to address this need because necessary algorithm-specific inputs such as join trees are absent from standard test case generation, and integrating complex algorithms into these frameworks imposes prohibitive engineering overhead. Fallback solutions, such as using macro-benchmark queries, are inherently too noisy for isolating intricate defects during this translation. In this experience paper, we present a retrospective analysis of $\mathsf{CODA}$, a computer-orchestrated testing framework utilized during the physical co-design of TreeTracker Join ($\mathsf{TTJ}$), a theoretically optimal yet practical join algorithm recently published in ACM TODS. By synthesizing minimal reproducible examples, $\mathsf{CODA}$ successfully isolates subtle translation defects, such as state mismanagement and mapping conflicts between join trees and bushy plans. We demonstrate that this logical-to-physical translation process is a bidirectional feedback loop: early structural testing not only hardened $\mathsf{TTJ}$'s physical implementation but also exposed a boundary condition that directly refined the formal precondition of $\mathsf{TTJ}$ itself. Finally, we detail how confronting these translation challenges drove the architectural evolution of $\mathsf{CODA}$ into a robust, structure-aware test generation pipeline for join-tree-dependent algorithms.

Paper Structure

This paper contains 12 sections, 8 theorems, 4 figures, 1 table, and 1 algorithm.

Key Result

proposition 1

Each replacement produces one virtual relation, and that virtual relation is a leaf node that is not the leftmost.

Figures (4)

  • Figure 1: Illustration of how the evaluation correctness of $Q = R(a,x) \Join S(a,w) \Join T(a,z)$ depends on the join tree structure. Using a defective $\mathsf{TTJ}$ iterator (missing Line \ref{tt-1-beta-b-hashtable-tq:matchingtuples-nil-pass-context} in Algorithm \ref{algo:ttj-join}), $Q$ evaluates correctly under the join tree in (a), as traced in (b). Conversely, under the join tree in (c), the identical defective iterator yields an incorrect empty result, as traced in (d). We explain the details in Section \ref{sec:motivating-example}. Symbols ① through ⑥ mark execution snapshots referenced in the text.
  • Figure 2: Logical $\mathsf{TTJ}$ specification from Hu2025, featuring the simplicity of implicit state management via the call stack.
  • Figure 3: Overview of the $\mathsf{CODA}$ framework used during the design of the $\mathsf{TTJ}$ iterator (Algorithm \ref{algo:ttj-join}). Test case synthesis modules are gray rounded rectangles, while differential evaluation modules are white rounded rectangles. The $\mathsf{TTJ}$ iterator implementation is depicted as a circle.
  • Figure 4: Synthesizing a join tree and a database instance in $\mathsf{CODA}$ in the absence of an input query plan.
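
The differential evaluation idea behind Figures 1 and 3 can be illustrated with a minimal sketch. It is not the $\mathsf{CODA}$ or $\mathsf{TTJ}$ implementation: it evaluates the query $Q = R(a,x) \Join S(a,w) \Join T(a,z)$ from Figure 1 on a tiny hand-written database instance, once with a brute-force reference and once with a simple hash join under two different join orders, and compares the results. All function names and the sample data are hypothetical, and plain join orders stand in here for the join tree structures discussed in the paper.

```python
from itertools import product


def nested_loop_join(r, s, t):
    """Brute-force reference for Q = R(a,x) JOIN S(a,w) JOIN T(a,z)."""
    return sorted(
        (a, x, w, z)
        for (a, x), (a2, w), (a3, z) in product(r, s, t)
        if a == a2 == a3
    )


def hash_join_on_a(left, right):
    """Hash join on attribute a (position 0 of every row)."""
    index = {}
    for row in right:
        index.setdefault(row[0], []).append(row[1:])
    # Append the non-key columns of each matching right row to the left row.
    return [lrow + rest for lrow in left for rest in index.get(lrow[0], [])]


def plan_r_s_t(r, s, t):
    """Candidate plan 1 (hypothetical): (R JOIN S) JOIN T."""
    return sorted(hash_join_on_a(hash_join_on_a(r, s), t))


def plan_r_t_s(r, s, t):
    """Candidate plan 2 (hypothetical): (R JOIN T) JOIN S, columns reordered."""
    rows = hash_join_on_a(hash_join_on_a(r, t), s)
    return sorted((a, x, w, z) for (a, x, z, w) in rows)


def differential_check(r, s, t):
    """Compare every candidate plan against the brute-force reference."""
    expected = nested_loop_join(r, s, t)
    plans = [("R-S-T", plan_r_s_t), ("R-T-S", plan_r_t_s)]
    return {name: plan(r, s, t) == expected for name, plan in plans}


# Tiny illustrative database instance (assumed data, not from the paper).
R = [(1, "x1"), (2, "x2")]
S = [(1, "w1"), (3, "w3")]
T = [(1, "z1"), (1, "z2")]

report = differential_check(R, S, T)
print(report)
```

A plan whose entry in `report` is `False` disagrees with the reference on this instance, which is the kind of structure-dependent discrepancy that Figure 1 describes for the defective iterator.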

Theorems & Definitions (14)

  • definition 1
  • definition 2
  • definition 3: the reverse of a GYO reduction order Hu2025
  • definition 4: replacement
  • proposition 1
  • definition 5: nice
  • lemma 1
  • lemma 2
  • lemma 3
  • definition 6: virtual relation method
  • ...and 4 more