TreeTracker Join: Simple, Optimal, Fast

Zeyuan Hu, Yisu Remy Wang, Daniel P. Miranker

TL;DR

The paper introduces TreeTracker Join (TTJ), a linear-time join algorithm that augments binary hash join with backtracking and tuple deletion to evaluate acyclic queries efficiently. On full acyclic queries TTJ runs in $O(|\textsf{IN}| + |\textsf{OUT}|)$ time, and on any given plan it is guaranteed to perform no more hash probes than binary hash join on that same plan. Cyclic queries are handled via tree convolution. The authors formalize the approach in terms of α-acyclicity, GYO reductions, and the parent-child structure of join trees, and they supplement the theory with empirical results on TPC-H, JOB, and SSB that show competitive performance and clear benefits from optimizations such as no-good lists. They also outline future directions, including optimizer development and linear-time guarantees for bushy plans, and point to TTJ's potential for integration into modern DBMS workflows.

Abstract

We present a novel linear-time acyclic join algorithm, TreeTracker Join (TTJ). The algorithm can be understood as the pipelined binary hash join with a simple twist: upon a hash lookup failure, TTJ resets execution to the binding of the tuple causing the failure, and removes the offending tuple from its relation. Compared to the best known linear-time acyclic join algorithm, Yannakakis's algorithm, TTJ shares the same asymptotic complexity while imposing lower overhead. Further, we prove that when measuring query performance by counting the number of hash probes, TTJ will match or outperform binary hash join on the same plan. This property holds independently of the plan and independently of acyclicity. We are able to extend our theoretical results to cyclic queries by introducing a new hypergraph decomposition method called tree convolution. Tree convolution iteratively identifies and contracts acyclic subgraphs of the query hypergraph. The method avoids redundant calculations associated with tree decomposition and may be of independent interest. Empirical results on TPC-H, the Join Order Benchmark, and the Star Schema Benchmark demonstrate favorable results.
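
The "simple twist" described in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the function name `ttj_join`, the tuple-of-variables schema encoding, and the rule used to pick a parent (the earliest relation in the plan whose schema contains the probe key) are assumptions made for illustration. Deletion from a Python list is also not constant-time, so the sketch shows the control flow only, not the linear-time bound.

```python
def ttj_join(relations, schemas):
    """Left-deep pipelined hash join with backjumping and tuple deletion.

    relations: list of lists of value tuples, in plan order.
    schemas:   list of tuples of variable names, one per relation.
    Returns (output, deleted), where deleted lists (plan index, tuple) pairs.
    """
    n = len(relations)
    keys, parents, tables = [], [], []
    bound = set()  # variables bound by earlier relations in the plan
    for i, schema in enumerate(schemas):
        # Probe key: variables already bound when relation i is reached.
        key = tuple(v for v in schema if v in bound)
        # Assumed parent rule: earliest relation whose schema covers the key.
        parent = next(
            (j for j in range(i) if set(key) <= set(schemas[j])), None
        )
        table = {}
        for row in relations[i]:
            binding = dict(zip(schema, row))
            table.setdefault(tuple(binding[v] for v in key), []).append(binding)
        keys.append(key)
        parents.append(parent)
        tables.append(table)
        bound |= set(schema)

    output, deleted = [], []

    def join(t, i):
        if i == n:
            output.append(t)
            return None
        bucket = tables[i].get(tuple(t[v] for v in keys[i]))
        if not bucket:
            # Lookup failure: backjump to the parent (None means no backjump,
            # which degrades to ordinary hash-join behaviour).
            return parents[i]
        for m in list(bucket):  # iterate over a copy; bucket may shrink
            signal = join({**t, **m}, i + 1)
            if signal == i:
                # m caused a failure further down the plan: delete it.
                bucket.remove(m)
                deleted.append((i, m))
            elif signal is not None:
                return signal  # keep unwinding toward the offending relation
        return None

    join({}, 0)
    return output, deleted


# Hypothetical path query R(x, y), S(y, z), T(z, w).
schemas = [("x", "y"), ("y", "z"), ("z", "w")]
R = [(1, 10), (2, 20), (3, 30)]
S = [(10, 100), (20, 200), (20, 201)]
T = [(100, 1000), (201, 2010)]
output, deleted = ttj_join([R, S, T], schemas)
for row in output:
    print(row)
print("deleted:", deleted)
```

In this toy instance the `S` tuple with `z = 200` finds no partner in `T`, and the `R` tuple with `y = 30` finds no partner in `S`; each is removed the first time it causes a failed probe, so it cannot trigger further probes later.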

Paper Structure (18 sections, 12 theorems, 4 equations, 10 figures, 1 table)

Key Result

theorem 1

A query $Q$ has a join tree (i.e., $Q$ is $\alpha$-acyclic) if and only if it has a GYO reduction order.
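
One direction of this equivalence can be made concrete with a short Python sketch of GYO reduction. It assumes a common formulation of an ear (an atom whose variables shared with the remaining atoms are all contained in one other remaining atom, its parent); the paper's own Definition 3 is not reproduced in this summary, and the function name `gyo_reduction_order` and the dict-of-sets encoding are illustrative choices.

```python
def gyo_reduction_order(atoms):
    """Return a list of (ear, parent) pairs, or None if the query is cyclic.

    atoms: dict mapping atom name -> set of variable names.
    The last entry is (root, None). The pairs are the edges of a join tree.
    """
    if not atoms:
        return []
    remaining = {name: set(vs) for name, vs in atoms.items()}
    order = []
    while len(remaining) > 1:
        found = None
        for name, vs in remaining.items():
            others = [o for o in remaining if o != name]
            # Variables of this atom that also occur in some other atom.
            shared = {v for v in vs if any(v in remaining[o] for o in others)}
            for o in others:
                if shared <= remaining[o]:
                    found = (name, o)  # name is an ear with parent o
                    break
            if found:
                break
        if found is None:
            return None  # no ear left: no GYO reduction order exists
        order.append(found)
        del remaining[found[0]]
    order.append((next(iter(remaining)), None))
    return order


# Hypothetical examples: a path query is acyclic, a triangle is not.
path = {"R": {"x", "y"}, "S": {"y", "z"}, "T": {"z", "w"}}
triangle = {"R": {"x", "y"}, "S": {"y", "z"}, "T": {"z", "x"}}
print(gyo_reduction_order(path))
print(gyo_reduction_order(triangle))
```

Reading the returned pairs as child-parent edges gives a join tree for the path query, while the triangle has no ear at all and the function returns `None`.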

Figures (10)

  • Figure 1: Instantiation of binary hash join on the main example, with backjumping, and with tuple deletion.
  • Figure 2: GYO reduction and parent computation.
  • Figure 3: Binary hash join and Yannakakis's algorithm. The array is 1-indexed.
  • Figure 4: The TreeTracker algorithm and an example execution.
  • Figure 5: Run time of TTJ, HJ, YA, and PostgreSQL on JOB, TPC-H, and SSB. Every data point corresponds to a query, whose $x$- and $y$-coordinates correspond to the run time of the algorithms under comparison.
  • ...and 5 more figures

Theorems & Definitions (29)

  • definition 1: Join Tree
  • definition 2: Key Schema
  • definition 3: Ear
  • definition 4: GYO reduction order
  • theorem 1 (cited as Yu1979AnAF, graham1980universal)
  • definition 5: Query Plan
  • proposition 1
  • proof
  • lemma 1
  • proof
  • ...and 19 more