TreeTracker Join: Simple, Optimal, Fast
Zeyuan Hu, Yisu Remy Wang, Daniel P. Miranker
TL;DR
The paper introduces TreeTracker Join (TTJ), a linear-time algorithm for acyclic joins that augments pipelined binary hash join with backtracking and tuple deletion. On full acyclic queries TTJ runs in $O(|\textsf{IN}| + |\textsf{OUT}|)$ time, and for any given plan it is guaranteed to perform no more hash probes than binary hash join on that plan. The authors formalize the approach with a rigorous treatment of α-acyclicity, GYO reductions, and the parent-child structure, and they extend the theory to cyclic queries through a decomposition method called tree convolution. Experiments on TPC-H, JOB, and SSB show competitive performance and clear benefits from optimizations such as no-good lists. Future directions include optimizer development and linear-time guarantees for bushy plans, with a view to integrating TTJ into modern DBMS workflows.
Abstract
We present a novel linear-time acyclic join algorithm, TreeTracker Join (TTJ). The algorithm can be understood as the pipelined binary hash join with a simple twist: upon a hash lookup failure, TTJ resets execution to the binding of the tuple causing the failure, and removes the offending tuple from its relation. Compared to the best known linear-time acyclic join algorithm, Yannakakis's algorithm, TTJ shares the same asymptotic complexity while imposing lower overhead. Further, we prove that when measuring query performance by counting the number of hash probes, TTJ will match or outperform binary hash join on the same plan. This property holds independently of the plan and independently of acyclicity. We are able to extend our theoretical results to cyclic queries by introducing a new hypergraph decomposition method called tree convolution. Tree convolution iteratively identifies and contracts acyclic subgraphs of the query hypergraph. The method avoids redundant calculations associated with tree decomposition and may be of independent interest. Empirical results on TPC-H, the Join Order Benchmark, and the Star Schema Benchmark demonstrate favorable results.
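The "simple twist" described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `ttj_join`, the `(schema, rows)` relation layout, the explicit `parent` list, and the probe counter are all assumptions made for illustration. The sketch further assumes a left-deep plan in which each relation's join attributes are all supplied by its parent, and it uses plain Python lists for hash buckets, so it does not reproduce the linear-time bound.

```python
from collections import defaultdict

def ttj_join(relations, parent):
    """Illustrative sketch only (hypothetical API, not the paper's code).

    relations: list of (schema, rows) pairs in left-deep plan order.
    parent[i]: index of the relation whose tuple supplies relation i's
               join key (parent[0] is None).
    Returns (output tuples as dicts, number of hash probes).
    """
    schemas = [schema for schema, _ in relations]
    keys, bound = [], set()
    for i, schema in enumerate(schemas):
        key = tuple(a for a in schema if a in bound)
        # Assumption: the parent supplies every join attribute of relation i.
        assert i == 0 or set(key) <= set(schemas[parent[i]])
        keys.append(key)
        bound.update(schema)

    # Build one hash table per relation, keyed on its join attributes.
    tables = []
    for (schema, rows), key in zip(relations, keys):
        table = defaultdict(list)
        for row in rows:
            tup = dict(zip(schema, row))
            table[tuple(tup[a] for a in key)].append(tup)
        tables.append(table)

    output, probes = [], 0

    def extend(binding, i):
        nonlocal probes
        if i == len(relations):
            output.append(binding)
            return None
        if i > 0:
            probes += 1  # count lookups into the hash tables of relations 1..k
        bucket = tables[i].get(tuple(binding[a] for a in keys[i]), [])
        if not bucket and i > 0:
            return parent[i]  # lookup failed: jump back to the offending tuple
        for m in list(bucket):
            result = extend({**binding, **m}, i + 1)
            if result == i:
                bucket.remove(m)  # m caused a failure: delete it, try the next match
            elif result is not None:
                return result  # keep unwinding toward the target relation
        return None

    extend({}, 0)
    return output, probes

# Hypothetical input: R(a, b), S(a, c), T(b, d); S and T both join with R.
R = (("a", "b"), [(1, 2), (3, 4)])
S = (("a", "c"), [(1, 10), (1, 11), (1, 12), (3, 13)])
T = (("b", "d"), [(4, 20)])
rows, probes = ttj_join([R, S, T], parent=[None, 0, 0])
print(rows, probes)
```

On this input the tuple R(1, 2) has three matches in S but none in T. The sketch jumps back to R after the first failed lookup in T and removes R(1, 2), for 4 probes in total. A plain pipelined hash join over the same plan would probe T once for each of the three S matches, for 6 probes.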
