PilotDB: Database-Agnostic Online Approximate Query Processing with A Priori Error Guarantees (Technical Report)
Yuxuan Zhu, Tengjun Jin, Stefanos Baziotis, Chengsong Zhang, Charith Mendis, Daniel Kang
TL;DR
PilotDB tackles the limited industry adoption of approximate query processing by delivering a practical middleware that provides a priori error guarantees without DBMS changes. It combines TAQA, a two-stage online AQP algorithm, with BSAP, a block-sampling theory that preserves statistical guarantees for nested and join queries. Empirical evaluation across PostgreSQL, SQL Server, and DuckDB shows up to 126× speedups with a 5% error target, while maintaining guaranteed error bounds. The approach eliminates maintenance overhead and DBMS modification, offering a scalable, database-agnostic path to fast, reliable approximate analytics in real-world workloads.
Abstract
After decades of research in approximate query processing (AQP), its adoption in the industry remains limited. Existing methods struggle to simultaneously provide user-specified error guarantees, eliminate maintenance overheads, and avoid modifications to database management systems. To address these challenges, we introduce two novel techniques, TAQA and BSAP. TAQA is a two-stage online AQP algorithm that achieves all three properties for arbitrary queries. However, it can be slower than exact queries if we use standard row-level sampling. BSAP resolves this by enabling block-level sampling with statistical guarantees in TAQA. We implement TAQA and BSAP in a prototype middleware system, PilotDB, that is compatible with all DBMSs supporting efficient block-level sampling. We evaluate PilotDB on PostgreSQL, SQL Server, and DuckDB over real-world benchmarks, demonstrating up to 126× speedups when running with a 5% guaranteed error.
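To make the two-stage idea concrete, the following is a minimal Python sketch of a generic pilot-then-final sampling scheme for estimating a mean under a relative error target. It is not the TAQA algorithm itself: the function names (`required_sample_size`, `two_stage_mean`), the normal-approximation sizing formula, and the use of in-memory row-level sampling are all assumptions made for illustration. In particular, plugging pilot estimates directly into the sizing formula ignores the pilot's own uncertainty, so this sketch does not by itself deliver the a priori guarantee described above, and it does not model block-level sampling.

```python
import math
import random
import statistics


def required_sample_size(pilot, rel_error, z=1.96):
    """Rows needed so a normal-approximation interval has half-width
    at most rel_error * |mean|, using the pilot's mean and std as plug-ins.
    (Hypothetical helper; a textbook sizing rule, not the paper's.)"""
    mean = statistics.fmean(pilot)
    std = statistics.stdev(pilot)
    if mean == 0:
        raise ValueError("relative error is undefined for a zero pilot mean")
    return math.ceil((z * std / (rel_error * abs(mean))) ** 2)


def two_stage_mean(data, pilot_size, rel_error, seed=0):
    """Stage 1: draw a small pilot sample and size the final sample.
    Stage 2: draw the final sample and return (estimate, final_size)."""
    rng = random.Random(seed)
    pilot = rng.sample(data, pilot_size)
    # Never request more rows than the table holds.
    n = min(len(data), required_sample_size(pilot, rel_error))
    final = rng.sample(data, n)
    return statistics.fmean(final), n


if __name__ == "__main__":
    table = [float(i % 100) for i in range(10_000)]
    estimate, n = two_stage_mean(table, pilot_size=200, rel_error=0.05)
    print(f"estimated mean {estimate:.2f} from {n} of {len(table)} rows")
```

The point of the sketch is the control flow: a cheap first pass determines how much data the second pass must read to meet the user's error target, so no precomputed samples or DBMS changes are needed.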
