Partial Adaptive Indexing for Approximate Query Answering
Stavros Maroulis, Nikos Bikakis, Vassilis Stamatopoulos, George Papastefanatos
TL;DR
Problem: interactive exploration of very large raw data files requires fast response times, often at the cost of exact results. Approach: the paper proposes Partial Adaptive Indexing, which uses a hierarchical tile-based index with per-tile aggregates and a confidence-interval framework to provide approximate query answers and guide selective index refinement, operating in-situ on raw data. Contributions: a formal optimization problem for selecting a subset of partially contained tiles to process under an error bound $\phi$, a scoring policy $s(t)=\alpha \cdot w(t) + (1-\alpha)/\text{count}(t \cap Q)$, and a preliminary evaluation demonstrating speedups. Findings: preliminary results indicate substantial speedups during early exploration, with manageable trade-offs between accuracy and latency.
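To make the scoring policy concrete, the following is a minimal Python sketch that evaluates $s(t)=\alpha \cdot w(t) + (1-\alpha)/\text{count}(t \cap Q)$ for a few tiles and ranks them. It is an illustration under stated assumptions, not the authors' implementation: the `Tile` class, the field names, and the function names are hypothetical; $w(t)$ is treated as a precomputed per-tile weight (for instance, the tile's contribution to the width of the confidence interval); and higher-scoring tiles are assumed to be processed first.

```python
from dataclasses import dataclass


@dataclass
class Tile:
    # Hypothetical container; the field names are illustrative assumptions.
    name: str
    weight: float        # w(t): assumed precomputed per-tile weight
    count_in_query: int  # count(t ∩ Q): objects of the tile inside query Q


def score(tile: Tile, alpha: float) -> float:
    """Evaluate s(t) = alpha * w(t) + (1 - alpha) / count(t ∩ Q)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if tile.count_in_query <= 0:
        raise ValueError("count(t ∩ Q) must be positive")
    return alpha * tile.weight + (1.0 - alpha) / tile.count_in_query


def rank_tiles(tiles: list[Tile], alpha: float) -> list[Tile]:
    """Order tiles by descending score (assumption: higher score goes first).

    Tiles with no objects inside the query are skipped, since the score
    is undefined for them and they cannot affect the answer.
    """
    candidates = [t for t in tiles if t.count_in_query > 0]
    return sorted(candidates, key=lambda t: score(t, alpha), reverse=True)


tiles = [
    Tile("a", weight=4.0, count_in_query=10),
    Tile("b", weight=1.0, count_in_query=2),
    Tile("c", weight=3.0, count_in_query=0),  # no overlap with Q: skipped
]

for alpha in (0.0, 0.5, 1.0):
    print(alpha, [t.name for t in rank_tiles(tiles, alpha)])
```

In this sketch, $\alpha$ close to 1 favors tiles with a large weight, while $\alpha$ close to 0 favors tiles with few objects inside the query, which are cheaper to read from the raw file.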
Abstract
In data exploration, users need to analyze large data files quickly, aiming to minimize data-to-analysis time. While recent adaptive indexing approaches address this need, there are cases where they demonstrate poor performance, particularly during the initial queries, in regions with a high density of objects, and on very large files over commodity hardware. This work introduces an approach for adaptive indexing driven by both query workload and user-defined accuracy constraints to support approximate query answering. The approach is based on partial index adaptation, which reduces the costs associated with reading data files and refining indexes. We leverage a hierarchical tile-based indexing scheme and its stored metadata to provide efficient query evaluation, ensuring accuracy within user-specified bounds. Our preliminary evaluation demonstrates improvement in query evaluation time, especially during initial user exploration.
