
Partial Adaptive Indexing for Approximate Query Answering

Stavros Maroulis, Nikos Bikakis, Vassilis Stamatopoulos, George Papastefanatos

TL;DR

Problem: interactive exploration of very large raw data files requires fast response times, often at the cost of exact results. Approach: the paper proposes Partial Adaptive Indexing that uses a hierarchical tile-based index with per-tile aggregates and a confidence-interval framework to provide approximate query answers and guide selective index refinement, operating in-situ on raw data. Contributions: a formal optimization for selecting a subset of partially contained tiles to process under an error bound $\phi$, a scoring policy $s(t)=\alpha \cdot w(t) + (1-\alpha)/\text{count}(t \cap Q)$, and a preliminary evaluation demonstrating speedups. Findings: preliminary results indicate substantial speedups during early exploration, with manageable trade-offs between accuracy and latency.
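The TL;DR names the scoring policy $s(t)$ and the error bound $\phi$ but not how tiles are picked. The Python sketch below is a hedged illustration, not the authors' method. It swaps in a greedy heuristic for the paper's formal optimization and uses deterministic bounds in place of its confidence intervals. Partially contained tiles are ranked by $s(t)$ and read one at a time until the half-width of the answer interval is at most $\phi$ times its midpoint. Everything else is an assumption. The aggregate is a SUM over non-negative values, so a partial tile contributes somewhere in $[0, \text{sum}(t)]$. $w(t)$ is read as the tile's interval width normalised by the widest one. `PartialTile`, `est_in_query`, `answer_within_bound`, and `read_exact` (a stand-in for scanning the tile's raw objects) are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class PartialTile:
    tile_id: int
    lo: float          # lower bound on the tile's contribution to SUM over t ∩ Q (0 for non-negative values)
    hi: float          # upper bound (the tile's stored sum)
    est_in_query: int  # hypothetical estimate of count(t ∩ Q); must be > 0


def score(tile: PartialTile, alpha: float, max_width: float) -> float:
    """s(t) = alpha * w(t) + (1 - alpha) / count(t ∩ Q), with w(t) = normalised width (assumption)."""
    w = (tile.hi - tile.lo) / max_width if max_width > 0 else 0.0
    return alpha * w + (1 - alpha) / tile.est_in_query


def answer_within_bound(
    exact_part: float,
    partial: List[PartialTile],
    phi: float,
    alpha: float,
    read_exact: Callable[[int], float],
) -> Tuple[float, float, List[int]]:
    """Greedily read partial tiles until half-width <= phi * |midpoint|; return (lo, hi, read ids)."""
    # Fully contained tiles are already exact (exact_part); partial tiles only bound the answer.
    lo = exact_part + sum(t.lo for t in partial)
    hi = exact_part + sum(t.hi for t in partial)
    max_width = max((t.hi - t.lo for t in partial), default=0.0)
    ranked = sorted(partial, key=lambda t: score(t, alpha, max_width), reverse=True)

    processed: List[int] = []
    for t in ranked:
        mid = (lo + hi) / 2
        if (hi - lo) / 2 <= phi * abs(mid):
            break  # accuracy constraint met: remaining tiles stay unread
        exact = read_exact(t.tile_id)  # stand-in for scanning the tile's raw objects
        lo += exact - t.lo             # replace this tile's bounds with its exact value
        hi += exact - t.hi
        processed.append(t.tile_id)
    return lo, hi, processed


if __name__ == "__main__":
    tiles = [PartialTile(1, 0, 50, 10), PartialTile(2, 0, 10, 2), PartialTile(3, 0, 5, 2)]
    exact_values = {1: 30, 2: 4, 3: 2}
    print(answer_within_bound(100, tiles, 0.1, 0.5, exact_values.__getitem__))  # (130, 145, [1])
```

With $\phi = 0.1$ only tile 1 is read, because the remaining interval already satisfies the bound. Setting $\phi = 0$ reads every partial tile and collapses the interval to the exact sum, 136. Raising $\alpha$ favours tiles that remove the most uncertainty; lowering it favours tiles with few qualifying objects.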

Abstract

In data exploration, users need to analyze large data files quickly, aiming to minimize data-to-analysis time. While recent adaptive indexing approaches address this need, there are cases where they perform poorly: particularly during the initial queries, in regions with a high density of objects, and with very large files on commodity hardware. This work introduces an approach for adaptive indexing driven by both the query workload and user-defined accuracy constraints to support approximate query answering. The approach is based on partial index adaptation, which reduces the costs of reading data files and refining indexes. We leverage a hierarchical tile-based indexing scheme and its stored metadata to provide efficient query evaluation while keeping accuracy within user-specified bounds. Our preliminary evaluation demonstrates improved query evaluation time, especially during initial user exploration.
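To make the abstract's index concrete, here is a minimal Python sketch under stated assumptions, not the paper's implementation. 2-D points held in memory stand in for the raw file, tiles split quadtree-style into four equal children, and the only per-tile metadata is COUNT. Fully contained tiles are answered exactly from metadata; a partially contained tile only bounds the answer to $[0, \text{count}(t)]$ until it is refined, that is, scanned and split. `Rect`, `Tile`, `count_bounds`, and `refine` are hypothetical names. For brevity the usage example refines every partial tile (exact answering); a partial scheme would refine only the tiles chosen under the error bound, as in panel (c) of Figure 1.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

Point = Tuple[float, float]  # (x, y); stands in for one object of the raw file


@dataclass(frozen=True)
class Rect:
    x0: float
    y0: float
    x1: float
    y1: float  # half-open box [x0, x1) x [y0, y1)

    def covers(self, o: "Rect") -> bool:
        return self.x0 <= o.x0 and o.x1 <= self.x1 and self.y0 <= o.y0 and o.y1 <= self.y1

    def overlaps(self, o: "Rect") -> bool:
        return self.x0 < o.x1 and o.x0 < self.x1 and self.y0 < o.y1 and o.y0 < self.y1

    def has(self, p: Point) -> bool:
        return self.x0 <= p[0] < self.x1 and self.y0 <= p[1] < self.y1


@dataclass
class Tile:
    box: Rect
    points: List[Point]  # objects assigned to this tile (a real index would keep file offsets)
    children: List["Tile"] = field(default_factory=list)

    @property
    def count(self) -> int:  # per-tile metadata; only COUNT in this sketch
        return len(self.points)

    def leaves(self) -> Iterator["Tile"]:
        if not self.children:
            yield self
        for c in self.children:
            yield from c.leaves()

    def split(self) -> None:
        """Refine into four equal sub-tiles, redistributing this tile's objects."""
        b = self.box
        mx, my = (b.x0 + b.x1) / 2, (b.y0 + b.y1) / 2
        quads = [Rect(b.x0, b.y0, mx, my), Rect(mx, b.y0, b.x1, my),
                 Rect(b.x0, my, mx, b.y1), Rect(mx, my, b.x1, b.y1)]
        self.children = [Tile(q, [p for p in self.points if q.has(p)]) for q in quads]


def count_bounds(root: Tile, q: Rect) -> Tuple[int, int, List[Tile]]:
    """Bound COUNT(Q) from metadata alone; also return the partially contained leaves."""
    lo = hi = 0
    partial: List[Tile] = []
    for leaf in root.leaves():
        if leaf.count == 0 or not leaf.box.overlaps(q):
            continue
        if q.covers(leaf.box):  # fully contained: exact from metadata
            lo += leaf.count
            hi += leaf.count
        else:                   # partially contained: anywhere in [0, count]
            hi += leaf.count
            partial.append(leaf)
    return lo, hi, partial


def refine(tile: Tile, q: Rect) -> int:
    """Scan one partial tile's objects for its exact count in q, then split it for later queries."""
    exact = sum(1 for p in tile.points if q.has(p))
    tile.split()
    return exact


if __name__ == "__main__":
    pts = [(0.1, 0.1), (0.2, 0.3), (0.6, 0.2), (0.7, 0.7), (0.9, 0.8), (0.3, 0.6)]
    root = Tile(Rect(0, 0, 1, 1), pts)
    root.split()  # initial index: four tiles
    q = Rect(0.0, 0.0, 0.75, 0.5)
    lo, hi, partial = count_bounds(root, q)
    print(lo, hi)                                     # 2 3
    print(lo + sum(refine(t, q) for t in partial))    # 3
    print(count_bounds(root, q)[:2])                  # (3, 3)
```

After refinement, the sub-tile holding the qualifying point lies fully inside the query, so repeating the query is answered from metadata alone without rereading raw data.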

Paper Structure

This paper contains 7 sections, 1 equation, and 2 figures.

Figures (2)

  • Figure 1: Index Adaptation Example. (a) Initial index structure; (b) exact query answering, splitting tiles $t_1$ and $t_3$; (c) approximate query answering, splitting only $t_3$ and providing results within user accuracy constraints.
  • Figure 2: Evaluation Time for Different Error Bounds