Towards Efficient Data Structures for Approximate Search with Range Queries
Ladan Kian, Dariusz R. Kowalski
TL;DR
This work addresses the efficiency-accuracy trade-off in approximate range queries by analyzing SRC-search on 1D data. It introduces the c-DAG, a tunable augmentation of the 1D-Tree with overlapping intervals, and proves that it achieves a constant additive time overhead while yielding a multiplicative logarithmic reduction in false positives, $\Theta(\log (N/s))$. The authors extend the analysis to skewed queries via a generic framework and validate the approach on Gowalla and synthetic data, demonstrating tighter covers and improved privacy properties. They further examine security implications, including leakage of returned levels and query length, and discuss the practical impact for privacy-preserving systems such as searchable encryption and multimedia retrieval.
Abstract
Range queries are a simple and popular type of query used in data retrieval. However, extracting exact and complete information using range queries is costly. As a remedy, some previous work proposed a faster principle, {\em approximate} search with range queries, also called single range cover (SRC) search. It can, however, produce some false positives. In this work we introduce a new SRC search structure, a $c$-DAG (Directed Acyclic Graph), which provably decreases the average number of false positives by a logarithmic factor while keeping asymptotically the same time and memory complexities as a classic tree structure. A $c$-DAG is a tunable augmentation of the 1D-Tree with denser overlapping branches ($c \geq 3$ children per node). We perform a competitive analysis of a $c$-DAG with respect to the 1D-Tree and derive an additive constant time overhead and a multiplicative logarithmic improvement of the false positives ratio, on average. We also provide a generic framework to extend our results to empirical distributions of queries, and demonstrate its effectiveness on the Gowalla dataset. Finally, we quantify and discuss security and privacy aspects of SRC search on a $c$-DAG vs. a 1D-Tree, mainly mitigation of structural leakage, which makes the $c$-DAG a good candidate data structure for deployment in privacy-preserving systems (e.g., searchable encryption) and multimedia retrieval.
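
To make the idea of a single range cover concrete, the following minimal Python sketch compares the cover returned by a binary 1D-Tree with the cover returned by a structure that also stores overlapping intervals. It is an illustration under stated assumptions, not the construction analyzed in this work: the domain is $[0, n)$ with $n$ a power of two, there is one record per domain value, and the overlapping variant simply adds, at every level, intervals shifted by half their length (which gives each node three overlapping children). The function names `src_tree`, `src_overlap`, and `false_positives` are hypothetical.

```python
def src_tree(a, b):
    """Smallest dyadic interval of a binary 1D-Tree that covers [a, b] (inclusive)."""
    size = 1
    # Grow the node size until a and b fall into the same aligned block.
    while a // size != b // size:
        size *= 2
    lo = (a // size) * size
    return lo, lo + size - 1


def src_overlap(a, b, n):
    """Smallest cover of [a, b] when each level also holds half-shifted intervals.

    Assumption (illustrative only): intervals of length `size` start at every
    multiple of size/2 and must fit inside the domain [0, n).
    """
    size = 1
    while True:
        step = max(size // 2, 1)
        # Rightmost allowed start that is still <= a, clamped to the domain.
        lo = min((a // step) * step, n - size)
        if lo + size - 1 >= b:
            return lo, lo + size - 1
        size *= 2


def false_positives(cover, a, b):
    """Records returned but not requested, assuming one record per value."""
    lo, hi = cover
    return (hi - lo + 1) - (b - a + 1)
```

For the query $[7, 8]$ over a domain of size 16, the tree has to return the root interval $[0, 15]$ (14 false positives), because the query straddles the midpoint, whereas the overlapping variant returns $[7, 8]$ exactly. Queries that straddle aligned boundaries are the case where overlapping intervals help in this sketch.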
