Towards Efficient Data Structures for Approximate Search with Range Queries

Ladan Kian, Dariusz R. Kowalski

TL;DR

This work addresses the efficiency-accuracy trade-off in approximate range queries by analyzing SRC-search on 1D data. It introduces the $c$-DAG, a tunable augmentation of the 1D-Tree with overlapping intervals, and proves that it incurs only a constant additive time overhead while reducing false positives by a multiplicative logarithmic factor, $\Theta(\log (N/s))$. The authors extend the analysis to skewed queries via a generic framework and validate the approach on Gowalla and synthetic data, demonstrating tighter covers and improved privacy properties. They further examine security implications, including leakage of returned levels and query length, and discuss the practical impact for privacy-preserving systems such as searchable encryption and multimedia retrieval.

Abstract

Range queries are a simple and popular type of query used in data retrieval. However, extracting exact and complete information using range queries is costly. As a remedy, some previous work proposed a faster principle, approximate search with range queries, also called single range cover (SRC) search. It can, however, produce some false positives. In this work we introduce a new SRC search structure, a $c$-DAG (Directed Acyclic Graph), which provably decreases the average number of false positives by a logarithmic factor while keeping asymptotically the same time and memory complexities as a classic tree structure. A $c$-DAG is a tunable augmentation of the 1D-Tree with denser overlapping branches ($c \geq 3$ children per node). We perform a competitive analysis of a $c$-DAG with respect to the 1D-Tree and derive an additive constant time overhead and a multiplicative logarithmic improvement of the false-positive ratio, on average. We also provide a generic framework to extend our results to empirical distributions of queries, and demonstrate its effectiveness on the Gowalla dataset. Finally, we quantify and discuss security and privacy aspects of SRC search on a $c$-DAG vs. a 1D-Tree, mainly the mitigation of structural leakage, which makes the $c$-DAG a good data structure candidate for deployment in privacy-preserving systems (e.g., searchable encryption) and multimedia retrieval.
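To make the false-positive claim concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a domain $[0, n)$ with $n$ a power of two, one record per domain position (so false positives equal cover length minus query length), and a 3-DAG layout in which each level also contains nodes shifted by half a node length. The function names `cover_length` and `average_false_positives` are hypothetical.

```python
def cover_length(n: int, lo: int, hi: int, overlapping: bool) -> int:
    """Length of the smallest node covering the query [lo, hi) inside [0, n).

    Assumption: n is a power of two. Nodes one level down have half the
    length of their parents. In the plain 1D-Tree they start at multiples of
    their own length; in the assumed 3-DAG layout they also start at
    multiples of half their length (the overlapping nodes).
    """
    length = n
    while length >= 2:
        child = length // 2
        step = max(child // 2, 1) if overlapping else child
        # Right-most aligned start that is <= lo and keeps the node in range.
        start = min((lo // step) * step, n - child)
        if start + child >= hi:
            length = child  # a node of the next level still covers the query
        else:
            break           # no deeper node covers it; stop descending
    return length


def average_false_positives(n: int, s: int, overlapping: bool) -> float:
    """Average of (cover length - s) over every query of length s in [0, n).

    Assumption: one record per position, so extra length = false positives.
    """
    starts = range(n - s + 1)
    total = sum(cover_length(n, lo, lo + s, overlapping) - s for lo in starts)
    return total / len(starts)


if __name__ == "__main__":
    n, s = 1024, 16
    print("1D-Tree:", average_false_positives(n, s, overlapping=False))
    print("3-DAG:  ", average_false_positives(n, s, overlapping=True))
```

Because the assumed 3-DAG contains every 1D-Tree node plus the shifted ones, its cover for a given query is never longer than the tree's; the size of the average gap depends on $n$ and $s$.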

Paper Structure

This paper contains 20 sections, 11 theorems, 76 equations, 4 figures, 1 table, 1 algorithm.

Key Result

Proposition 1

For a given natural number $c\ge 3$, the $c$-DAG can be stored in $O(c N \log^2 N)$ memory bits and the SRC-search can be completed in $O(\log N)$ time steps.

Figures (4)

  • Figure 1: 3-DAG constructed over a dataset of size 16. The orange intervals indicate augmented overlapping nodes added on top of the corresponding 1D-Tree (blue intervals). For $Q_1=[2,6)$, SRC-search on the 1D-Tree returns the level 1 node $[0,8)$ while the 3-DAG returns the level 2 node $[2,6)$. For $Q_2=[11,15)$ (same length), SRC-search on both data structures returns the level 1 node $[8,16)$.
  • Figure 2: Returned level distribution with cumulative probability lines for $s=3600$ (1 hour) on the Gowalla dataset. Green, blue, and red bars represent the empirical probabilities for the 1D-Tree, 3-DAG, and 5-DAG, respectively (left y-axis). The corresponding lines show the cumulative distribution functions, CDFs (right y-axis).
  • Figure 3: Returned level distribution with cumulative probability lines for $s=86400$ (1 day) on the Gowalla dataset. Green, blue, and red bars represent the empirical probabilities for the 1D-Tree, 3-DAG, and 5-DAG, respectively (left y-axis). The corresponding dashed lines show the cumulative distribution functions, CDFs (right y-axis).
  • Figure 4: Empirical false positive competitive ratio after stabilization (that is, after running 120 000 extra queries), plotted for query lengths of one minute, one hour, one day, and one week.
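
The Figure 1 example can be reproduced with a small sketch. This is an illustration under stated assumptions, not the paper's algorithm: the node layout of the 3-DAG (extra nodes shifted by half a node length at every level) is inferred from the caption, and the names `build_levels` and `src_search` are hypothetical. The search scans every node for readability, so it does not reflect the logarithmic search time stated in Proposition 1.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open interval [lo, hi)


def build_levels(n: int, overlapping: bool) -> List[List[Interval]]:
    """List the nodes of each level over [0, n); n is assumed a power of two.

    Level k holds intervals of length n / 2**k. The 1D-Tree places them at
    multiples of their length. The assumed 3-DAG layout also places them at
    multiples of half their length (the overlapping nodes of Figure 1).
    """
    levels = []
    length = n
    while length >= 1:
        step = max(length // 2, 1) if overlapping else length
        starts = range(0, n - length + 1, step)
        levels.append([(lo, lo + length) for lo in starts])
        length //= 2
    return levels


def src_search(levels: List[List[Interval]], query: Interval) -> Tuple[int, Interval]:
    """Return (level, node) for the deepest single node covering the query.

    Plain exhaustive scan over all nodes, chosen for clarity over speed.
    """
    q_lo, q_hi = query
    best = (0, levels[0][0])  # the root always covers a query inside [0, n)
    for depth, nodes in enumerate(levels):
        for lo, hi in nodes:
            if lo <= q_lo and q_hi <= hi:
                best = (depth, (lo, hi))
                break
    return best


if __name__ == "__main__":
    tree = build_levels(16, overlapping=False)
    dag = build_levels(16, overlapping=True)
    for name, q in (("Q1", (2, 6)), ("Q2", (11, 15))):
        print(name, "1D-Tree:", src_search(tree, q), "3-DAG:", src_search(dag, q))
```

Under these assumptions the sketch agrees with the caption: $Q_1$ maps to $[0,8)$ at level 1 in the 1D-Tree and to $[2,6)$ at level 2 in the 3-DAG, while $Q_2$ maps to $[8,16)$ at level 1 in both.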

Theorems & Definitions (22)

  • Definition 1: Range-Supporting Data Structure [falzon2023range]
  • Definition 2: 1D-Tree
  • Definition 3: $c$-DAG Family
  • Proposition 1
  • Lemma 1: Level of Returned Nodes in a $c$-DAG
  • proof
  • Lemma 2
  • proof
  • Lemma 3: Level of Returned Nodes in 1D-Tree
  • proof
  • ...and 12 more