Adaptive Quotient Filters

Richard Wen, Hunter McCoy, David Tench, Guido Tagliavini, Michael A. Bender, Alex Conway, Martin Farach-Colton, Rob Johnson, Prashant Pandey

TL;DR

AdaptiveQF introduces the first practical strongly adaptive filter built on the quotient filter, achieving minimal adaptivity overhead while guaranteeing that false positives do not reappear for adversarial workloads. The design extends fingerprints with variable-length extensions and uses a compact reverse map (or merged reverse map) to locate original keys efficiently, preserving cache locality, mergeability, and deletions. The paper proves space bounds for static yes/no list problems and demonstrates that AdaptiveQF attains near-optimal space up to low-order terms, while delivering substantial performance gains on skewed and adversarial workloads, especially when paired with disk-backed databases. Empirically, AdaptiveQF outperforms existing adaptive filters by orders of magnitude in certain regimes and approaches non-adaptive filters in space with much stronger adaptivity guarantees, making it practical for real systems and dynamic workloads.

Abstract

Adaptive filters, such as telescoping and adaptive cuckoo filters, update their representation upon detecting a false positive to avoid repeating the same error in the future. Adaptive filters require an auxiliary structure, typically much larger than the main filter and often residing on slow storage, to facilitate adaptation. However, existing adaptive filters are not practical and have seen no adoption in real-world systems due to two main reasons. Firstly, they offer weak adaptivity guarantees, meaning that fixing a new false positive can cause a previously fixed false positive to come back. Secondly, the sub-optimal design of the auxiliary structure results in adaptivity overheads so substantial that they can actually diminish the overall system performance compared to a traditional filter. In this paper, we design and implement AdaptiveQF, the first practical adaptive filter with minimal adaptivity overhead and strong adaptivity guarantees, which means that the performance and false-positive guarantees continue to hold even for adversarial workloads. The AdaptiveQF is based on the state-of-the-art quotient filter design and preserves all the critical features of the quotient filter such as cache efficiency and mergeability. Furthermore, we employ a new auxiliary structure design which results in considerably low adaptivity overhead and makes the AdaptiveQF practical in real systems.
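
The adaptation loop described above can be illustrated with a toy sketch. It is not the AdaptiveQF implementation: the class `ToyAdaptiveFilter`, the helpers `h` and `prefix`, the use of Python dictionaries in place of quotient-filter slots, and the choice of BLAKE2b as the hash are all assumptions made for illustration. The sketch only shows the idea of lengthening a stored fingerprint, with help from a reverse map that recovers the original key, until it no longer matches a reported false positive.

```python
import hashlib

HASH_BITS = 64


def h(key: str) -> int:
    """64-bit hash of a key (BLAKE2b is a stand-in choice, not the paper's)."""
    digest = hashlib.blake2b(key.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def prefix(value: int, nbits: int) -> int:
    """Top `nbits` bits of a 64-bit hash value."""
    return value >> (HASH_BITS - nbits)


class ToyAdaptiveFilter:
    """Hypothetical toy: variable-length fingerprints plus a reverse map."""

    def __init__(self, base_bits: int = 8):
        self.base_bits = base_bits
        # base fingerprint -> list of [length in bits, fingerprint prefix]
        self.buckets = {}
        # reverse map: (base fingerprint, position in bucket) -> original key
        self.reverse_map = {}

    def insert(self, key: str) -> None:
        base = prefix(h(key), self.base_bits)
        bucket = self.buckets.setdefault(base, [])
        self.reverse_map[(base, len(bucket))] = key
        bucket.append([self.base_bits, base])

    def _matches(self, hv: int):
        """Locations of stored fingerprints that are a prefix of hash `hv`."""
        base = prefix(hv, self.base_bits)
        return [
            (base, i)
            for i, (nbits, fp) in enumerate(self.buckets.get(base, []))
            if prefix(hv, nbits) == fp
        ]

    def query(self, key: str) -> bool:
        return bool(self._matches(h(key)))

    def adapt(self, key: str) -> None:
        """Call once the backing store reports `key` as a false positive."""
        hv = h(key)
        for base, i in self._matches(hv):
            stored_hv = h(self.reverse_map[(base, i)])
            if stored_hv == hv:
                continue  # identical full hashes cannot be separated
            nbits = self.buckets[base][i][0]
            # Extend the stored fingerprint until it differs from the query.
            while prefix(stored_hv, nbits) == prefix(hv, nbits):
                nbits += 1
            self.buckets[base][i] = [nbits, prefix(stored_hv, nbits)]
```

Because fingerprints in this toy only ever grow, a query that was separated from a stored key at some length stays separated at every longer length, which is the intuition behind a previously fixed false positive not coming back.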

Paper Structure (28 sections, 7 theorems, 3 equations, 9 figures, 5 tables)

This paper contains 28 sections, 7 theorems, 3 equations, 9 figures, 5 tables.

Key Result

Proposition 1

The yes/no AdaptiveQF uses $(1 + o(1)) n \log(\max\{1/\varepsilon, m/n\}) + O(n)$ bits of space.
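
To get a feel for the bound, the sketch below evaluates only its leading term, $n \log(\max\{1/\varepsilon, m/n\})$. Several things here are assumptions not stated in the proposition as quoted: the logarithm is taken base 2, $n$ is read as the number of yes-list items and $m$ as the number of no-list items, and the $(1 + o(1))$ factor and the $O(n)$ additive term are dropped. The function name `leading_term_bits` is hypothetical.

```python
import math


def leading_term_bits(n: int, eps: float, m: int) -> float:
    """Leading term n * log2(max(1/eps, m/n)) of the space bound.

    Assumes base-2 logs; ignores the (1 + o(1)) factor and the O(n) term.
    """
    return n * math.log2(max(1.0 / eps, m / n))


# Small no list: the 1/eps term dominates.
small_no_list = leading_term_bits(n=1024, eps=1 / 256, m=1024)

# Large no list: the m/n term dominates.
large_no_list = leading_term_bits(n=1024, eps=1 / 256, m=1024 * 65536)

print(small_no_list / 1024, large_no_list / 1024)  # leading-term bits per item
```

Under these assumptions, the per-item cost is governed by $\log(1/\varepsilon)$ until the no list holds more than $n/\varepsilon$ items, after which $\log(m/n)$ takes over.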

Figures (9)

  • Figure 1: The quotient filter structure [PandeyBJP17]. The upper part shows the logical structure. The lower part shows the encoding of the logical structure in the quotient filter, which uses two metadata bits per slot. All items that share the same canonical location are stored together in a run. A sequence of items without any empty slot is called a cluster. Note: items are shown in upper case in the canonical representation, and the corresponding remainders in the slots are shown in lower case.
  • Figure 2: AdaptiveQF block diagram and the reverse map. It shows the changes in the schema and the reverse map during queries, insertions, and adaptations.
  • Figure 3: Micro-operation throughput of the filters in isolation, without any surrounding system. Measured by first simulating a sequence of operations, then going back and performing the same operations on the filter alone. Adaptive filters are compared on the left; nonadaptive filters are shown for reference on the right.
  • Figure 4: Parallel insertion throughput. $2^{26}$ slots in the filter.
  • Figure 5: System insert throughput as filter fills up.
  • ...and 4 more figures
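
The run and cluster terminology in the Figure 1 caption can be made concrete with a minimal sketch. It is a simplification made for illustration, not the quotient filter's actual encoding: the metadata bits are omitted, slots are held in a dictionary, there is no wraparound at the end of the table, and the function names `split`, `layout`, and `clusters` are hypothetical.

```python
from collections import defaultdict


def split(fingerprint: int, r_bits: int):
    """Split a fingerprint into (quotient, remainder)."""
    return fingerprint >> r_bits, fingerprint & ((1 << r_bits) - 1)


def layout(fingerprints, r_bits: int):
    """Place remainders into slots, keeping each run contiguous.

    The quotient is the canonical slot. A run that finds its canonical
    slot taken is shifted right (toy version: no wraparound).
    """
    runs = defaultdict(list)
    for fp in fingerprints:
        q, r = split(fp, r_bits)
        runs[q].append(r)

    slots = {}
    next_free = 0
    for q in sorted(runs):
        pos = max(q, next_free)  # start at canonical slot or first free one
        for r in sorted(runs[q]):
            slots[pos] = (q, r)
            pos += 1
        next_free = pos
    return slots


def clusters(slots):
    """Group occupied slot indices into maximal gap-free sequences."""
    groups = []
    for pos in sorted(slots):
        if groups and groups[-1][-1] == pos - 1:
            groups[-1].append(pos)
        else:
            groups.append([pos])
    return groups


# Four fingerprints with 4-bit remainders: quotients 1, 1, 2, and 5.
example = layout([0x13, 0x19, 0x25, 0x51], r_bits=4)
example_clusters = clusters(example)
```

In this example the two items with quotient 1 form one run, the item with quotient 2 is pushed one slot to the right of its canonical location, and those three slots together form a single cluster, while the item with quotient 5 sits alone.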

Theorems & Definitions (7)

  • Proposition 1
  • Theorem 2
  • Lemma 3
  • Lemma 4 [Eisenberg2008MaxGeometric]
  • Lemma 5
  • Lemma 6
  • Theorem 7